
[RAG 1/4] 1. Multimodality with Gemini

네모메모 · 2025. 8. 30. 13:48

 

 

Inspect Rich Documents with Gemini Multimodality and Multimodal RAG

The first lab in the course:

1. Multimodality with Gemini


What is multimodality?

The ability to understand and process multiple kinds of data at the same time.

'Multi' means 'many', and 'modal' refers to a kind (modality) of data. It is how an AI, much like a human, draws on text, images, audio, video, and other information together to understand a situation more comprehensively and communicate.

 

What is multimodal AI?

A machine learning model that can process and integrate information from multiple modalities, or data types.

These modalities can include text, images, audio, video, and other forms of sensory input.

cf.) Unlike traditional AI models, which are typically designed to handle a single type of data, multimodal AI combines and analyzes different forms of input data to reach a more comprehensive understanding and produce more robust output.
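As a mental model, a single multimodal request can be sketched with plain Python dicts (these are NOT the real SDK types, just an illustration of the idea that one prompt mixes parts of several modalities):

```python
# Schematic only: one request carries several parts, each tagged with its modality.
# Non-text parts also carry a MIME type so the model knows how to decode them.
request = {
    "model": "gemini-2.0-flash-001",
    "contents": [
        {"type": "text", "data": "What is happening in this scene?"},
        {"type": "image", "mime_type": "image/jpeg", "uri": "gs://bucket/photo.jpg"},
        {"type": "video", "mime_type": "video/mp4", "uri": "gs://bucket/clip.mp4"},
    ],
}

# All parts are sent together in one request, not as separate requests per modality.
modalities = sorted({part["type"] for part in request["contents"]})
print(modalities)  # ['image', 'text', 'video']
```

The real Vertex AI request shape (shown later in this lab) follows the same pattern: a `contents` list mixing text strings and `Part` objects.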

 

 


Overview

Gemini is a family of multimodal generative AI models developed by Google. In this lab, you use the Gemini API to explore how Gemini Flash understands text, images, and video, and generates responses based on them.

 

[What you can do with Gemini's multimodal capabilities]

  • Image analysis: detect objects, understand user interfaces, interpret diagrams, and compare visual similarities and differences.
  • Video processing: generate descriptions, extract tags and highlights, and answer questions about video content.

In this lab, you experiment with these capabilities through hands-on tasks using the Gemini API in Vertex AI.


Objectives

[What you will learn in this lab]

  • Interact with the Gemini API in Vertex AI.
  • Analyze images and videos with the Gemini Flash model.
  • Provide Gemini with text, image, and video prompts to generate useful responses.
  • Explore practical applications of Gemini's multimodal capabilities.

 

+) Gemini Flash is a multimodal model that supports multimodal prompts. You can include text, images, and video in a prompt request and receive a text or code response.

 

 

Prerequisites

Before starting this lab, you should be familiar with:

  • Basic Python programming
  • General API concepts
  • Running Python code in a Jupyter notebook on Vertex AI Workbench

 

 


The lab setup steps:

'Before you click the Start Lab button',

'How to start your lab and sign in to the Google Cloud console',

'Task 1. Open the notebook in Vertex AI Workbench',

'Task 2. Set up the notebook'

are all covered in the lab instructions, so they are not repeated here.

 


Starting the lab

1. Getting Started

1-1) Install Google Gen AI SDK for Python

%pip install --upgrade --quiet google-genai gitingest


1-2) Authenticate your notebook environment (Colab only)

     → Skipped, since this environment is not Colab.

import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

 


2. 'Task 3. Use the Gemini Flash model'

In this task, you run the designated notebook cells to work with the Gemini Flash model.

→ Gemini Flash is a multimodal model that supports multimodal prompts.

        → You can include text, images, and video in a prompt request and receive a text or code response.

 

 

 

2-1) Set Google Cloud project information and create client


- Enter values for 'PROJECT_ID' and 'LOCATION'.

- To get started with Vertex AI, you must have an existing Google Cloud project, and the Vertex AI API must be enabled.

- Learn more : setting up a project and a development environment

 

import os

from google import genai

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

 

 

2-2) Import libraries

from IPython.display import Audio, Image, Markdown, Video, display
from gitingest import ingest
from google.genai.types import CreateCachedContentConfig, GenerateContentConfig, Part
import nest_asyncio

nest_asyncio.apply()
 
 

2-3) Load the Gemini 2.0 Flash model

- Learn more : Gemini models on Vertex AI

 
MODEL_ID = "gemini-2.0-flash-001"  # @param {type: "string"}

3. Individual Modalities

3-1) Understanding text

- Gemini can analyze a text question and carry that context into subsequent prompts.

question = "What is the average weather in Mountain View, CA in the middle of May?"
prompt = """
Considering the weather, please provide some outfit suggestions.

Give examples for the daytime and the evening.
"""

contents = [question, prompt]
response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

 


3-2) Document Summarization

- You can use Gemini to process PDF documents, analyze their content, retain the information, and answer queries about the documents.

 

 

ex1) Asking how many tokens the model can process

- Sample PDF document: the Gemini 1.5 technical report (https://arxiv.org/pdf/2403.05530.pdf)

- Sample prompt: "How many tokens can the model process?"

pdf_file_uri = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
pdf_file = Part.from_uri(file_uri=pdf_file_uri, mime_type="application/pdf")

prompt = "How many tokens can the model process?"

contents = [pdf_file, prompt]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ Output : Gemini 1.5 Pro can process up to 10M tokens.

 

 

ex2) Summarizing the document


prompt = """
  You are a professional document summarization specialist.
  Please summarize the given document.
"""

contents = [pdf_file, prompt]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ contents type: contents[] (array of Content objects)
→ Output :
Here's a summary of the provided document:

This document is a technical report introducing Gemini 1.5 Pro, the latest model in the Gemini family by Google DeepMind. It's a highly compute-efficient multimodal mixture-of-experts model, which means it can process and understand various types of information like text, audio, and video. It's able to recall and reason over information from millions of tokens of context, which is a significant leap over existing models. This means it can handle large amounts of data, like long documents, hours of video, and audio files. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks, improves state-of-the-art performance in long-document QA, long-video QA, and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's performance across a broad set of benchmarks.

In essence, Gemini 1.5 Pro represents a significant advancement in language model capabilities, especially in handling very long and complex contexts.


3-3) Image understanding across multiple images

- One of Gemini's capabilities is reasoning across multiple images.

- ex) Inferring which pair of glasses is more suitable for an oval face


image_glasses1_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/glasses1.jpg"
image_glasses2_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/glasses2.jpg"

display(Image(image_glasses1_url, width=150))
display(Image(image_glasses2_url, width=150))

prompt = """
I have an oval face. Given my face shape, which glasses would be more suitable?

Explain how you reached this decision.
Provide your recommendation based on my face shape, and please give an explanation for each.
"""

contents = [
    prompt,
    Part.from_uri(file_uri=image_glasses1_url, mime_type="image/jpeg"),
    Part.from_uri(file_uri=image_glasses2_url, mime_type="image/jpeg"),
]
response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ Passing an image in contents : Part.from_uri(file_uri="${image path}", mime_type="${image MIME type}")
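Each Part.from_uri call must pass a mime_type that matches the file. A small helper (our own, for illustration; not part of the SDK) can derive it from the extension using the standard library:

```python
import mimetypes

def part_args(file_uri: str) -> dict:
    """Guess the mime_type argument for Part.from_uri from the file extension.

    Illustrative helper only; raises if the extension is unknown.
    """
    mime, _ = mimetypes.guess_type(file_uri)
    if mime is None:
        raise ValueError(f"Cannot guess a MIME type for {file_uri!r}")
    return {"file_uri": file_uri, "mime_type": mime}

print(part_args("gs://bucket/glasses1.jpg"))  # mime_type: 'image/jpeg'
```

You could then write, e.g., `Part.from_uri(**part_args(image_glasses1_url))` to keep the URI and MIME type consistent.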

→ Output : (screenshot omitted)

 


 

3-4) Video description (generating video descriptions)

- Gemini can extract tags across a video and also retrieve additional information beyond the video content.

- ex1) Extract tags from a video and retrieve additional information


video_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/mediterraneansea.mp4"
display(Video(video_url, width=350))

prompt = """
What is shown in this video?
Where should I go to see it?
What are the top 5 places in the world that look like this?
Provide the 10 best tags for this video?
"""

video = Part.from_uri(
    file_uri=video_url,
    mime_type="video/mp4",
)
contents = [prompt, video]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ Output :

 

 


 

 


Okay, here's the information based on the image you provided:

What is shown in this video?

The video shows an aerial view of the harbor in Antalya, Turkey. There's a lighthouse at the end of a stone pier, a marina filled with boats, rocky cliffs with buildings on top, and the sea.

Where should I go to see it?

You should go to Antalya, Turkey.

What are the top 5 places in the world that look like this?

It's tough to find exactly similar places. But here are 5 locations that share some visual similarities, such as coastal cliffs, harbors with historical elements, and Mediterranean vibes:

  1. Dubrovnik, Croatia: The Old Town of Dubrovnik has fortified walls along the coast, similar cliffs, and a beautiful harbor.
  2. Valletta, Malta: Valletta has a historic harbor and impressive fortifications.
  3. Cinque Terre, Italy: The villages of Cinque Terre are perched on rocky cliffs overlooking the sea.
  4. Santorini, Greece: The white-washed buildings on volcanic cliffs overlooking the Aegean Sea have a similar feel, though the architecture is distinct.
  5. Cascais, Portugal: Offers a mix of historic architecture, rocky coastline, and a lively marina.

Provide the 10 best tags for this video?

  1. Antalya
  2. Turkey
  3. Harbor
  4. Marina
  5. Lighthouse
  6. Mediterranean
  7. Coast
  8. Cliffs
  9. Travel
  10. Drone footage

 

- ex2) Using Gemini to retrieve additional information beyond the video content


video_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/ottawatrain3.mp4"
display(Video(video_url, width=350))

prompt = """
Which train line is this?
Where does it go?
What are the stations/stops?
Which river is being crossed?
"""

video = Part.from_uri(
    file_uri=video_url,
    mime_type="video/mp4",
)
contents = [prompt, video]

response = client.models.generate_content(
    model=MODEL_ID, contents=contents, config=GenerateContentConfig(temperature=0)
)
display(Markdown(response.text))

→ Output :

Here's the information about the train line based on the image:
  • Train Line: O-Train Trillium Line
  • Location: Ottawa, Ontario, Canada
  • River: Rideau River
  • Stations/Stops: Bayview, Carling, Carleton, Confederation, Greenboro, Hunt Club, Leitrim, Mooney's Bay, Ottawa Airport, South Keys

 


3-5) Audio (audio analysis and transcription)

- Gemini can process audio directly for long-context understanding.

- ex) The audio file used in the examples:

audio_url = (
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/audio/pixel.mp3"
)
display(Audio(audio_url))

 

 

- ex1) Analyzing audio for long-context understanding


prompt = """
  Please provide a short summary and title for the audio.
  Provide chapter titles, be concise and short, no need to provide chapter summaries.
  Provide each of the chapter titles in a numbered list.
  Do not make up any information that is not part of the audio and do not be verbose.
"""

audio_file = Part.from_uri(file_uri=audio_url, mime_type="audio/mpeg")
contents = [audio_file, prompt]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ Output :

Okay, here are the title and summary you asked for, followed by the chapter list.

Title: Made by Google Podcast: March Feature Drop

Summary: In this episode of the Made by Google Podcast, Aicha Sharif and DeCarlos Love, Pixel product managers, discuss Google's Pixel feature drops, including the new March feature drop. They explain the importance of feature drops in updating and improving devices, discuss the features in the January and March drops, and answer questions from Google Pixel super-fans.

Chapters:

  1. Intro of Guests
  2. Most Transformative Features
  3. Why Feature Drops Are Important
  4. January Feature Drop
  5. Pixel Watch Feature Drop (March)
  6. Pixel Phone Feature Drop (March)
  7. Feature Drop News for other Devices
  8. Pixel Community Question
  9. User Feedback
  10. When Can I Get This?
  11. Favorite Feature Drops
  12. Conclusion

 

 

- ex2) Transcribing the audio (Transcription)


prompt = """
    Transcribe this interview, in the format of timecode, speaker, caption.
    Use speaker A, speaker B, etc. to identify the speakers.
    Provide each piece of information on a separate bullet point.
"""

audio_file = Part.from_uri(file_uri=audio_url, mime_type="audio/mpeg")
contents = [audio_file, prompt]

response = client.models.generate_content(
    model=MODEL_ID,
    contents=contents,
    config=GenerateContentConfig(max_output_tokens=8192),
)
display(Markdown(response.text))

 

→ Output :

 

  • 00:00:00 A Your devices are getting better over time.
  • 00:00:04 A And so we think about it across the entire portfolio from phones, to watch, to buds, to tablet.
  • 00:00:14 A We get really excited about how we can tell a joint narrative across everything.
  • 00:00:18 B Welcome to the Made by Google Podcast, where we meet the people who work on the Google products you love.
  • 00:00:24 B Here's your host, Rashid Finch.
  • 00:00:26 C Today, we're talking to Aisha Sharif and DeCarlos Love.
  • 00:00:30 C They're both product managers for various pixel devices and work on something that all the pixel owners love.
  • 00:00:37 C The Pixel Feature Drops.
  • 00:00:39 B This is the Made by Google Podcast.
  • 00:00:42 C Aisha, which feature on your Pixel phone has been most transformative in your own life?
  • 00:00:48 A So many features.
  • ~~remainder omitted for length~~

 

 

 


3-6) Codebase (reasoning across an entire codebase)

- ex0) Using the Online Boutique repository as the example:

https://github.com/GoogleCloudPlatform/microservices-demo

 


→ Online Boutique is a cloud-first microservices demo application: a web-based e-commerce app where users can browse products, add them to a cart, and purchase them.

The application is composed of 11 microservices written in several different languages.

 

 

1) Set the repo_url to use

# The GitHub repository URL
repo_url = "https://github.com/GoogleCloudPlatform/microservices-demo"  # @param {type:"string"}

 

2) Create an index and extract the contents of the codebase

- Clone the repository, build an index, and extract the contents of the code/text files.

exclude_patterns = {
    "*.png",
    "*.jpg",
    "*.jpeg",
    "*.gif",
    "*.svg",
    "*.ico",
    "*.webp",
    "*.jar",
    ".git/",
    "*.gitkeep",
}
_, code_index, code_text = ingest(repo_url, exclude_patterns=exclude_patterns)
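The exclude_patterns above are glob-style patterns. As a rough sketch of how such filtering works (gitingest's actual matching logic may differ), the standard library's fnmatch can express the same idea:

```python
from fnmatch import fnmatch

# Illustrative subset of the patterns used above.
exclude_patterns = {"*.png", "*.jpg", "*.svg", "*.gitkeep", ".git/"}

def is_excluded(path: str) -> bool:
    """Return True if a repo-relative path matches any exclude pattern."""
    for pattern in exclude_patterns:
        if pattern.endswith("/"):
            # Directory pattern: exclude anything under that directory.
            if path.startswith(pattern) or "/" + pattern in "/" + path:
                return True
        elif fnmatch(path, pattern):
            # File glob; note fnmatch's '*' also crosses '/' here,
            # so "*.png" matches PNGs in any subdirectory.
            return True
    return False

print(is_excluded("src/frontend/logo.png"))  # True
print(is_excluded("src/frontend/main.go"))   # False
```

Binary assets are excluded because they would bloat the prompt without adding anything the model can read as text.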



3) Create a cache of the codebase contents

- The codebase prompt will be quite large because of all the data it contains.

- Gemini supports context caching.

What is Gemini's context caching?

: A feature that stores a long, repeatedly used prompt (context) in a cache ahead of time, so that subsequent requests reference the cache instead of resending it.
   → This reduces cost and significantly improves response latency.
   → Frequently used input tokens are stored in a dedicated cache and referenced by subsequent requests, so the same set of tokens does not have to be passed to the model over and over.
   → Particularly well suited to scenarios where subsequent requests repeatedly reference a large amount of initial context.
   → Note: context caching is only available with stable models that have a pinned version (e.g., gemini-2.0-flash-001).
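To see why this matters, here is a back-of-the-envelope sketch of the input-token traffic with and without a cached prefix. The numbers are invented for illustration, and this ignores cache storage pricing and the discounted rate applied to cached tokens:

```python
# Invented numbers, purely to illustrate the shape of the savings.
PREFIX_TOKENS = 500_000   # the shared context, e.g. a whole codebase
QUESTION_TOKENS = 50      # each follow-up question
NUM_REQUESTS = 20

# Without caching, the full prefix is resent with every request.
tokens_without_cache = NUM_REQUESTS * (PREFIX_TOKENS + QUESTION_TOKENS)

# With caching, the prefix is ingested once; each request then adds only the question.
tokens_with_cache = PREFIX_TOKENS + NUM_REQUESTS * QUESTION_TOKENS

print(tokens_without_cache)  # 10,001,000 tokens
print(tokens_with_cache)     # 501,000 tokens
```

The more follow-up questions you ask against the same context (as in steps 4–6 below, which reuse one cache), the larger the gap grows.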

 

- ex) 


prompt = f"""
Context:
- The entire codebase is provided below.
- Here is an index of all of the files in the codebase:
    \n\n{code_index}\n\n.
- Then each of the files is concatenated together. You will find all of the code you need:
    \n\n{code_text}\n\n
"""

cached_content = client.caches.create(
    model="gemini-2.0-flash-001",
    config=CreateCachedContentConfig(
        contents=prompt,
        ttl="3600s",
    ),
)

 

 

4) Generate a developer onboarding guide


question = """
  Provide a getting started guide to onboard new developers to the codebase.
"""

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=question,
    config=GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)
display(Markdown(response.text))

→ Output :



Okay, here's a consolidated getting started guide for new developers, drawing from the various files and documentation within the repository:

Online Boutique - Getting Started for Developers

This guide will help you get up and running with the Online Boutique codebase.

1. Project Overview

  • What is Online Boutique? It's a cloud-first, microservices-based e-commerce demo application. It simulates an online store where users can browse, add items to a cart, and purchase.
  • Purpose: The primary goal of this project is to demonstrate Google Cloud products (GKE, Anthos, Cloud Operations) and related technologies for modernizing enterprise applications.
  • Microservices Architecture: The application is composed of 11 microservices written in different languages (Go, C#, Node.js, Python, Java), communicating over gRPC. See the README.md for a detailed breakdown of each service.
  • License: The project uses the Apache 2.0 license.

2. Prerequisites

  • Tools:
    • Docker for Desktop
    • kubectl
    • skaffold 2.0.2+ (latest version recommended)
    • git
    • A Google Cloud project with Google Container Registry enabled (if deploying to GKE).
    • Minikube (optional, for local Kubernetes) or Kind (optional, for local Kubernetes)
    • gsed (GNU sed, for macOS, install with brew install gnu-sed; or symlink sed for Linux)
    • helm
  • Understanding of:
    • Microservices architecture
    • gRPC
    • Docker
    • Kubernetes
    • (If contributing to specific services) the relevant language (Go, C#, Node.js, Python, Java)

3. Codebase Structure

  • /: Top-level project directory.
  • README.md: General project information and quickstart instructions.
  • cloudbuild.yaml: Configuration for Google Cloud Build.
  • skaffold.yaml: Configuration for Skaffold (build, deployment). This is a crucial file!
  • LICENSE: License information.
  • docs/: Documentation (architecture, development guide, etc.).
  • helm-chart/: Helm chart for deploying Online Boutique.
  • istio-manifests/: Istio manifests.
  • kubernetes-manifests/: Kubernetes manifests (used by Skaffold).
  • kustomize/: Kustomize configurations for customizing deployments.
  • protos/: Protocol Buffer definitions (.proto files) for gRPC communication.
  • release/: Manifests configured with pre-built public images.
  • src/: Source code for each microservice.
    • adservice/, cartservice/, checkoutservice/, currencyservice/, emailservice/, frontend/, loadgenerator/, paymentservice/, productcatalogservice/, recommendationservice/, shippingservice/, shoppingassistantservice/: Directories for each microservice, containing source code, Dockerfile, and build scripts.
  • terraform/: Terraform scripts for infrastructure provisioning on Google Cloud.
  • .github/workflows/: GitHub Actions workflow definitions for CI/CD.

4. Setting up your Development Environment

  1. Clone the Repository:

     git clone https://github.com/GoogleCloudPlatform/microservices-demo
     cd microservices-demo/

5. Running the Application Locally

You have two primary options for running the application locally:

  • Option 1: Google Kubernetes Engine (GKE):
    • Create a GKE cluster.
    • Enable Artifact Registry.
    • Build and run with Skaffold:
    • skaffold run --default-repo=us-docker.pkg.dev/YOUR_PROJECT_ID/microservices-demo
  • Option 2: Local Kubernetes Cluster (Minikube, Kind, Docker Desktop):
    • Start a local Kubernetes cluster (Minikube, Kind, or Docker Desktop with Kubernetes enabled).
    • Run skaffold run. Alternatively, use skaffold dev for automatic rebuilds on code changes.
    • Port-forward the frontend service:
    • kubectl port-forward deployment/frontend 8080:8080
    • Access the application at localhost:8080.

6. Key Development Tasks

  • Making Code Changes:
    1. Modify the source code in the src/ directory of the relevant microservice.
    2. If you change a .proto file, run the appropriate genproto.sh script in the service directory to regenerate the gRPC code.
    3. For local development with Skaffold in dev mode, changes are automatically rebuilt and deployed.
  • Adding a New Microservice: See docs/adding-new-microservice.md for detailed instructions. The general steps are:
    1. Create a new directory under src/.
    2. Add source code, including a Dockerfile.
    3. Create Kubernetes manifests in kustomize/components/.
    4. Update the root kustomization.yaml.
    5. Update the root skaffold.yaml.
    6. Update the Helm chart (helm-chart/)
  • Customizing Deployments with Kustomize:
    • The kustomize/ directory provides components to customize the Online Boutique deployment (e.g., enabling Google Cloud Operations, using Memorystore, or enabling Istio).
    • Use kubectl apply -k kustomize/ to apply the customizations defined in the kustomize/kustomization.yaml file.
    • See kustomize/README.md for more details on using Kustomize.

7. Important Files for Development

  • skaffold.yaml: This is your main configuration file for building, testing, and deploying the application. Pay close attention to the artifacts section for image building and the manifests section for deployment.
  • kubernetes-manifests/: This directory contains the basic Kubernetes deployment and service definitions.
  • kustomize/: This directory allows you to customize your Online Boutique deployment, leveraging variations through Kustomize components.

8. Style Guides & Conventions

  • Follow the coding style and conventions used within each microservice. For example, use tabs for indentation in Go files and spaces in Python.
  • Refer to .editorconfig for common settings.

9. Testing

  • Unit Tests: Each microservice should have unit tests to validate its functionality.
  • Deployment Tests: Github actions validates all pods start up and become ready and runs basic smoke tests against the application when a pull request is made.
  • Adding Tests: Write automated tests, especially for new features or bug fixes.

10. Releasing a New Version

See the docs/releasing/README.md document for details on the release process. This involves using scripts to build and push images, update manifests, and tag the release.

11. CI/CD

The .github/workflows/ directory contains the GitHub Actions workflow definitions that automate the build, test, and deployment processes. Take a look at the workflow definitions to understand the CI/CD pipeline.

  • ci-pr.yaml: Runs code and deployment tests on every pull request.
  • ci-main.yaml: Runs on pushes to the main branch; builds and deploys images to Google Container Registry.
  • cleanup.yaml: Deletes namespaces in test clusters after PRs are closed.
  • kustomize-build-ci.yaml: Tests the kustomize build for the base and test configurations.
  • terraform-validate-ci.yaml: Validates terraform syntax.

By following these steps, you should be able to set up your development environment, run the Online Boutique application locally, and start contributing to the project. Good luck!

 

 

 

5) Find bugs in the code


question = """
    Find the top 3 most severe issues in the codebase.
"""

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=question,
    config=GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)
display(Markdown(response.text))

→ Output :


 

Here's an analysis of the codebase to identify the top 3 most severe issues:

  1. Lack of Proper Error Handling in gRPC Services:
    • Issue: Many of the gRPC service implementations (e.g., in cartservice, checkoutservice, productcatalogservice, shippingservice) use panic for handling errors related to missing environment variables (see mustMapEnv function). Panicking in a gRPC service can lead to the entire server crashing, affecting all clients.
    • Severity: High. This is a critical issue that can lead to service unavailability. It violates the principle of fault tolerance, which is crucial in a microservices architecture. A single misconfigured environment variable can bring down an entire service.
    • Files: src/checkoutservice/main.go, src/frontend/main.go, src/productcatalogservice/server.go and others.
  2. Reliance on redis:alpine without version pinning or custom image builds in Helm Chart:
    • Issue: The Helm chart for the Online Boutique deploys the redis:alpine image directly from Docker Hub without any specific tag or version pinning, and in many scenarios. It also mixes images from the images.repository with a public one.
    • Severity: High. This creates a significant security and stability risk. Without version pinning, a malicious or breaking change to the redis:alpine image on Docker Hub could automatically be deployed to the cluster, causing unpredictable behavior or introducing vulnerabilities. Even though the base kustomize also has this issue, the HELM chart is supposed to be better for the "advanced setup" of more secured scenarios.
    • Files: helm-chart/templates/cartservice.yaml
  3. Potential Security Issue with Default Session Handling:
    • Issue: The ensureSessionID function in src/frontend/middleware.go uses a hardcoded session ID "12345678-1234-1234-1234-123456789123" when ENABLE_SINGLE_SHARED_SESSION is set to "true". This means all users would share the same session.
    • Severity: High. While it may have been used to showcase interop across multiple frontend clusters, sharing a session ID violates basic security principles and privacy, potentially leading to data leakage, impersonation, and unauthorized access to user carts and other sensitive information.
    • Files: src/frontend/middleware.go

Additional Considerations and Lower Severity Issues:

  • Missing Health Checks: Not all services implement gRPC health checks properly. Without it, it's harder to monitor liveness and readiness of these services.
  • Kustomize/Helm Component Misconfigurations: The code contains hints of advanced configurations and scenarios that are not easily tested and documented, that have potential misconfigurations. This can increase the complexity and difficulty of deploying advanced and secured scenarios.
  • Test Coverage: Unit tests primarily exist for Go and C# services. There is a lack of unit tests for Node.js and Python services. Incomplete test coverage makes it difficult to verify the correctness and security of those services.

Recommendations:

  • Implement Proper Error Handling: Replace panic with more robust error handling mechanisms, such as returning gRPC error codes.
  • Pin Image Versions: Always use specific tags or digests for Docker images to ensure reproducibility and prevent unexpected updates.
  • Review Session Handling: Remove or heavily restrict the use of the shared session feature.
  • Improve Test Coverage: Add unit tests for all services.
 

 

6) Summarize the codebase

question = """
  Give me a summary of this codebase, and tell me the top 3 things that I can learn from it.
"""

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=question,
    config=GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)
display(Markdown(response.text))

→ Output :


Here's a summary of the provided codebase and the top things you can learn from it:

Codebase Summary

This codebase represents a complete microservices-based e-commerce application called "Online Boutique". It demonstrates how to build, deploy, and manage a cloud-native application using a variety of technologies. The application consists of 11 microservices written in different languages (Go, C#, Node.js, Python, Java) that communicate over gRPC. The infrastructure is designed to be deployed on Kubernetes, with configurations for Google Kubernetes Engine (GKE), Istio, and other platforms. The codebase includes:

~~remainder of the output omitted~~

 

 

3-7) Processing video and audio together

ex) Using Gemini's native multimodal and long-context capabilities on interleaved input: a video together with its audio track

video_url = (
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/video/pixel8.mp4"
)
display(Video(video_url, width=350))

 


prompt = """
  Provide a detailed description of the video.
  The description should also contain any important dialogue from the video and key features of the phone.
"""

video = Part.from_uri(
    file_uri=video_url,
    mime_type="video/mp4",
)
contents = [prompt, video]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ Output : Here is a detailed description of the video:

The video opens with a dark aerial shot of a train station next to a building. The video shifts to Saeka Shimada, a female photographer based in Tokyo, who is walking through a city street and explaining that Tokyo has many faces. She mentions that the night-time atmosphere of Tokyo is very different from the daytime.

Saeka explains that the new Pixel phone has a feature called "Video Boost". This feature automatically activates "Night Sight" in low-light conditions to improve the video quality. The video cuts to Saeka holding up her light blue Pixel phone as she is about to take a photo in a narrow alleyway lit with hanging lanterns. She can then be seen recording in a different part of the alleyway where she is crouching next to a sewer and puddle of water.

Saeka says that the Sancha neighborhood is where she lived when she first moved to Tokyo and it is a place full of memories. She looks at the images and videos she recorded on her phone.

The video cuts to show a series of quick shots of the alleyway.

Saeka explains that she has arrived in the Shibuya neighborhood. As she speaks, she passes a group of people who are walking over a glass-paneled pedestrian bridge.

 


 

3-8) Processing all modalities (image, video, audio, text) at once

- Gemini can accept audio, visual, text, and code inputs mixed together in the same input sequence.

video_url = "gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4"
display(Video(video_url.replace("gs://", "https://storage.googleapis.com/"), width=350))
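The `.replace("gs://", ...)` trick above works because publicly readable Cloud Storage objects are also served over HTTPS: the model takes the gs:// URI, while the notebook's Video widget needs an HTTP URL. A tiny helper (our own, for illustration; only valid for public objects) makes the conversion explicit:

```python
def gcs_to_https(uri: str) -> str:
    """Convert a gs:// URI to its public storage.googleapis.com URL.

    Illustrative helper; the resulting URL only works for publicly
    readable objects. Non-gs:// inputs are returned unchanged.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        return uri
    return "https://storage.googleapis.com/" + uri[len(prefix):]

print(gcs_to_https("gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4"))
```

With this, the display call reads `display(Video(gcs_to_https(video_url), width=350))`.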

 

image_url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/a-man-and-a-dog.png"
display(Image(image_url, width=350))

 


prompt = """
  Look through each frame in the video carefully and answer the questions.
  Only base your answers strictly on what information is available in the video attached.
  Do not make up any information that is not part of the video and do not be too
  verbose, be straightforward.

  Questions:
  - When is the moment in the image happening in the video? Provide a timestamp.
  - What is the context of the moment and what does the narrator say about it?
"""

contents = [
    prompt,
    Part.from_uri(file_uri=video_url, mime_type="video/mp4"),
    Part.from_uri(file_uri=image_url, mime_type="image/png"),
]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ Output :

Sure, here are the answers to your questions about the video:

  • At what time stamp is the image happening in the video? Provide a time stamp. 0:48
  • What is the context of the moment and what does the narrator say about it? The story is about a blind man and his girlfriend and we follow them on their journey together and growing closer.

 

 


 

4. Exploring a Use Case

- Use case: retail / e-commerce

- ex) Suppose a customer shows us their living room, wants to find suitable furniture, and then chooses among four wall-art options that match the room.

 

4-1. Getting recommendations based on a provided image

- Gemini can compare images and provide recommendations.

- This is especially useful when a retailer wants to recommend products based on a user's current setup.

 

- ex1) Using the same image, ask the model to recommend furniture that suits the space.
          The model is free to pick any piece of furniture here, relying only on its built-in knowledge.

room_image_url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/living-room.png"
display(Image(room_image_url, width=350))

room_image = Part.from_uri(file_uri=room_image_url, mime_type="image/png")

prompt = "Describe this room"
contents = [prompt, room_image]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ Output:


The room is a contemporary living space with a warm and inviting ambiance. The color palette is dominated by neutral tones, creating a sense of calmness and sophistication.

Against a plain, light beige wall, a large, comfortable caramel-colored sectional sofa takes center stage. It has a soft, textured appearance, suggesting it's made of leather or suede. A round coffee table with a unique, sculptural base sits in front of the sofa, displaying a book and a decorative object.

Two armchairs flank the coffee table. One armchair to the left appears to have a textured or fringed back, while the other armchair to the right has a more streamlined, modern design. Both chairs are upholstered in a similar neutral shade that complements the sofa.

A tall tripod floor lamp with a black shade stands to the right of the sofa, providing soft lighting. A large window with flowing neutral-colored curtains is visible on the left, allowing natural light to filter into the room.

The flooring is light and appears to have a subtle marble pattern. A soft, light-colored rug covers the central area, grounding the furniture and adding to the room's cozy feel.

Overall, the room exudes a sense of understated elegance and comfort, with a focus on textures and natural light.



 

- ex2) Ask the model to recommend a new piece of furniture and explain its reasoning in detail.

prompt1 = "Recommend a new piece of furniture for this room"
prompt2 = "Explain the reason in detail"
contents = [prompt1, room_image, prompt2]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

→ Output:


Okay, based on the image and the existing furniture, I recommend adding a long, low sideboard or console table against the large blank wall behind the sofa.

Here's the reasoning:

  • Balance the Space: The room currently feels a bit bottom-heavy. The sofa is large and anchors the lower portion of the room, but the wall above is completely empty. A sideboard or console table would visually fill some of that space and balance the overall composition.
  • Functionality: A sideboard/console would provide practical storage space, which is always welcome in a living room. It could house books, blankets, media equipment, or decorative items.
  • Style Compatibility: Given the warm neutral color palette, the natural textures (wood, leather, stone-like coffee table base), I'd suggest a sideboard made of natural wood (perhaps a lighter tone oak or a reclaimed wood) to complement the existing style. A piece with clean lines and minimal hardware would fit best with the modern aesthetic.

Why a sideboard/console and not something else?

  • Not another seating piece: The room already has a large sofa and two armchairs, so adding another chair could make the space feel crowded.
  • Not shelves/bookshelves: While shelves could work, a sideboard offers closed storage, which helps maintain a clean and uncluttered look. Also, open shelves can sometimes make a room feel visually busy if not styled very carefully.
  • Not a large piece of art: While art would definitely enhance the room, adding the sideboard first offers the benefit of placing a smaller piece of art above it, which tends to be easier. It would also not limit the selection of art to one larger statement piece.

Adding a sideboard/console will improve the space in this manner.


 

- ex3) Ask the model, acting as an interior designer, to rank several wall-art options by how well they fit the room.


 

art_image_urls = [
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/room-art-1.png",
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/room-art-2.png",
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/room-art-3.png",
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/room-art-4.png",
]

md_content = f"""
|Customer photo |
|:-----:|
| <img src="{room_image_url}" width="50%"> |

|Art 1| Art 2 | Art 3 | Art 4 |
|:-----:|:----:|:-----:|:----:|
| <img src="{art_image_urls[0]}" width="60%">|<img src="{art_image_urls[1]}" width="100%">|<img src="{art_image_urls[2]}" width="60%">|<img src="{art_image_urls[3]}" width="60%">|
"""

display(Markdown(md_content))
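+) The `md_content` table above hard-codes four columns. When the number of options varies, a helper can build the same one-row Markdown table (a sketch that keeps the notebook's `<img>` convention; the labels and widths are arbitrary):

```python
def art_options_table(urls: list[str], width: str = "60%") -> str:
    """Build a one-row Markdown table labeling each art option with an embedded <img> tag."""
    header = "|" + "|".join(f" Art {i + 1} " for i in range(len(urls))) + "|"
    divider = "|" + "|".join(":----:" for _ in urls) + "|"
    row = "|" + "|".join(f'<img src="{u}" width="{width}">' for u in urls) + "|"
    return "\n".join([header, divider, row])
```

`display(Markdown(art_options_table(art_image_urls)))` would then render the same grid of options for any number of images.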

# Load wall art images as Part objects
art_images = [
    Part.from_uri(file_uri=url, mime_type="image/png") for url in art_image_urls
]

# To recommend an item from a selection, you will need to label the item number within the prompt.
# That way you are providing the model with a way to reference each image as you pose a question.
# Labeling images within your prompt also helps reduce hallucinations and produce better results.
prompt = """
  You are an interior designer.
  For each piece of wall art, explain whether it would be appropriate for the style of the room.
  Rank each piece according to how well it would be compatible in the room.
"""

contents = [
    "Consider the following art pieces:",
    "art 1:",
    art_images[0],
    "art 2:",
    art_images[1],
    "art 3:",
    art_images[2],
    "art 4:",
    art_images[3],
    "room:",
    room_image,
    prompt,
]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))
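+) As the comments note, labeling each image ("art 1:", "art 2:", ...) gives the model a handle to reference each one and helps reduce hallucination. Interleaving labels and parts by hand gets tedious as the list grows; a small helper produces the same structure (a sketch; it accepts any part objects):

```python
def label_parts(label: str, parts: list) -> list:
    """Interleave "label N:" markers with their parts, mirroring the manual contents list."""
    contents = []
    for i, part in enumerate(parts, start=1):
        contents.append(f"{label} {i}:")
        contents.append(part)
    return contents

# Equivalent to the manual list above:
# contents = ["Consider the following art pieces:",
#             *label_parts("art", art_images),
#             "room:", room_image, prompt]
```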

 

 

 


5. Diagrams (understanding entity relationships in technical diagrams)

 

- Gemini's multimodal capabilities let it understand diagrams and take actionable next steps, such as optimization or code generation.

 

- ex) This example shows how Gemini can interpret an entity-relationship (ER) diagram, understand the relationships between tables, identify optimization requirements for a specific environment such as BigQuery, and even generate the corresponding code.

 

image_er_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/er.png"
display(Image(image_er_url, width=350))

prompt = "Document the entities and relationships in this ER diagram."

contents = [prompt, Part.from_uri(file_uri=image_er_url, mime_type="image/png")]

# Use a more deterministic configuration with a low temperature
config = GenerateContentConfig(
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    candidate_count=1,
    max_output_tokens=8192,
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents=contents,
    config=config,
)
display(Markdown(response.text))
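+) The low `temperature=0.1` with `top_p=0.8` and `top_k=40` makes sampling nearly deterministic, which suits documentation tasks where repeatable output matters. As a toy illustration of how these knobs narrow the next-token choice (my own sketch of the standard top-k / nucleus-filtering idea, not the API's exact implementation; the token names are made up):

```python
def filter_candidates(probs: dict[str, float], top_k: int, top_p: float) -> dict[str, float]:
    """Apply top-k, then nucleus (top-p) filtering to a toy next-token distribution."""
    # 1) Keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 2) Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 3) Renormalize over the survivors; a low temperature then sharpens this further.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# With a skewed distribution, top_p=0.8 leaves only the two dominant tokens:
print(filter_candidates({"orders": 0.6, "users": 0.25, "items": 0.1, "carts": 0.05},
                        top_k=40, top_p=0.8))
```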

 

→ Output:

 

 


6. Compare images (identifying similarities and differences between images)


- ex) Show two slightly different scenes from Marienplatz in Munich, Germany, and analyze their similarities and differences.


image_landmark1_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/landmark1.jpg"
image_landmark2_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/landmark2.jpg"

md_content = f"""
| Image 1 | Image 2 |
|:-----:|:----:|
| <img src="{image_landmark1_url}" width="350"> | <img src="{image_landmark2_url}" width="350"> |
"""

display(Markdown(md_content))

prompt1 = """
Consider the following two images:
Image 1:
"""
prompt2 = """
Image 2:
"""
prompt3 = """
1. What is shown in Image 1? Where is it?
2. What is similar between the two images?
3. What is the difference between Image 1 and Image 2 in terms of the contents or people shown?
"""

contents = [
    prompt1,
    Part.from_uri(file_uri=image_landmark1_url, mime_type="image/jpeg"),
    prompt2,
    Part.from_uri(file_uri=image_landmark2_url, mime_type="image/jpeg"),
    prompt3,
]

config = GenerateContentConfig(
    temperature=0.0,
    top_p=0.8,
    top_k=40,
    candidate_count=1,
    max_output_tokens=2048,
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents=contents,
    config=config,
)
display(Markdown(response.text))

 

→ Output:

 
Here's an analysis of the two images:
1. What is shown in Image 1? Where is it?
Image 1 shows the Odeonsplatz in Munich, Germany. Key landmarks visible are:
  • Feldherrnhalle: The large open hall with arches in the center of the image.
  • Residenz: The large building with the red roof on the left side of the image.
  • Theatinerkirche: The yellow building on the right side of the image.
2. What is similar between the two images?
The two images are very similar. They show the same location (Odeonsplatz in Munich) from approximately the same angle. The buildings, the layout of the square, and the overall composition are nearly identical.
3. What is the difference between Image 1 and Image 2 in terms of the contents or people shown?
The only noticeable difference between the two images is the presence of a person in a black coat in the foreground of Image 1. This person is not present in Image 2. The rest of the people and contents of the images are the same.

 


References

- What is multimodal AI? | IBM
  https://www.ibm.com/kr-ko/think/topics/multimodal-ai
- Caching | Gemini API | Google AI for Developers
  https://ai.google.dev/api/caching?hl=ko#Content
- Generating content | Gemini API | Google AI for Developers
  https://ai.google.dev/api/generate-content
- Context caching overview | Generative AI on Vertex AI | Google Cloud
  https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview
- Inspect Rich Documents with Gemini Multimodality and Multimodal RAG | Google Cloud Skills Boost
  https://www.cloudskillsboost.google/course_templates/981/labs/550041

 

 

 

 

 
