
Add image description for grounding#53

Open
aubrypaul wants to merge 5 commits into main from vision-description-for-grounding

Conversation

@aubrypaul
Contributor

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Gemini Vision support: images are analyzed to produce text summaries before retrieval (RAG).
    • Public API to customize the vision prompt used for image descriptions.
  • Bug Fixes

    • Improved image MIME type detection and clearer errors for unsupported formats.
    • Ensured image-to-text conversion runs before retrieval and standardized image payload handling.

Walkthrough

Adds Gemini Vision scaffolding and an image-to-text conversion flow in src/code.gs: MIME type inference for images, a rename of the inline_data payload field to inlineData, a new _convertImagesToText(currentContents) helper (invoked before RAG when vector stores exist), top-level vision constants, and a public setPromptForVision(prompt) API.
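For orientation, here is a minimal sketch of what that conversion step might look like, assuming the part shapes (inlineData/fileData), a UrlFetchApp-based call, and an API-key-authenticated Generative Language endpoint. The helper name matches the walkthrough, but the body is illustrative rather than the PR's actual code; the review comments below discuss how the real endpoint and auth handling should branch.

// Illustrative sketch only; modelForVision, promptForVision, and geminiKey
// are the module-level values named in this walkthrough.
function _convertImagesToText(currentContents) {
  // Collect every image part (inline bytes or file references).
  const imageParts = [];
  currentContents.forEach(function (c) {
    const parts = Array.isArray(c.parts) ? c.parts : (c.parts ? [c.parts] : []);
    parts.forEach(function (p) {
      if (p && (p.inlineData || p.fileData)) imageParts.push(p);
    });
  });
  if (imageParts.length === 0) return currentContents;

  // Ask the vision model to describe all images in one request.
  const payload = {
    contents: [{ role: 'user', parts: imageParts.concat([{ text: promptForVision }]) }]
  };
  let description = 'Image analysis returned no text.';
  try {
    const response = UrlFetchApp.fetch(
      'https://generativelanguage.googleapis.com/v1beta/models/' +
        modelForVision + ':generateContent?key=' + geminiKey,
      { method: 'post', contentType: 'application/json', payload: JSON.stringify(payload) }
    );
    const result = JSON.parse(response.getContentText());
    description = result.candidates[0].content.parts[0].text || description;
  } catch (error) {
    console.warn('[GenAIApp] - Image analysis failed: ' + error);
  }

  // Drop the image parts and append the description as a user message.
  currentContents.forEach(function (c) {
    const parts = Array.isArray(c.parts) ? c.parts : (c.parts ? [c.parts] : []);
    c.parts = parts.filter(function (p) { return !(p && (p.inlineData || p.fileData)); });
  });
  currentContents.push({ role: 'user', parts: [{ text: '[Image description] ' + description }] });
  return currentContents;
}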

Changes

Image Handling & Conversion (src/code.gs):
Added MIME type inference in image handling, replaced inline_data with inlineData, implemented _convertImagesToText(currentContents) to detect image parts (inlineData/fileData), call Gemini Vision, replace image parts with text messages, and integrate this step into the run flow when vector stores are present. Duplicate _convertImagesToText declarations noted.

Vision Model & Prompt Constants (src/code.gs):
Introduced modelForVision (gemini-3-pro-preview) and promptForVision (default prompt) constants and a public setPromptForVision(prompt) method to customize the vision prompt.

Payload / Gemini Integration (src/code.gs):
Switched Gemini payload field names from inline_data to inlineData, added image MIME resolution and errors for unsupported formats, and adjusted payload constructions for image/file parts.
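For reference, the constants and the public setter described above could look like the sketch below; the default prompt text and the input validation are assumptions (the validation echoes a review suggestion further down), not necessarily what the PR ships.

// Module-level vision defaults named in the walkthrough; the prompt
// wording here is a placeholder, not the PR's actual default.
var modelForVision = 'gemini-3-pro-preview';
var promptForVision = 'Describe the images, transcribe any visible text, and summarize the visual context.';

/**
 * Overrides the prompt sent to the vision model when images are
 * converted to text before retrieval (RAG).
 * @param {string} prompt - A non-empty prompt string.
 */
function setPromptForVision(prompt) {
  if (typeof prompt === 'string' && prompt.trim().length > 0) {
    promptForVision = prompt.trim();
  } else {
    console.warn('[GenAIApp] - setPromptForVision ignored: prompt must be a non-empty string.');
  }
}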

Sequence Diagram

sequenceDiagram
    participant User as User/Client
    participant App as GenAIApp / Chat
    participant Vision as Gemini Vision
    participant RAG as RAG System
    participant API as Gemini/OpenAI API

    User->>App: run() with messages containing images and RAG enabled
    App->>App: Detect image parts (inlineData/fileData)
    App->>Vision: Send image bytes + vision prompt to modelForVision
    Vision-->>App: Return textual image analysis
    App->>App: Remove image parts and append analysis as new user message
    App->>RAG: Send text-only contents for retrieval/augmentation
    RAG->>API: Query external model/store for context
    API-->>RAG: Return retrieved context
    RAG-->>App: Provide augmented context
    App-->>User: Return final response
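As a usage illustration of the flow in the diagram (setGeminiAPIKey, addImage, and run are named elsewhere in this PR; newChat and addMessage are assumptions about the library's public surface):

// Hypothetical end-to-end usage of the preprocessing flow.
GenAIApp.setGeminiAPIKey('YOUR_GEMINI_API_KEY');

const chat = GenAIApp.newChat(); // assumed factory name
chat.addMessage('What does this screenshot show about the error?'); // assumed method
chat.addImage('https://example.com/screenshot.png');

// With a vector store attached, run() first converts the image to a text
// description via Gemini Vision, then performs retrieval on text only.
const answer = chat.run();
Logger.log(answer);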

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Description check: ❓ Inconclusive. No pull request description was provided by the author, making it impossible to assess whether the description relates to the changeset. Resolution: add a pull request description explaining the purpose of the image description feature and how it improves grounding in the GenAI application.

✅ Passed checks (2 passed)

Title check: ✅ Passed. The pull request title accurately reflects the main change: adding image description functionality for grounding, which aligns with the image-to-text conversion flow and Gemini Vision integration.

Docstring Coverage: ✅ Passed. No functions were found in the changed files to evaluate, so the docstring coverage check was skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/code.gs (1)

130-140: ⚠️ Potential issue | 🔴 Critical

Inconsistent property name: inlineData here vs inline_data in addFile() (line 213).

addImage() uses inlineData (camelCase), but addFile() uses inline_data (snake_case). This causes _convertImagesToText() (line 782) to silently skip images added via addFile(), since it only checks p.inlineData || p.fileData.

Both methods push to the same contents array sent to the Gemini API. Align both to use inlineData, or update _convertImagesToText() to also check inline_data and file_data.
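One way to make that detection tolerant of both spellings, as a sketch (the helper name is hypothetical):

// Treats both camelCase (addImage) and snake_case (addFile) part shapes
// as images, so neither method's output is silently skipped.
function _isImagePart(p) {
  return !!p && !!(p.inlineData || p.fileData || p.inline_data || p.file_data);
}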

🤖 Fix all issues with AI agents
In `@src/code.gs`:
- Around line 798-810: The hardcoded user prompt inside the descriptionPayload
object biases image analysis to "technical support request"; make this prompt
configurable or neutral by replacing the fixed text in
descriptionPayload.contents[0].parts (where imageParts are spread) with a
parameter or a default general-purpose string (e.g., request-specific prompt
passed into the calling function or a neutral prompt like "Describe the images,
transcribe any visible text, and summarize the visual context.") so callers can
supply domain-specific prompts; preserve the existing generationConfig and
ensure the code still merges imageParts before appending the configurable
prompt.
- Around line 440-445: The code calls this._convertImagesToText(...) whenever
model.includes("gemini") and ragCorpusIds exist, but _convertImagesToText
currently always builds a Vertex AI URL using gcpProjectId which can be empty
when Gemini is configured via setGeminiAPIKey(); update the guard here to only
call _convertImagesToText when either gcpProjectId is non-empty (Vertex AI) or
geminiKey is set (API-key path), or else adjust _convertImagesToText to detect
geminiKey and construct the appropriate Generative Language API endpoint
similarly to the logic used around the model/key handling at lines 462–477;
reference functions/vars: _convertImagesToText, model.includes("gemini"),
gcpProjectId, geminiKey, setGeminiAPIKey (a sketch of this endpoint branching follows this list).
- Around line 787-789: The check "typeof verbose !== 'undefined' && verbose" is
redundant because verbose is always defined in the GenAIApp IIFE scope; replace
that condition with the simpler "if (verbose)" in the block that logs the
image-to-text message (the console.log inside the image detection branch) to
match other uses of the verbose variable.
- Around line 838-841: The current filter inside newContents.forEach (which
iterates c.parts) only removes parts with camelCase properties
inlineData/fileData and misses snake_case inline_data/file_data used by
addFile(), so update the predicate in the c.parts = parts.filter(...) call to
exclude parts that have any of inlineData, fileData, inline_data or file_data;
also audit addFile()/addImage() usages and prefer unifying on one property name
(e.g., inlineData/fileData) to avoid future mismatches.
- Around line 116-129: The extension matching fails when imageInput is a URL
with query params or fragments; update the MIME-type-detection block (where
mimeType, imageInput, and lower are used) to parse imageInput as a URL first
(using new URL(imageInput) in a try/catch), use url.pathname (or fallback to
imageInput) and run the endsWith checks against that pathname
(png/jpg/jpeg/webp/gif) before falling back to throwing the Error; keep existing
behavior for non-URL inputs and ensure the URL parse errors are handled
gracefully so local filenames still work (see the MIME sketch after this list).
- Around line 825-835: The block that calls UrlFetchApp.fetch and JSON.parse
inside run() can throw and should be wrapped in a try/catch so failures don’t
crash run(); surround the fetch, JSON.parse and the result->description
extraction (references: UrlFetchApp.fetch, JSON.parse, result, description) with
a try/catch, on success keep the existing candidate/parts logic, and on any
error set description to the existing fallback ("Image analysis returned no
text.") and log the error (e.g., Logger.log or console.error) for debugging;
ensure the catch does not rethrow so run() continues gracefully.
- Around line 822-823: The code hardcodes modelForVision
("gemini-3-pro-preview") and uses a Vertex AI-only endpoint string for
generateContent; make the model name configurable (or promote modelForVision to
a module-level named constant) and change endpoint construction to support both
Vertex AI and Generative Language API paths depending on auth: if geminiKey is
present use the Generative Language API endpoint and include the API key in
options.headers, otherwise use the Vertex AI endpoint with OAuth; mirror the
auth branching logic used in _callGenAIApi to set options.headers appropriately
(refer to modelForVision, endpoint, options.headers, geminiKey, and
_callGenAIApi to locate and implement the changes).
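Taken together, the endpoint, auth, and error-handling items above could be addressed along the lines of this sketch; the helper name is hypothetical, the Vertex AI region is a placeholder, and the response parsing assumes the standard generateContent candidate shape.

// Sketch: pick the endpoint and auth based on how Gemini was configured,
// and never let a failed vision call crash run().
function _callVisionApi(payload) {
  let endpoint;
  const options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(payload),
    muteHttpExceptions: true
  };
  if (geminiKey) {
    // Generative Language API path (configured via setGeminiAPIKey).
    endpoint = 'https://generativelanguage.googleapis.com/v1beta/models/' +
        modelForVision + ':generateContent?key=' + geminiKey;
  } else {
    // Vertex AI path (requires gcpProjectId); region is a placeholder.
    endpoint = 'https://us-central1-aiplatform.googleapis.com/v1/projects/' +
        gcpProjectId + '/locations/us-central1/publishers/google/models/' +
        modelForVision + ':generateContent';
    options.headers = { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() };
  }
  try {
    const result = JSON.parse(UrlFetchApp.fetch(endpoint, options).getContentText());
    return result.candidates[0].content.parts[0].text || 'Image analysis returned no text.';
  } catch (error) {
    console.warn('[GenAIApp] - Image analysis failed during Gemini Vision preprocessing: ' + error);
    return 'Image analysis returned no text.';
  }
}

For the MIME-detection item, note that Apps Script's V8 runtime does not provide the browser URL class, so stripping the query string and fragment by hand is a safer equivalent of the suggested new URL() parse; the helper below is a hypothetical sketch.

// Sketch: infer an image MIME type from a filename or URL, ignoring any
// query string or fragment so signed URLs still match by extension.
function _inferImageMimeType(imageInput) {
  const path = String(imageInput).split(/[?#]/)[0].toLowerCase();
  if (path.endsWith('.png')) return 'image/png';
  if (path.endsWith('.jpg') || path.endsWith('.jpeg')) return 'image/jpeg';
  if (path.endsWith('.webp')) return 'image/webp';
  if (path.endsWith('.gif')) return 'image/gif';
  throw new Error('[GenAIApp] - Unsupported image format: ' + imageInput);
}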

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/code.gs (1)

138-148: ⚠️ Potential issue | 🟠 Major

Change mime_type to mimeType in both addImage() and addFile() methods.

The Gemini API REST endpoint expects mimeType (camelCase) inside inlineData, not mime_type. While the wrapper property was correctly updated to inlineData, the field name must also use camelCase to match the official Gemini REST API specification.

🔧 Proposed fix
              inlineData: {
-                 mime_type: mimeType,
+                 mimeType: mimeType,
                  data: base64Image
              }

Line 222 in addFile():

            inlineData: {
-             mime_type: fileInfo.mimeType,
+             mimeType: fileInfo.mimeType,
              data: blobToBase64
            }
🤖 Fix all issues with AI agents
In `@src/code.gs`:
- Around line 2395-2397: The setter setPromptForVision currently assigns prompt
directly without validation; ensure prompt is a non-empty string before
assigning to promptForVision (used later in the Gemini API payload around line
811). Add a guard in setPromptForVision that checks typeof prompt === "string"
and prompt.trim().length > 0; if valid, assign promptForVision = prompt.trim(),
otherwise either throw a clear error or ignore the assignment and log a warning
so invalid values (null/undefined/non-strings) are never sent to the Gemini API.
- Around line 788-804: Redundant unreachable guard: remove the imageParts.length
=== 0 check because hasImages already guaranteed images; update the block around
the hasImages and imageParts computations (references: hasImages, imageParts,
currentContents, verbose) by deleting the final conditional that returns
currentContents when imageParts.length === 0, leaving the early return on
!hasImages and continuing with imageParts processing; ensure no other logic
depended on that second guard.
- Around line 848-857: The code handling message parts is inconsistent: in the
newContents.forEach block you use "const parts = Array.isArray(c.parts) ?
c.parts : [c.parts];" which can yield [null] or [undefined], while later you
safely use "c.parts ? [c.parts] : []". Update the forEach in the newContents
transformation (the block that assigns c.parts = parts.filter(...)) to use the
same null-guard pattern — i.e., replace the fallback [c.parts] with c.parts ?
[c.parts] : [] — so both places consistently treat null/undefined parts and
avoid creating arrays containing null/undefined before filtering (a combined sketch follows this list).
- Around line 830-846: There is a duplicate, unprotected API call: the initial
UrlFetchApp.fetch + JSON.parse for variables response/result should be removed
so only the fetch inside the try/catch runs; keep the endpoint and options
usage, parse the response inside the try block (using the existing result
variable), and ensure description is assigned from result.candidates/... or
result.parts/... as currently written; also remove the redundant const
declarations outside the try and avoid shadowing response/result so the
Logger.log in the catch will handle failures.
- Around line 844-846: The catch block that currently calls Logger.log in the
Gemini Vision preprocessing code should be changed to use console.warn to match
the project's logging conventions; locate the catch handling for "Image analysis
failed during Gemini Vision preprocessing" (where Logger.log is called) and
replace the Logger.log call with console.warn(`[GenAIApp] - Image analysis
failed during Gemini Vision preprocessing: ${error}`) so warnings use
console.warn consistently with other parts of the codebase.
- Around line 2382-2397: The object literal is missing a comma after the closing
brace of setPrivateInstanceBaseUrl which breaks parsing; add a trailing comma
immediately after the brace that ends setPrivateInstanceBaseUrl to separate it
from the next property (setPromptForVision) and ensure the object’s properties
are properly comma-separated.
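The two parts-handling items (the null-safe normalization above and the duplicate-spelling filter from the earlier list) reduce to one small pattern, sketched here against the newContents loop the comments reference:

// Sketch: normalize c.parts safely, then drop every image-bearing part
// regardless of camelCase or snake_case spelling.
newContents.forEach(function (c) {
  const parts = Array.isArray(c.parts) ? c.parts : (c.parts ? [c.parts] : []);
  c.parts = parts.filter(function (p) {
    return !(p && (p.inlineData || p.fileData || p.inline_data || p.file_data));
  });
});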

