Conversation
📝 Walkthrough

Added Gemini Vision scaffolding and an image-to-text conversion flow in `src/code.gs`.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/Client
    participant App as GenAIApp / Chat
    participant Vision as Gemini Vision
    participant RAG as RAG System
    participant API as Gemini/OpenAI API

    User->>App: run() with messages containing images and RAG enabled
    App->>App: Detect image parts (inlineData/fileData)
    App->>Vision: Send image bytes + vision prompt to modelForVision
    Vision-->>App: Return textual image analysis
    App->>App: Remove image parts and append analysis as new user message
    App->>RAG: Send text-only contents for retrieval/augmentation
    RAG->>API: Query external model/store for context
    API-->>RAG: Return retrieved context
    RAG-->>App: Provide augmented context
    App-->>User: Return final response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 7
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/code.gs (1)

130-140: ⚠️ Potential issue | 🔴 Critical

Inconsistent property name: `inlineData` here vs `inline_data` in `addFile()` (line 213).

`addImage()` uses `inlineData` (camelCase), but `addFile()` uses `inline_data` (snake_case). This causes `_convertImagesToText()` (line 782) to silently skip images added via `addFile()`, since it only checks `p.inlineData || p.fileData`. Both methods push to the same `contents` array sent to the Gemini API. Align both to use `inlineData`, or update `_convertImagesToText()` to also check `inline_data` and `file_data`.
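A minimal sketch of the second option, assuming Gemini-style part objects; the helper name `_hasMediaPart` is hypothetical:

```javascript
// Hypothetical helper: treat both camelCase and snake_case media keys as images,
// so parts pushed by addImage() and addFile() are detected consistently.
function _hasMediaPart(p) {
  return !!(p && (p.inlineData || p.fileData || p.inline_data || p.file_data));
}

// Inside _convertImagesToText(), the detection would then read:
// const hasImages = parts.some(_hasMediaPart);
```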
🤖 Fix all issues with AI agents
In `@src/code.gs`:
- Around line 798-810: The hardcoded user prompt inside the descriptionPayload
object biases image analysis to "technical support request"; make this prompt
configurable or neutral by replacing the fixed text in
descriptionPayload.contents[0].parts (where imageParts are spread) with a
parameter or a default general-purpose string (e.g., request-specific prompt
passed into the calling function or a neutral prompt like "Describe the images,
transcribe any visible text, and summarize the visual context.") so callers can
supply domain-specific prompts; preserve the existing generationConfig and
ensure the code still merges imageParts before appending the configurable
prompt. (A sketch of the neutral-default prompt follows this list.)
- Around line 440-445: The code calls this._convertImagesToText(...) whenever
model.includes("gemini") and ragCorpusIds exist, but _convertImagesToText
currently always builds a Vertex AI URL using gcpProjectId which can be empty
when Gemini is configured via setGeminiAPIKey(); update the guard here to only
call _convertImagesToText when either gcpProjectId is non-empty (Vertex AI) or
geminiKey is set (API-key path), or else adjust _convertImagesToText to detect
geminiKey and construct the appropriate Generative Language API endpoint
similarly to the logic used around the model/key handling at lines 462–477;
reference functions/vars: _convertImagesToText, model.includes("gemini"),
gcpProjectId, geminiKey, setGeminiAPIKey. (A sketch of the tightened guard follows this list.)
- Around line 787-789: The check "typeof verbose !== 'undefined' && verbose" is
redundant because verbose is always defined in the GenAIApp IIFE scope; replace
that condition with the simpler "if (verbose)" in the block that logs the
image-to-text message (the console.log inside the image detection branch) to
match other uses of the verbose variable.
- Around line 838-841: The current filter inside newContents.forEach (which
iterates c.parts) only removes parts with camelCase properties
inlineData/fileData and misses snake_case inline_data/file_data used by
addFile(), so update the predicate in the c.parts = parts.filter(...) call to
exclude parts that have any of inlineData, fileData, inline_data or file_data;
also audit addFile()/addImage() usages and prefer unifying on one property name
(e.g., inlineData/fileData) to avoid future mismatches. (A sketch of the widened filter follows this list.)
- Around line 116-129: The extension matching fails when imageInput is a URL
with query params or fragments; update the MIME-type-detection block (where
mimeType, imageInput, and lower are used) to parse imageInput as a URL first
(using new URL(imageInput) in a try/catch), use url.pathname (or fallback to
imageInput) and run the endsWith checks against that pathname
(png/jpg/jpeg/webp/gif) before falling back to throwing the Error; keep existing
behavior for non-URL inputs and ensure the URL parse errors are handled
gracefully so local filenames still work. (A sketch of the pathname-based matching follows this list.)
- Around line 825-835: The block that calls UrlFetchApp.fetch and JSON.parse
inside run() can throw and should be wrapped in a try/catch so failures don’t
crash run(); surround the fetch, JSON.parse and the result->description
extraction (references: UrlFetchApp.fetch, JSON.parse, result, description) with
a try/catch, on success keep the existing candidate/parts logic, and on any
error set description to the existing fallback ("Image analysis returned no
text.") and log the error (e.g., Logger.log or console.error) for debugging;
ensure the catch does not rethrow so run() continues gracefully. (A sketch of the protected call follows this list.)
- Around line 822-823: The code hardcodes modelForVision
("gemini-3-pro-preview") and uses a Vertex AI-only endpoint string for
generateContent; make the model name configurable (or promote modelForVision to
a module-level named constant) and change endpoint construction to support both
Vertex AI and Generative Language API paths depending on auth: if geminiKey is
present use the Generative Language API endpoint and include the API key in
options.headers, otherwise use the Vertex AI endpoint with OAuth; mirror the
auth branching logic used in _callGenAIApi to set options.headers appropriately
(refer to modelForVision, endpoint, options.headers, geminiKey, and
_callGenAIApi to locate and implement the changes). (A sketch of the endpoint branching follows this list.)
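Sketch for the `descriptionPayload` prompt fix, assuming the payload shape described above; `DEFAULT_VISION_PROMPT` and the `_buildDescriptionPayload`/`visionPrompt` names are illustrative, not the actual implementation:

```javascript
// Neutral default so image analysis is not biased toward any single domain.
const DEFAULT_VISION_PROMPT =
  "Describe the images, transcribe any visible text, and summarize the visual context.";

// Hypothetical builder: imageParts come from the surrounding code, visionPrompt
// lets callers supply a domain-specific prompt.
function _buildDescriptionPayload(imageParts, visionPrompt) {
  return {
    contents: [{
      role: "user",
      // Merge the image parts first, then append the configurable prompt.
      parts: imageParts.concat([{ text: visionPrompt || DEFAULT_VISION_PROMPT }])
    }],
    generationConfig: { temperature: 0 } // placeholder; keep the existing config
  };
}
```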
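Sketch of the tightened guard, assuming `_convertImagesToText` takes and returns the `contents` array; the predicate name is hypothetical:

```javascript
// Returns true only when image-to-text preprocessing can reach a Gemini
// endpoint: a GCP project (Vertex AI) or an API key set via setGeminiAPIKey().
function _canConvertImagesToText(model, ragCorpusIds, gcpProjectId, geminiKey) {
  return model.includes("gemini") &&
    Array.isArray(ragCorpusIds) && ragCorpusIds.length > 0 &&
    (Boolean(gcpProjectId) || Boolean(geminiKey));
}

// Call-site sketch inside run():
// if (_canConvertImagesToText(model, ragCorpusIds, gcpProjectId, geminiKey)) {
//   payload.contents = this._convertImagesToText(payload.contents);
// }
```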
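Sketch of the widened filter, folding in the null-safe parts fallback so a missing `parts` field never becomes `[undefined]`; the wrapper function name is hypothetical:

```javascript
// Strip every media part, whether camelCase or snake_case, before sending
// text-only contents to RAG.
function _stripMediaParts(newContents) {
  newContents.forEach(function (c) {
    const parts = Array.isArray(c.parts) ? c.parts : (c.parts ? [c.parts] : []);
    c.parts = parts.filter(function (p) {
      return !(p && (p.inlineData || p.fileData || p.inline_data || p.file_data));
    });
  });
}
```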
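Sketch of the pathname-based extension matching. Note that Apps Script's V8 runtime does not ship the browser's `URL` class, so this version strips the query string and fragment manually instead of calling `new URL()`; the function name is hypothetical:

```javascript
function _detectImageMimeType(imageInput) {
  // Drop any query string or fragment so "photo.png?alt=media" still matches,
  // while plain local filenames pass through unchanged.
  const pathname = String(imageInput).split(/[?#]/)[0];
  const lower = pathname.toLowerCase();
  if (lower.endsWith(".png")) return "image/png";
  if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
  if (lower.endsWith(".webp")) return "image/webp";
  if (lower.endsWith(".gif")) return "image/gif";
  // Preserve existing behavior for unrecognized inputs.
  throw new Error("Unsupported image type: " + imageInput);
}
```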
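Sketch of the protected call, assuming `endpoint` and `options` are built by the surrounding code. This shape also resolves the duplicate-fetch issue flagged in the second review round below, and uses `console.warn` per the logging convention noted there:

```javascript
let description = "Image analysis returned no text.";
try {
  const response = UrlFetchApp.fetch(endpoint, options);
  const result = JSON.parse(response.getContentText());
  const parts = result.candidates && result.candidates[0] &&
    result.candidates[0].content && result.candidates[0].content.parts;
  if (parts && parts[0] && parts[0].text) {
    description = parts[0].text;
  }
} catch (error) {
  // Log and continue with the fallback description; never rethrow from run().
  console.warn(`[GenAIApp] - Image analysis failed during Gemini Vision preprocessing: ${error}`);
}
```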
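Sketch of the auth-aware endpoint construction, mirroring the branching described for `_callGenAIApi`. The `region` parameter and `_buildVisionRequest` name are assumptions; both URL patterns follow the standard public shapes of the Generative Language API and Vertex AI:

```javascript
// Named constant so the vision model is easy to override in one place.
const DEFAULT_VISION_MODEL = "gemini-3-pro-preview";

function _buildVisionRequest(modelForVision, geminiKey, gcpProjectId, region) {
  if (geminiKey) {
    // Generative Language API: API-key auth, no GCP project required.
    return {
      endpoint: "https://generativelanguage.googleapis.com/v1beta/models/" +
        modelForVision + ":generateContent",
      headers: { "x-goog-api-key": geminiKey }
    };
  }
  // Vertex AI: OAuth token from the Apps Script runtime.
  return {
    endpoint: "https://" + region + "-aiplatform.googleapis.com/v1/projects/" +
      gcpProjectId + "/locations/" + region +
      "/publishers/google/models/" + modelForVision + ":generateContent",
    headers: { Authorization: "Bearer " + ScriptApp.getOAuthToken() }
  };
}
```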
Actionable comments posted: 6
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/code.gs (1)

138-148: ⚠️ Potential issue | 🟠 Major

Change `mime_type` to `mimeType` in both `addImage()` and `addFile()` methods.

The Gemini API REST endpoint expects `mimeType` (camelCase) inside `inlineData`, not `mime_type`. While the wrapper property was correctly updated to `inlineData`, the field name must also use camelCase to match the official Gemini REST API specification.

🔧 Proposed fix

```diff
 inlineData: {
-  mime_type: mimeType,
+  mimeType: mimeType,
   data: base64Image
 }
```

Line 222 in `addFile()`:

```diff
 inlineData: {
-  mime_type: fileInfo.mimeType,
+  mimeType: fileInfo.mimeType,
   data: blobToBase64
 }
```
🤖 Fix all issues with AI agents
In `@src/code.gs`:
- Around line 2395-2397: The setter setPromptForVision currently assigns prompt
directly without validation; ensure prompt is a non-empty string before
assigning to promptForVision (used later in the Gemini API payload around line
811). Add a guard in setPromptForVision that checks typeof prompt === "string"
and prompt.trim().length > 0; if valid, assign promptForVision = prompt.trim(),
otherwise either throw a clear error or ignore the assignment and log a warning
so invalid values (null/undefined/non-strings) are never sent to the Gemini API. (See the combined sketch after this list.)
- Around line 788-804: Redundant unreachable guard: remove the imageParts.length
=== 0 check because hasImages already guaranteed images; update the block around
the hasImages and imageParts computations (references: hasImages, imageParts,
currentContents, verbose) by deleting the final conditional that returns
currentContents when imageParts.length === 0, leaving the early return on
!hasImages and continuing with imageParts processing; ensure no other logic
depended on that second guard.
- Around line 848-857: The code handling message parts is inconsistent: in the
newContents.forEach block you use "const parts = Array.isArray(c.parts) ?
c.parts : [c.parts];" which can yield [null] or [undefined], while later you
safely use "c.parts ? [c.parts] : []". Update the forEach in the newContents
transformation (the block that assigns c.parts = parts.filter(...)) to use the
same null-guard pattern — i.e., replace the fallback [c.parts] with c.parts ?
[c.parts] : [] — so both places consistently treat null/undefined parts and
avoid creating arrays containing null/undefined before filtering.
- Around line 830-846: There is a duplicate, unprotected API call: the initial
UrlFetchApp.fetch + JSON.parse for variables response/result should be removed
so only the fetch inside the try/catch runs; keep the endpoint and options
usage, parse the response inside the try block (using the existing result
variable), and ensure description is assigned from result.candidates/... or
result.parts/... as currently written; also remove the redundant const
declarations outside the try and avoid shadowing response/result so the
Logger.log in the catch will handle failures.
- Around line 844-846: The catch block that currently calls Logger.log in the
Gemini Vision preprocessing code should be changed to use console.warn to match
the project's logging conventions; locate the catch handling for "Image analysis
failed during Gemini Vision preprocessing" (where Logger.log is called) and
replace the Logger.log call with console.warn(`[GenAIApp] - Image analysis
failed during Gemini Vision preprocessing: ${error}`) so warnings use
console.warn consistently with other parts of the codebase.
- Around line 2382-2397: The object literal is missing a comma after the closing
brace of setPrivateInstanceBaseUrl which breaks parsing; add a trailing comma
immediately after the brace that ends setPrivateInstanceBaseUrl to separate it
from the next property (setPromptForVision) and ensure the object’s properties
are properly comma-separated. (See the combined sketch after this list.)
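A combined sketch of the last two fixes, showing the comma after `setPrivateInstanceBaseUrl` and a guarded `setPromptForVision`; the surrounding IIFE shape, one-line setter body, and warning text are assumptions, not the actual module:

```javascript
var GenAIApp = (function () {
  let privateInstanceBaseUrl = "";
  let promptForVision = "";

  return {
    setPrivateInstanceBaseUrl: function (url) {
      privateInstanceBaseUrl = url;
    }, // <-- this comma was missing and broke the object literal

    setPromptForVision: function (prompt) {
      // Only accept non-empty strings so invalid values never reach the
      // Gemini payload; otherwise warn and ignore the assignment.
      if (typeof prompt === "string" && prompt.trim().length > 0) {
        promptForVision = prompt.trim();
      } else {
        console.warn("[GenAIApp] - setPromptForVision ignored: expected a non-empty string.");
      }
    }
  };
})();
```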