fix: add initialize_kwargs for embedder API authentication #457

noluyorAbi · 2026-01-18T17:01:07Z

Problem

This PR addresses the "No valid embeddings found in any documents" error reported in issue #252 and related issues (#198, #266, #334).

The root cause is that the embedder configuration in api/config/embedder.json was missing the initialize_kwargs section, which is required to pass API credentials to the embedder client. Without this, the embedder cannot authenticate with OpenAI-compatible APIs or Google APIs, resulting in failed embedding requests.

Additionally, the dimensions field in model_kwargs causes errors with embedding models that do not support matryoshka representation (e.g., bge, Qwen).
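One way to avoid sending `dimensions` to models that reject it is to add the field only for models known to accept it. The sketch below is illustrative only: `build_model_kwargs` and the `MODELS_WITH_DIMENSIONS` set are hypothetical names, not part of this repository, and the list of supporting models is an assumption that would need to be maintained by hand.

```python
from typing import Any, Dict, Optional

# Assumption for illustration: only these models accept a `dimensions` argument.
MODELS_WITH_DIMENSIONS = {"text-embedding-3-small", "text-embedding-3-large"}


def build_model_kwargs(model: str, dimensions: Optional[int] = None) -> Dict[str, Any]:
    """Return embedding request kwargs, adding `dimensions` only when supported."""
    kwargs: Dict[str, Any] = {"model": model, "encoding_format": "float"}
    if dimensions is not None and model in MODELS_WITH_DIMENSIONS:
        kwargs["dimensions"] = dimensions
    return kwargs


# A model in the allow-list keeps the requested size; any other model drops it.
openai_kwargs = build_model_kwargs("text-embedding-3-small", dimensions=256)
other_kwargs = build_model_kwargs("bge-m3", dimensions=256)
```

This PR takes the simpler route of removing the field from the default configuration altogether.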

Changes

- Added initialize_kwargs to the OpenAI embedder configuration
  - Includes api_key and base_url placeholders that are substituted from environment variables
  - Enables authentication with OpenAI-compatible embedding APIs
- Removed the dimensions field from model_kwargs
  - Prevents errors with models that do not support matryoshka representation
- Added initialize_kwargs to the Google embedder configuration
  - Includes api_key placeholder for Google API authentication

Configuration

Users need to set the following environment variables:

OPENAI_API_KEY - API key for OpenAI-compatible embedding models
OPENAI_BASE_URL - Base URL for OpenAI-compatible API endpoints
GOOGLE_API_KEY - (Optional) API key for Google embedding models

The ${ENV_VAR} placeholders in the configuration are automatically replaced with environment variable values at runtime.
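As a rough illustration of this kind of substitution, the sketch below walks a nested configuration and replaces `${VAR}` with the matching environment value. It is not the project's own implementation: `expand_placeholders`, `DEMO_API_KEY`, and `DEMO_UNSET_URL` are hypothetical names, and leaving an unset variable as the literal placeholder is an assumption based on the behaviour the review comments in this thread describe.

```python
import os
import re
from typing import Any

# Matches ${NAME}, where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value: Any) -> Any:
    """Recursively replace ${VAR} in strings with values from os.environ.

    Variables that are not set are left as the literal placeholder text.
    """
    if isinstance(value, dict):
        return {key: expand_placeholders(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(item) for item in value]
    if isinstance(value, str):
        return _PLACEHOLDER.sub(
            lambda match: os.environ.get(match.group(1), match.group(0)), value
        )
    return value


os.environ["DEMO_API_KEY"] = "sk-demo"   # set for the demonstration
os.environ.pop("DEMO_UNSET_URL", None)   # make sure this one is unset

config = {
    "initialize_kwargs": {
        "api_key": "${DEMO_API_KEY}",
        "base_url": "${DEMO_UNSET_URL}",
    }
}
resolved = expand_placeholders(config)
```

Here `resolved` carries the real key but still contains the text `${DEMO_UNSET_URL}` for the variable that was never set.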

Related Issues

gemini-code-assist · 2026-01-18T17:01:19Z

Summary of Changes

Hello @noluyorAbi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves critical issues related to embedding generation by updating the embedder.json configuration. It introduces necessary authentication parameters for both OpenAI-compatible and Google embedding services, ensuring that API keys and base URLs are correctly utilized. Additionally, it addresses model compatibility by removing a problematic dimensions field, preventing errors with models that do not support specific representations. These changes collectively aim to fix 'No valid embeddings found' errors and improve the robustness of the embedding system.

Highlights

- OpenAI Embedder Configuration Update: Added initialize_kwargs to the OpenAI embedder configuration to allow passing api_key and base_url from environment variables, enabling proper authentication for OpenAI-compatible embedding APIs.
- Model Compatibility Fix: Removed the dimensions field from model_kwargs in the OpenAI embedder configuration to prevent errors with embedding models that do not support matryoshka representation.
- Google Embedder Configuration Update: Added initialize_kwargs to the Google embedder configuration to allow passing api_key from environment variables, enabling proper authentication for Google embedding APIs.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request aims to fix an authentication issue with embedders by adding initialize_kwargs to the configuration. While the intent is correct and the removal of the dimensions parameter is a good improvement, the current implementation of initialize_kwargs introduces a critical issue. If environment variables are not set, their placeholder strings are passed as literal credentials, which bypasses the clients' more robust error handling and default value logic. This can lead to non-obvious runtime failures. My review comments detail this problem and recommend removing the new initialize_kwargs sections, as the client classes are already designed to handle credential loading from the environment.

gemini-code-assist · 2026-01-18T17:03:02Z

api/config/embedder.json

+    "initialize_kwargs": {
+      "api_key": "${OPENAI_API_KEY}",
+      "base_url": "${OPENAI_BASE_URL}"
+    },


This change introduces a potential issue when environment variables are not set. The replace_env_placeholders function will pass the literal placeholder string (e.g., "${OPENAI_BASE_URL}") to the OpenAIClient constructor if the corresponding environment variable is missing.

This causes two problems:

For base_url, it bypasses the client's logic to use the default OpenAI URL (https://api.openai.com/v1) when OPENAI_BASE_URL is not set. The client will instead try to connect to the invalid URL "${OPENAI_BASE_URL}".

For api_key, it bypasses the client's validation that raises a ValueError if the API key is missing. Instead, it will attempt to authenticate with the invalid key "${OPENAI_API_KEY}".

The OpenAIClient is already designed to read api_key and base_url from environment variables if they are not passed to the constructor. Relying on that existing mechanism is more robust.

While the removal of the dimensions field is a good improvement, I recommend removing the initialize_kwargs section entirely. The client will correctly handle credentials on its own. If the original issue persists, it might be due to the environment variables not being correctly propagated to the application, which should be investigated.
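The bypass described above can be reproduced with a small stand-in for the client's constructor logic. This is a sketch under stated assumptions, not the real OpenAIClient: `resolve_client_settings` is a hypothetical function that mimics "explicit argument first, environment second, default last", and the environment is passed in as a plain mapping so the example does not touch real credentials.

```python
from typing import Dict, Mapping, Optional

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_client_settings(
    env: Mapping[str, str],
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical constructor logic: explicit arguments win, env is the fallback."""
    key = api_key or env.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("OPENAI_API_KEY is required")
    url = base_url or env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return {"api_key": key, "base_url": url}


empty_env: Dict[str, str] = {}

# Unresolved placeholders are non-empty strings, so both fallbacks are skipped.
bypassed = resolve_client_settings(
    empty_env, api_key="${OPENAI_API_KEY}", base_url="${OPENAI_BASE_URL}"
)

# With no explicit arguments, the missing key is reported immediately.
try:
    resolve_client_settings(empty_env)
    missing_key_detected = False
except ValueError:
    missing_key_detected = True
```

In the first call the placeholder text ends up as both the key and the URL, while the second call fails fast with a clear error, which is the behaviour the review wants to preserve.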

sng-asyncfunc · 2026-01-18

> The OpenAIClient is already designed to read api_key and base_url from environment variables if they are not passed to the constructor. Relying on that existing mechanism is more robust.

I am curious: what issues are you seeing if it is not defined directly in the kwargs?

gemini-code-assist · 2026-01-18T17:03:02Z

api/config/embedder.json

+    "initialize_kwargs": {
+      "api_key": "${GOOGLE_API_KEY}"
+    },


Similar to the OpenAI configuration, this change can cause issues if the GOOGLE_API_KEY environment variable is not set. The GoogleEmbedderClient will be initialized with api_key as the literal string "${GOOGLE_API_KEY}".

This bypasses the client's logic to raise a ValueError for a missing key, and will instead cause a failure later when genai.configure() is called with an invalid key. The PR description also mentions this key is optional, which makes this behavior particularly problematic.

The GoogleEmbedderClient already handles reading the API key from the environment. It's better to rely on the client's implementation.

I recommend removing this initialize_kwargs section to allow the client to manage its own credential loading.

devin-ai-integration

Devin Review found 1 potential issue.

View issue and 4 additional flags in Devin Review.

devin-ai-integration · 2026-01-23T20:57:46Z

api/config/embedder.json

    "model_kwargs": {
      "model": "text-embedding-3-small",
-      "dimensions": 256,
      "encoding_format": "float"
    }


🔴 Removal of dimensions parameter causes embedding dimension mismatch with cached databases

Removing the dimensions: 256 parameter from the OpenAI embedder configuration causes a breaking change for users with existing cached databases.

Background

The text-embedding-3-small model defaults to 1536 dimensions when no dimensions parameter is specified. The previous configuration explicitly set dimensions: 256 to reduce embedding size.

How the bug is triggered

User has an existing cached database (.pkl file) at ~/.adalflow/databases/{repo_name}.pkl with 256-dimensional embeddings (created before this change)

User updates to the new configuration (without dimensions parameter)

System loads the cached 256-dimensional embeddings from api/data_pipeline.py:869-892

When querying, the system generates a 1536-dimensional query embedding using the new configuration

FAISS retriever fails because query embedding dimension (1536) doesn't match document embedding dimension (256)

Code flow

api/data_pipeline.py:869-892 loads existing databases without checking embedding dimension compatibility:

    if self.repo_paths and os.path.exists(self.repo_paths["save_db_file"]):
        logger.info("Loading existing database...")
        self.db = LocalDB.load_state(self.repo_paths["save_db_file"])
        documents = self.db.get_transformed_data(key="split_and_embed")
        if documents:
            # ... logs dimensions but doesn't validate against current config
            return documents  # Returns old embeddings

api/rag.py:385-390 creates FAISS retriever with mismatched dimensions:

    self.retriever = FAISSRetriever(
        **configs["retriever"],
        embedder=retrieve_embedder,  # Uses new 1536-dim embedder
        documents=self.transformed_docs,  # Contains old 256-dim embeddings
        document_map_func=lambda doc: doc.vector,
    )

Impact

Runtime errors when querying repositories that have cached databases

Error message: "All embeddings should be of the same size" or similar FAISS dimension mismatch error

Users must manually delete cached databases to recover

Recommendation: either (1) keep the dimensions: 256 parameter to maintain backward compatibility, or (2) add dimension validation in api/data_pipeline.py to detect and rebuild databases whose embedding dimensions don't match the current configuration.
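Option (2) could look roughly like the check below. It is a minimal sketch, not code from api/data_pipeline.py: `cached_embedding_dimension` and `needs_rebuild` are hypothetical helpers, vectors are plain Python sequences, and how the expected dimension is obtained (for example from the configuration or from a probe embedding) is left as an assumption.

```python
from typing import Optional, Sequence


def cached_embedding_dimension(vectors: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the shared dimension of cached vectors, or None if empty or inconsistent."""
    sizes = {len(vector) for vector in vectors}
    return sizes.pop() if len(sizes) == 1 else None


def needs_rebuild(vectors: Sequence[Sequence[float]], expected_dim: int) -> bool:
    """True when the cache cannot be used with an embedder producing expected_dim."""
    return cached_embedding_dimension(vectors) != expected_dim


# A cache built with dimensions=256, checked against both configurations.
cached = [[0.0] * 256, [0.0] * 256]
stale_after_change = needs_rebuild(cached, expected_dim=1536)
valid_before_change = not needs_rebuild(cached, expected_dim=256)
```

A loader could run such a check right after reading the cached database and fall back to re-embedding when it reports a mismatch, instead of returning the old documents unconditionally.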

fix: add initialize_kwargs for embedder API authentication

51a7ccf

gemini-code-assist bot reviewed Jan 18, 2026

devin-ai-integration bot reviewed Jan 23, 2026
