
Conversation


@NaveenSandaruwan NaveenSandaruwan commented Jan 27, 2026

Purpose

This caching mechanism stores embeddings of tool descriptions for the semantic tool filtering policy.

Goals

This improves performance and reduces embedding model usage, which in turn reduces latency.

Approach

A double hash map is used. The outer map is keyed by APIID and holds one hash map per API; the inner map is keyed by a hash of the tool description and returns the cached embedding for that tool.
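The double-hash-map layout described above can be sketched as follows. This is a minimal illustration, not the actual SDK code: the type names mirror the file summary (EmbeddingEntry, EmbeddingCacheStore), but the method names and fields here are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// EmbeddingEntry holds one tool's cached embedding (illustrative).
type EmbeddingEntry struct {
	Name      string
	Embedding []float32
}

// EmbeddingCacheStore: outer map keyed by API ID; each inner map is
// keyed by the SHA-256 hash of the tool description.
type EmbeddingCacheStore struct {
	mu    sync.RWMutex
	cache map[string]map[string]*EmbeddingEntry
}

// hashDescription derives the inner-map key from a tool description.
func hashDescription(desc string) string {
	sum := sha256.Sum256([]byte(desc))
	return hex.EncodeToString(sum[:])
}

// Add inserts or replaces the embedding for (apiID, description).
func (s *EmbeddingCacheStore) Add(apiID, desc, name string, emb []float32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.cache[apiID] == nil {
		s.cache[apiID] = make(map[string]*EmbeddingEntry)
	}
	s.cache[apiID][hashDescription(desc)] = &EmbeddingEntry{Name: name, Embedding: emb}
}

// Get looks up a cached embedding by API ID and tool description.
func (s *EmbeddingCacheStore) Get(apiID, desc string) (*EmbeddingEntry, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	e, ok := s.cache[apiID][hashDescription(desc)]
	return e, ok
}

func main() {
	store := &EmbeddingCacheStore{cache: make(map[string]map[string]*EmbeddingEntry)}
	store.Add("api-1", "lists open orders", "listOrders", []float32{0.1, 0.2})
	if e, ok := store.Get("api-1", "lists open orders"); ok {
		fmt.Println(e.Name) // prints listOrders
	}
}
```

A cache hit avoids re-calling the embedding provider for a tool description that has not changed, which is where the latency saving comes from.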

Summary by CodeRabbit

  • New Features

    • Added configuration for embedding provider (provider, API key, model, and endpoint) to support external embedding services.
  • Chores

    • Added a global, thread-safe embedding cache to improve performance and concurrent embedding lookups across APIs.


NaveenSandaruwan and others added 6 commits January 27, 2026 12:39
- Introduced the semantic-tool-filtering policy to dynamically filter tools based on semantic relevance to user queries.
- Updated policy manifests and configuration files to include the new policy.
- Implemented an embedding cache to store and manage embeddings for tools, enhancing performance and efficiency.
- Added documentation for the new policy and its configuration parameters.
Add semantic tool filtering policy and embedding cache implementation
Remove semantic tool filtering policy and update embedding cache impl…
Remove semantic tool filtering documentation
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


NaveenSandaruwan does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai
Contributor

coderabbitai bot commented Jan 27, 2026

Walkthrough

Adds four embedding-provider config keys to gateway configuration and introduces a new thread-safe, per-API singleton embedding cache (SHA-256 keyed) with operations to get, add, remove, clear entries and report stats, all guarded by RWMutex.

Changes

Embedding Provider Configuration — gateway/configs/config.yaml
Added top-level keys: embedding_provider, embedding_provider_api_key, embedding_provider_model, embedding_provider_endpoint.

Embedding Cache Implementation — sdk/gateway/policy/v1alpha/embedding_cache.go
New file: defines EmbeddingEntry, APIEmbeddingCache, EmbeddingCacheStore (singleton). Implements SHA-256 HashDescription, per-API caches, and thread-safe accessors/mutators (HasAPI, GetAPICache, AddAPICache, GetEntry, GetEntryByDescription, AddEntry, AddEntryByDescription, RemoveAPI, ClearAll, GetCacheStats). All operations use RWMutex.
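The walkthrough mentions EmbeddingCacheStore is a singleton, and a later inline comment shows an `embeddingCacheInstance` package variable. A hedged sketch of the usual Go pattern for such a process-wide instance (the accessor name and use of sync.Once are assumptions, not confirmed by the diff):

```go
package main

import (
	"fmt"
	"sync"
)

// EmbeddingCacheStore is the process-wide embedding cache (illustrative shape).
type EmbeddingCacheStore struct {
	mu    sync.RWMutex
	cache map[string]map[string][]float32
}

var (
	embeddingCacheInstance *EmbeddingCacheStore
	once                   sync.Once
)

// GetEmbeddingCacheStore lazily creates the singleton exactly once,
// even under concurrent first calls.
func GetEmbeddingCacheStore() *EmbeddingCacheStore {
	once.Do(func() {
		embeddingCacheInstance = &EmbeddingCacheStore{
			cache: make(map[string]map[string][]float32),
		}
	})
	return embeddingCacheInstance
}

func main() {
	a := GetEmbeddingCacheStore()
	b := GetEmbeddingCacheStore()
	fmt.Println(a == b) // prints true
}
```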

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Cache as EmbeddingCacheStore
  participant Storage as APIEmbeddingCache

  Client->>Cache: GetEntryByDescription(apiId, description)
  Cache->>Cache: HashDescription(description)
  Cache->>Storage: lookup by hashKey (read lock)
  Storage-->>Cache: entry or nil
  Cache-->>Client: return entry or nil

  Client->>Cache: AddEntryByDescription(apiId, description, name, embedding)
  Cache->>Cache: HashDescription(description)
  Cache->>Cache: ensure API cache exists (write lock)
  Cache->>Storage: remove existing by name, insert entry
  Cache-->>Client: ack

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble hashes in the night,
Storing vectors snug and tight,
Mutex hugs keep races at bay,
Per-API burrows where they stay,
A hop, a stash, embeddings light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Description check — ⚠️ Warning: The PR description includes Purpose, Goals, and Approach sections as per the template, but is missing several required sections such as Automation tests, Security checks, Documentation, and Test environment. Resolution: add the missing template sections — Documentation, Automation tests (with coverage details), Security checks (with yes/no confirmations), and Test environment details.

✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The title clearly summarizes the main change: implementing a cache for embeddings in the SDK for the semantic tool filtering policy.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@sdk/gateway/policy/v1alpha/embedding_cache.go`:
- Around line 56-66: GetAPICache currently returns a direct reference to the
internal APIEmbeddingCache map which can be mutated concurrently by AddEntry or
RemoveAPI after the RLock is released; change EmbeddingCacheStore.GetAPICache to
build and return a defensive shallow copy of the APIEmbeddingCache (iterate over
ecs.cache[apiId] while holding the RLock and copy key/value pairs into a new
map) so callers can safely iterate/use the returned map without needing external
synchronization; alternatively, if you prefer iteration-with-lock semantics, add
a new method that accepts a callback to be executed under the lock, but do not
return the internal map directly from GetAPICache.
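The defensive-copy fix suggested above could look like the following sketch. The surrounding types are simplified, and the receiver/field names are assumptions based on the review comment, not the actual file:

```go
package main

import (
	"fmt"
	"sync"
)

type EmbeddingEntry struct {
	Name      string
	Embedding []float32
}

type EmbeddingCacheStore struct {
	mu    sync.RWMutex
	cache map[string]map[string]*EmbeddingEntry
}

// GetAPICache returns a defensive shallow copy of the per-API map, built
// while the read lock is held, so callers can iterate the result safely
// even if AddEntry or RemoveAPI runs concurrently afterwards.
func (s *EmbeddingCacheStore) GetAPICache(apiID string) map[string]*EmbeddingEntry {
	s.mu.RLock()
	defer s.mu.RUnlock()
	inner, ok := s.cache[apiID]
	if !ok {
		return nil
	}
	out := make(map[string]*EmbeddingEntry, len(inner))
	for k, v := range inner {
		out[k] = v
	}
	return out
}

func main() {
	s := &EmbeddingCacheStore{cache: map[string]map[string]*EmbeddingEntry{
		"api-1": {"h1": {Name: "listOrders"}},
	}}
	cp := s.GetAPICache("api-1")
	cp["h2"] = &EmbeddingEntry{Name: "added-to-copy"} // does not touch the store
	fmt.Println(len(s.cache["api-1"]), len(cp))       // prints 1 2
}
```

Note this is a shallow copy: the map structure is isolated, but the *EmbeddingEntry pointers are still shared, which is the separate concern raised in the nitpick about mutable returned entries.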
🧹 Nitpick comments (4)
gateway/configs/config.yaml (1)

413-417: Consider adding inline documentation for the new embedding provider configuration.

The placement at root level is correct per the gateway-level infrastructure configuration pattern. However, unlike other configuration sections in this file, these new fields lack inline comments explaining their purpose and valid values.

Based on learnings, this configuration is shared across multiple policies (e.g., semantic-tool-filtering).

📝 Suggested documentation
+# Embedding provider configuration
+# Used by policies that require semantic/embedding capabilities (e.g., semantic-tool-filtering)
+# Supported providers: "OPENAI", "AZURE_OPENAI", "MISTRAL"
 embedding_provider: "MISTRAL"
+# API key for the embedding provider
 embedding_provider_api_key: "<YOUR_MISTRAL_API_KEY>"
+# Embedding model name (not required for AZURE_OPENAI which uses deployment name in endpoint)
 embedding_provider_model: "mistral-embed"
+# Embedding provider API endpoint
 embedding_provider_endpoint: "https://api.mistral.ai/v1/embeddings"
sdk/gateway/policy/v1alpha/embedding_cache.go (3)

79-98: Minor: returned *EmbeddingEntry can be mutated by callers.

GetEntry returns a pointer to the cached EmbeddingEntry. Callers could mutate Name or the Embedding slice contents without synchronization. If callers are trusted to treat this as read-only, this is acceptable. Otherwise, consider returning a copy.
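If callers cannot be trusted to treat the result as read-only, a value-copy accessor is one option. A sketch under assumed types (the method name GetEntryCopy is hypothetical; note the embedding slice must be cloned too, since copying the struct alone would still share the backing array):

```go
package main

import (
	"fmt"
	"sync"
)

type EmbeddingEntry struct {
	Name      string
	Embedding []float32
}

type EmbeddingCacheStore struct {
	mu    sync.RWMutex
	cache map[string]map[string]*EmbeddingEntry
}

// GetEntryCopy returns an independent copy of the cached entry, so
// caller-side mutations cannot corrupt the cache.
func (s *EmbeddingCacheStore) GetEntryCopy(apiID, hashKey string) (EmbeddingEntry, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	e, ok := s.cache[apiID][hashKey]
	if !ok {
		return EmbeddingEntry{}, false
	}
	return EmbeddingEntry{
		Name:      e.Name,
		Embedding: append([]float32(nil), e.Embedding...), // clone the slice
	}, true
}

func main() {
	s := &EmbeddingCacheStore{cache: map[string]map[string]*EmbeddingEntry{
		"api-1": {"h1": {Name: "listOrders", Embedding: []float32{0.1}}},
	}}
	cp, _ := s.GetEntryCopy("api-1", "h1")
	cp.Embedding[0] = 9.9 // mutate the copy only
	fmt.Println(s.cache["api-1"]["h1"].Embedding[0]) // prints 0.1
}
```

The trade-off is an allocation and copy per lookup, which may matter for large embedding vectors on a hot path.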


114-120: Performance: O(n) scan per AddEntry for name deduplication.

The loop scans all entries to find and remove an existing entry with the same Name. For APIs with many tools, this becomes expensive. Consider maintaining a secondary index (name → hashKey) if this becomes a bottleneck.

♻️ Optional: Add secondary index for O(1) name lookup
// In EmbeddingCacheStore, add:
type APIEmbeddingCache struct {
    entries   map[string]*EmbeddingEntry // hashKey → entry
    nameIndex map[string]string          // name → hashKey
}

This would allow O(1) lookup and removal by name, but adds complexity. Only pursue if profiling shows this as a bottleneck.


19-23: Consider adding cache size limits or eviction policy.

The cache can grow unbounded as more APIs and tool embeddings are added. Each embedding vector (typically 384-1536 float32s) consumes 1.5-6KB. With many APIs, memory usage could become significant.

Consider:

  • Maximum entries per API or total
  • LRU eviction for least-recently-used entries
  • Metrics/alerts on cache size via GetCacheStats

This may be acceptable if APIs are removed via RemoveAPI when undeployed, but worth monitoring in production.
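The simplest of the options above is a hard cap per API checked at insert time. A hedged sketch (the constant, method name, and reject-on-full behavior are all illustrative; a real implementation would more likely evict via LRU than reject):

```go
package main

import (
	"fmt"
	"sync"
)

type EmbeddingEntry struct {
	Name      string
	Embedding []float32
}

type EmbeddingCacheStore struct {
	mu    sync.RWMutex
	cache map[string]map[string]*EmbeddingEntry
}

// Tiny limit for illustration; production values would be far larger.
const maxEntriesPerAPI = 2

// AddBounded inserts an entry unless the per-API cache is full.
// Updates to an existing hashKey are always allowed.
func (s *EmbeddingCacheStore) AddBounded(apiID, hashKey string, e *EmbeddingEntry) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	inner := s.cache[apiID]
	if inner == nil {
		inner = make(map[string]*EmbeddingEntry)
		s.cache[apiID] = inner
	}
	if _, exists := inner[hashKey]; !exists && len(inner) >= maxEntriesPerAPI {
		return false // at capacity: reject new keys (an LRU would evict instead)
	}
	inner[hashKey] = e
	return true
}

func main() {
	s := &EmbeddingCacheStore{cache: make(map[string]map[string]*EmbeddingEntry)}
	fmt.Println(s.AddBounded("api-1", "k0", &EmbeddingEntry{})) // prints true
	fmt.Println(s.AddBounded("api-1", "k1", &EmbeddingEntry{})) // prints true
	fmt.Println(s.AddBounded("api-1", "k2", &EmbeddingEntry{})) // prints false
}
```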

Data race: GetAPICache returning internal map reference.

GetAPICache returns a direct reference to the internal APIEmbeddingCache map. After the RLock is released, callers can access this map without synchronization while concurrent AddEntry or RemoveAPI calls modify it, causing a data race.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
max_export_batch_size: 512
sampling_rate: 1.0

embedding_provider: "MISTRAL"
Contributor


Let's remove these. They are not needed by default.

@@ -0,0 +1,166 @@
package policyv1alpha
Contributor


Add missing license header


// Singleton instance for EmbeddingCacheStore
var (
embeddingCacheInstance *EmbeddingCacheStore
Contributor


If this is only required for a single guardrail policy, we can move this to the policy implementation. So this will be a package level cache in the policy package.
