Portkey-AI · krishnachandra1709 · May 21, 2026
diff --git a/product/ai-gateway/cache-simple-and-semantic.mdx b/product/ai-gateway/cache-simple-and-semantic.mdx
@@ -4,16 +4,20 @@ description: Speed up requests and reduce costs by caching LLM responses.
 ---
 
 <Info>
-**Simple** caching is available for all plans.<br />
-**Semantic** caching requires a vector database and is only available on select Enterprise plans. [Contact us](https://portkey.ai/docs/support/contact-us) to learn more about enabling this feature.
+  **Simple** caching is available for all plans.
+  <br />
+  **Semantic** caching requires a vector database and is only available on
+  select Enterprise plans. [Contact
+  us](https://portkey.ai/docs/support/contact-us) to learn more about enabling
+  this feature.
 </Info>
 
 Cache LLM responses to serve requests up to **20x faster** and cheaper.
 
-| Mode | How it Works | Best For | Supported Routes |
-|------|--------------|----------|------------------|
-| **Simple** | Exact match on input | Repeated identical prompts | All models including image generation |
-| **Semantic** | Matches semantically similar requests | Denoising variations in phrasing | `/chat/completions`, `/completions` |
+| Mode         | How it Works                          | Best For                         | Supported Routes                      |
+| ------------ | ------------------------------------- | -------------------------------- | ------------------------------------- |
+| **Simple**   | Exact match on input                  | Repeated identical prompts       | All models including image generation |
+| **Semantic** | Matches semantically similar requests | Denoising variations in phrasing | `/chat/completions`, `/completions`   |
 
 ## Enable Cache
 
@@ -36,25 +40,63 @@ Add `cache` to your [config object](/api-reference/config-object#cache-object-de
 </CodeGroup>
 
 <Note>
-Caching won't work if `x-portkey-debug: "false"` header is included.
+  Caching won't work if the `x-portkey-debug: "false"` header is included.
 </Note>
 
 ## Simple Cache
 
-Exact match on input prompts. If the same request comes again, Portkey returns the cached response.
+Returns the cached response when the **exact same request** is sent again.
+
+**Hit when all of these match a cached entry:**
+
+- Full request body (`messages` or `prompt`, `model`, `temperature`, `max_tokens`, and every other parameter)
+- `x-portkey-metadata` (if used)
+- `x-portkey-cache-namespace` (if used)
+- The entry is still within `max_age`
+
+**Miss when:**
+
+- The request is sent for the first time
+- Any field in the body changes (even one character in the prompt or a different parameter value), or `x-portkey-metadata` / `x-portkey-cache-namespace` differs from the cached entry
+- The cached entry has expired
+- `x-portkey-cache-force-refresh: true` is set on the request
 
 ## Semantic Cache
 
-Matches requests with similar meaning using cosine similarity. [Learn more →](https://portkey.ai/blog/reducing-llm-costs-and-latency-semantic-cache/)
+Matches requests with **similar meaning**, not just identical text. [Learn more →](https://portkey.ai/blog/reducing-llm-costs-and-latency-semantic-cache/)
 
 <Info>
-Semantic cache is a superset—it handles simple cache hits too.
+  Semantic cache is a superset of simple cache. Portkey checks for an exact
+  match first and only runs semantic search on a miss.
 </Info>
 
 <Note>
-Semantic cache works with requests under 8,191 tokens and ≤4 messages.
+  Semantic cache works with requests under 8,191 tokens and ≤4 messages.
 </Note>
 
+**Hit when all of these are true (after a simple-cache miss):**
+
+- User text has **similar meaning** to a cached request — cosine similarity above the threshold (default `0.95`)
+- `model`, `temperature`, `max_tokens`, and any other body parameter **match exactly**
+- `x-portkey-metadata` matches exactly (if used)
+- The request has **at least two messages**: the **first message (usually `system`) is ignored** during matching, so changing it does not affect cache hits
+
+**Example — same model, different wording → semantic hit:**
+
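+```jsonc
+// First request (cached)
+{
+  "model": "gpt-4o",
+  "messages": [{ "role": "system", "content": "Be helpful." }, { "role": "user", "content": "Who is the US president?" }]
+}
+
+// Second request: SEMANTIC HIT (user text is similar; the first message is ignored)
+{
+  "model": "gpt-4o",
+  "messages": [{ "role": "system", "content": "Be helpful." }, { "role": "user", "content": "Tell me who is the president of the US." }]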
+}
+```
+
 ### Set up semantic caching (self-hosted)
 
 To enable semantic caching on a self-hosted Portkey gateway, configure the embedding provider and a vector database.
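The hit and miss rules listed under Simple Cache and Semantic Cache can be modeled as a two-tier lookup: an exact key over the full body, metadata, and namespace, then a cosine-similarity search only on an exact miss. The sketch below is a hypothetical in-memory illustration, not Portkey's implementation. `TwoTierCache`, `exact_key`, and `partition_key` are made-up names; the `embed` callable is a stand-in for the configured embedding provider; partitioning semantic lookups by namespace is an assumption; and TTL expiry and the 8,191-token limit are not modeled.

```python
import hashlib
import json
import math
from typing import Callable, Optional

# Hypothetical in-memory model of the documented rules, not Portkey's code.
SIMILARITY_THRESHOLD = 0.95  # documented default cosine-similarity threshold
MAX_SEMANTIC_MESSAGES = 4    # documented message limit (token limit not modeled)


def exact_key(body: dict, metadata: Optional[dict] = None,
              namespace: Optional[str] = None) -> str:
    """Simple-cache key: hash of the full body plus metadata and namespace."""
    payload = {"body": body, "metadata": metadata, "namespace": namespace}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def partition_key(body: dict, metadata: Optional[dict],
                  namespace: Optional[str]) -> str:
    """Semantic hits need every non-message parameter to match exactly."""
    params = {k: v for k, v in body.items() if k != "messages"}
    return exact_key(params, metadata, namespace)


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoTierCache:
    def __init__(self, embed: Callable[[str], list]):
        self.embed = embed  # stand-in for the configured embedding provider
        self.exact = {}     # exact_key -> response
        self.semantic = {}  # partition_key -> [(vector, response), ...]

    @staticmethod
    def _matchable_text(body: dict) -> Optional[str]:
        messages = body.get("messages", [])
        if not 2 <= len(messages) <= MAX_SEMANTIC_MESSAGES:
            return None  # not eligible for semantic matching
        # The first message (usually system) is ignored during matching.
        return "\n".join(str(m.get("content", "")) for m in messages[1:])

    def put(self, body: dict, response: str, metadata=None, namespace=None) -> None:
        self.exact[exact_key(body, metadata, namespace)] = response
        text = self._matchable_text(body)
        if text is not None:
            bucket = self.semantic.setdefault(partition_key(body, metadata, namespace), [])
            bucket.append((self.embed(text), response))

    def get(self, body: dict, metadata=None, namespace=None) -> tuple:
        # 1. Exact match first (simple cache).
        hit = self.exact.get(exact_key(body, metadata, namespace))
        if hit is not None:
            return ("Cache Hit", hit)
        # 2. Semantic search runs only after an exact miss.
        text = self._matchable_text(body)
        candidates = self.semantic.get(partition_key(body, metadata, namespace), [])
        if text is not None and candidates:
            query = self.embed(text)
            vector, response = max(candidates, key=lambda c: cosine(query, c[0]))
            if cosine(query, vector) > SIMILARITY_THRESHOLD:
                return ("Cache Semantic Hit", response)
        return ("Cache Miss", None)
```

Keying semantic entries by every non-message parameter means a changed `temperature` or `model` can never produce a semantic hit, which mirrors the "match exactly" rule above.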
@@ -73,6 +115,7 @@ To enable semantic caching on a self-hosted Portkey gateway, configure the embed
     ```
 
     `SEMANTIC_CACHE_EMBEDDING_PROVIDER` accepts `openai`, `google` (Gemini embeddings), or `vertex-ai` (Vertex AI embeddings). Set `SEMANTIC_CACHE_EMBEDDINGS_URL`, `SEMANTIC_CACHE_EMBEDDING_MODEL`, and `SEMANTIC_CACHE_EMBEDDING_DIMENSIONS` to match the chosen provider's embedding model.
+
   </Step>
   <Step title="Configure the vector database">
     Set the following environment variables in your gateway environment to connect to your vector store (Milvus or Pinecone):
@@ -110,29 +153,18 @@ To enable semantic caching on a self-hosted Portkey gateway, configure the embed
     ```json
     { "cache": { "mode": "semantic" } }
     ```
+
   </Step>
 </Steps>
 
 <Warning>
-  **Limitations:**
-  - Embedding generation supports OpenAI, Google (Gemini), and Vertex AI embedding providers.
-  - The LLM model used for generating responses must be OpenAI-compatible.
-  - Each request must include at least one `user` message along with system messages. Requests with only system messages are dropped.
+  **Limitations:**
+
+  - Embedding generation supports OpenAI, Google (Gemini), and Vertex AI embedding providers.
+  - The model used to generate responses must be OpenAI-compatible.
+  - Each request must include at least one `user` message along with system messages. Requests with only system messages are dropped.
 </Warning>
 
-### Message matching behavior
-
-Semantic cache requires **at least two messages**. The first message (typically `system`) is ignored for matching:
-
-```json
-[
-  { "role": "system", "content": "You are a helpful assistant" },
-  { "role": "user", "content": "Who is the president of the US?" }
-]
-```
-
-Only the `user` message is used for matching. Change the system message without affecting cache hits.
-
 ## Cache TTL
 
 Set expiration with `max_age` (in seconds):
@@ -141,11 +173,11 @@ Set expiration with `max_age` (in seconds):
 { "cache": { "mode": "semantic", "max_age": 60 } }
 ```
 
-| Setting | Value |
-|---------|-------|
-| Minimum | 60 seconds |
+| Setting | Value                       |
+| ------- | --------------------------- |
+| Minimum | 60 seconds                  |
 | Maximum | 90 days (7,776,000 seconds) |
-| Default | 7 days (604,800 seconds) |
+| Default | 7 days (604,800 seconds)    |
 
 ### Organization-Level TTL
 
@@ -156,6 +188,7 @@ Admins can set default TTL for all workspaces to align with data retention polic
 3. Save
 
 **Precedence:**
+
 - No `max_age` in request → org default used
 - Request `max_age` > org default → org default wins
 - Request `max_age` < org default → request value honored
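A minimal sketch of how these precedence rules combine with the minimum, maximum, and default in the TTL table. `effective_ttl` is a hypothetical helper; falling back to the 7-day default when neither a request `max_age` nor an org default is set follows the table, while clamping out-of-range values (rather than rejecting them) is an assumption.

```python
from typing import Optional

# Documented bounds and default for max_age, in seconds.
MIN_TTL = 60
MAX_TTL = 90 * 24 * 60 * 60      # 7,776,000 seconds (90 days)
DEFAULT_TTL = 7 * 24 * 60 * 60   # 604,800 seconds (7 days)


def effective_ttl(request_max_age: Optional[int], org_default: Optional[int]) -> int:
    """Resolve the TTL under the documented precedence rules (hypothetical helper)."""
    if org_default is None:
        ttl = DEFAULT_TTL if request_max_age is None else request_max_age
    elif request_max_age is None:
        ttl = org_default                        # no max_age: org default used
    else:
        ttl = min(request_max_age, org_default)  # the lower of the two wins
    # Assumption: out-of-range values are clamped to the documented bounds.
    return max(MIN_TTL, min(ttl, MAX_TTL))
```

Taking the minimum covers both comparison cases at once: a request asking for longer than the org default gets the default, and a shorter request is honored.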
@@ -178,12 +211,15 @@ response = portkey.with_options(
 ```
 
 ```javascript Node
-const response = await portkey.chat.completions.create({
-    messages: [{ role: 'user', content: 'Hello' }],
-    model: '@openai-prod/gpt-4o',
-}, {
-    cacheForceRefresh: true
-});
+const response = await portkey.chat.completions.create(
+  {
+    messages: [{ role: "user", content: "Hello" }],
+    model: "@openai-prod/gpt-4o",
+  },
+  {
+    cacheForceRefresh: true,
+  },
+);
 ```
 
 ```bash cURL
@@ -197,8 +233,8 @@ curl https://api.portkey.ai/v1/chat/completions \
 </CodeGroup>
 
 <Info>
-- Requires cache config to be passed
-- For semantic hits, refreshes ALL matching entries
+  - Requires cache config to be passed
+  - For semantic hits, refreshes ALL matching entries
 </Info>
 
 ## Cache Namespace
@@ -217,12 +253,15 @@ response = portkey.with_options(
 ```
 
 ```javascript Node
-const response = await portkey.chat.completions.create({
-    messages: [{ role: 'user', content: 'Hello' }],
-    model: '@openai-prod/gpt-4o',
-}, {
-    cacheNamespace: 'user-123'
-});
+const response = await portkey.chat.completions.create(
+  {
+    messages: [{ role: "user", content: "Hello" }],
+    model: "@openai-prod/gpt-4o",
+  },
+  {
+    cacheNamespace: "user-123",
+  },
+);
 ```
 
 ```bash cURL
@@ -247,7 +286,11 @@ Set cache at top-level or per-target:
   "strategy": { "mode": "fallback" },
   "targets": [
     { "override_params": { "model": "@openai-prod/gpt-4o" } },
-    { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" } }
+    {
+      "override_params": {
+        "model": "@anthropic-prod/claude-3-5-sonnet-20241022"
+      }
+    }
   ]
 }
 ```
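When a config mixes a top-level `cache` block with per-target ones, it can help to resolve which settings each target actually uses and which request body gets cached for it. The sketch below uses hypothetical helpers `effective_cache` and `target_request`; treating a target-level `cache` as a wholesale replacement of the top-level block (not a field-by-field merge) and `override_params` as a shallow merge into the body are assumptions.

```python
from typing import Optional


def effective_cache(config: dict, target_index: int) -> Optional[dict]:
    """Cache settings that apply to one target (hypothetical helper)."""
    target = config["targets"][target_index]
    # Assumption: a target-level "cache" replaces the top-level block wholesale.
    if "cache" in target:
        return target["cache"]
    return config.get("cache")


def target_request(body: dict, config: dict, target_index: int) -> dict:
    """Request body as sent to a target, with its override_params applied."""
    overrides = config["targets"][target_index].get("override_params", {})
    # Assumption: shallow merge; this merged body is what would be cached.
    return {**body, **overrides}
```

Because the overridden body is what gets cached, each fallback target builds up its own cache entries, which is why a target needs its exact parameter combination cached before it can hit.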
@@ -256,31 +299,39 @@ Set cache at top-level or per-target:
 {
   "strategy": { "mode": "fallback" },
   "targets": [
-    { "override_params": { "model": "@openai-prod/gpt-4o" }, "cache": { "mode": "simple", "max_age": 200 } },
-    { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" }, "cache": { "mode": "semantic", "max_age": 100 } }
+    {
+      "override_params": { "model": "@openai-prod/gpt-4o" },
+      "cache": { "mode": "simple", "max_age": 200 }
+    },
+    {
+      "override_params": {
+        "model": "@anthropic-prod/claude-3-5-sonnet-20241022"
+      },
+      "cache": { "mode": "semantic", "max_age": 100 }
+    }
   ]
 }
 ```
 
 </CodeGroup>
 
-<Info>
-Target-level cache takes precedence over top-level.
-</Info>
+<Info>Target-level cache takes precedence over top-level.</Info>
 
 <Note>
-Targets with `override_params` need that exact param combination cached before hits occur.
+  Targets with `override_params` need that exact param combination cached before
+  hits occur.
 </Note>
 
 ## Analytics & Logs
 
 **Analytics** → Cache tab shows:
+
 - Cache hit rate
 - Latency savings
 - Cost savings
 
 **Logs** → Status column shows: `Cache Hit`, `Cache Semantic Hit`, `Cache Miss`, `Cache Refreshed`, or `Cache Disabled`. [Learn more →](/product/observability/logs)
 
 <Frame>
-  <img src="/images/product/ai-gateway/ai-11.png"/>
+  <img src="/images/product/ai-gateway/ai-11.png" />
 </Frame>
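Given a list of values from the logs Status column, a cache hit rate can be computed locally. This is a hypothetical sketch: the Analytics tab's exact formula is not documented here, so counting both `Cache Hit` and `Cache Semantic Hit` as hits, excluding `Cache Disabled` from the denominator, and treating `Cache Refreshed` as a non-hit are assumptions.

```python
from collections import Counter

HIT_STATUSES = {"Cache Hit", "Cache Semantic Hit"}


def cache_hit_rate(statuses: list) -> float:
    """Share of cache-eligible requests served from cache (hypothetical metric)."""
    counts = Counter(statuses)
    # Assumption: "Cache Disabled" rows are excluded from the denominator,
    # and "Cache Refreshed" counts as a non-hit.
    eligible = sum(n for status, n in counts.items() if status != "Cache Disabled")
    hits = sum(counts[status] for status in HIT_STATUSES)
    return hits / eligible if eligible else 0.0
```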