feat(plugins-google): add cached_content option for explicit context caching #5661

Closed
kamil-bidus wants to merge 1 commit into livekit:main from kamil-bidus:kamdibus/gemini-cached-content-support

Conversation


@kamil-bidus kamil-bidus commented May 6, 2026

Motivation

The Gemini plugin's LLM class supports many GenerateContentConfig options (thinking_config, retrieval_config, safety_settings, etc.) but not cached_content. The plugin already reads cached_content_token_count from response usage in LLMStream._parse_part, so cache hits surface in metrics — there's just no way to attach a CachedContent resource to outgoing requests.

For voice-agent workloads on Gemini 3 Flash with ~6 KB system prompts, implicit caching is unreliable: in a 100-call/day deployment only ~3% of turn-1 requests pick up cached tokens despite firing a same-prefix warmup before the user's first utterance. This matches the broader user reports in #2359 ("Gemini Implicit Caching is still broken - I tested through the gemini API"). Explicit context caching is the documented alternative: create a CachedContent once, reference it by name on every generateContent call. Prefix tokens are processed in under 100 ms and billed at a discount.

Change

Add cached_content: NotGivenOr[str] = NOT_GIVEN to LLM.__init__. Standard propagation pattern (a usage sketch follows the list):

  1. _LLMOptions dataclass field
  2. __init__ parameter and docstring
  3. Pass-through to _LLMOptions(...)
  4. is_given(...) check in chat() propagating into extra["cached_content"]
  5. Reaches GenerateContentConfig via **self._extra_kwargs
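
Viewed from the caller's side, the end result of that propagation is roughly the following (a minimal usage sketch; the model and cache names are placeholders, and the keyword is the one this PR adds):

```python
from livekit.plugins import google

# Minimal usage sketch assuming this PR: the string is stored on _LLMOptions and
# forwarded into GenerateContentConfig via _extra_kwargs on every chat() call.
llm = google.LLM(
    model="gemini-2.0-flash",                    # placeholder model name
    cached_content="cachedContents/1234567890",  # name returned by client.caches.create(...)
)
```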

End-to-end usability — request-side suppression

Gemini's API rejects generateContent requests that pass cached_content together with system_instruction, tools, or tool_config — those fields belong inside the CachedContent resource. The exact server response on the conflict:

"CachedContent can not be used with GenerateContent request setting system_instruction, tools or tool_config. Proposed fix: move those values to CachedContent from GenerateContent request."

Without handling that, exposing the parameter would still 400 on any realistic agent — anyone with a system prompt or function tools (i.e. the plugin's primary user base) couldn't actually use the new option. So LLMStream._run now strips `system_instruction`, `tools`, and `tool_config` from the outgoing request whenever `cached_content` is attached. Behaviour is unchanged for callers that don't set `cached_content`: gating is strictly `is-given` on that one option.
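
As a standalone sketch (not the actual `LLMStream._run` diff, where these fields are assembled rather than popped from a dict), the rule amounts to:

```python
from google.genai import types

def build_request_config(extra: dict, cached_content: str | None) -> types.GenerateContentConfig:
    """Illustrative helper only: mirror the request-side suppression rule."""
    cfg = dict(extra)
    if cached_content is not None:
        cfg["cached_content"] = cached_content
        # Gemini rejects these alongside cached_content; they must live inside the
        # CachedContent resource instead, so keep them off the per-request config.
        for field in ("system_instruction", "tools", "tool_config"):
            cfg.pop(field, None)
    return types.GenerateContentConfig(**cfg)
```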

Cache lifecycle (creation via `client.caches.create(...)`, TTL refresh, deletion) and the choice of what to bake into the cache stay the application's responsibility. The docstring spells out the contract: the cache resource must contain whichever of `system_instruction` / `tools` the model needs, since the plugin will keep them off the request.
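
For reference, the application-side half of that contract could look like this with the `google-genai` SDK (values are placeholders; a real cache also has to satisfy the API's minimum cached-token count):

```python
from google import genai
from google.genai import types
from livekit.plugins import google as google_plugin

client = genai.Client()

# The cache carries the system prompt (and tool declarations, if any), since the
# plugin keeps those fields off every generateContent request that references it.
cache = client.caches.create(
    model="gemini-2.0-flash",  # placeholder model
    config=types.CreateCachedContentConfig(
        system_instruction="You are a friendly voice agent ...",
        ttl="3600s",  # refreshing or deleting the cache stays the app's job
    ),
)

llm = google_plugin.LLM(model="gemini-2.0-flash", cached_content=cache.name)
```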

Compatibility

Default `NOT_GIVEN` keeps existing behavior unchanged. Verified by `test_cached_content_omitted_when_not_set` and `test_request_includes_system_instruction_and_tools_when_no_cache`: when the parameter isn't passed, the field is absent from `_extra_kwargs` and the outgoing request still carries `system_instruction` and `tools` exactly as before.

Works with both Gemini Developer API (`cachedContents/{id}`) and Vertex AI (`projects/{p}/locations/{l}/cachedContents/{id}`). The plugin passes the string through unmodified; format validation is the SDK's responsibility.

Tests

`tests/test_plugin_google_llm.py` — 6 cases covering both halves:

Propagation (3):

  • `test_cached_content_propagates_to_extra_kwargs` — set on init, observed in stream `_extra_kwargs`
  • `test_cached_content_omitted_when_not_set` — default `NOT_GIVEN` produces no key
  • `test_cached_content_stored_on_opts` — `_LLMOptions.cached_content` round-trips

Request-side suppression (3) — patch `client.aio.models.generate_content_stream` and capture the `GenerateContentConfig` actually received (a skeleton of this pattern follows the list):

  • `test_request_omits_system_instruction_when_cached_content_set` — config arrives with `system_instruction=None` even though the chat context contains a system message
  • `test_request_omits_tools_when_cached_content_set` — `tools` and `tool_config` absent even when the stream is constructed with a function tool
  • `test_request_includes_system_instruction_and_tools_when_no_cache` — backward-compat: without the option, both fields propagate as before
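
A heavily abbreviated skeleton of that capture pattern (paraphrased, not the test file verbatim; the `_client` attribute name and the turn-driving code are assumptions):

```python
from unittest.mock import AsyncMock, patch

from livekit.plugins import google

llm = google.LLM(model="gemini-2.0-flash", cached_content="cachedContents/abc")
with patch.object(llm._client.aio.models, "generate_content_stream", new=AsyncMock()) as fake:
    ...  # drive one chat() turn whose chat context contains a system message
config = fake.call_args.kwargs["config"]  # the GenerateContentConfig the SDK received
assert config.cached_content == "cachedContents/abc"
assert config.system_instruction is None  # stripped because cached_content is set
assert config.tools is None and config.tool_config is None
```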

All existing google-plugin tests still pass. `ruff check` / `ruff format` clean.

Refs

#2359 ("Gemini Implicit Caching is still broken - I tested through the gemini API")

@CLAassistant

CLAassistant commented May 6, 2026

CLA assistant check
All committers have signed the CLA.


@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.


@kamil-bidus kamil-bidus marked this pull request as draft May 7, 2026 14:35
feat(plugins-google): add cached_content option for explicit context caching

The plugin currently relies on Gemini's implicit cache, which is
heuristic. In voice-agent workloads where the system prompt is large
and stable across calls, implicit caching often misses on turn 1 of
a conversation, paying the full cold-start cost.

Explicit caching is the documented alternative: the application
creates a CachedContent resource via client.caches.create(...) and
references it by name on subsequent generateContent calls. Cached
prefix tokens are billed at a discount and processed in under 100ms.

The plugin already reads cached_content_token_count from response
usage but had no way to set cached_content on requests. This adds
the parameter on LLM.__init__, stores it on _LLMOptions, and
propagates it into GenerateContentConfig via extra_kwargs.

End-to-end usability matters: Gemini rejects generateContent
requests that pass cached_content together with system_instruction,
tools, or tool_config — those fields belong inside the CachedContent
resource. Without handling that, setting cached_content on any LLM
that also has a system prompt or function tools would 400. So
LLMStream._run now suppresses system_instruction, tools, and
tool_config from the outgoing request whenever cached_content is
attached. Cache lifecycle (creation, TTL refresh, deletion) and the
choice of what to bake into the cache stay the application's
responsibility — the plugin only consumes the resource name and
ensures the matching fields are absent from the request.

Behaviour is unchanged for callers that don't pass cached_content:
the gating is strictly is-given on that one option. Documented in
the docstring so users know the cache must contain whichever of
system_instruction / tools the model needs.

Tests cover propagation, the omitted-when-not-set default, and the
three suppression branches (system_instruction stripped, tools
stripped, tool_config stripped) plus the unchanged-when-no-cache
backward-compat path.

Refs livekit#2359.
@kamil-bidus kamil-bidus force-pushed the kamdibus/gemini-cached-content-support branch from 853f638 to c57dd80 Compare May 7, 2026 15:06
@kamil-bidus kamil-bidus closed this May 7, 2026
@kamil-bidus kamil-bidus deleted the kamdibus/gemini-cached-content-support branch May 7, 2026 15:09
@kamil-bidus
Author

Superseded by #5675.
