Design: Gemini prompt context caching (Vertex + AI Studio) and example probe (closes #1427) #1434
Conversation
…ble probe example for implicit and explicit caching
- Verified docs for implicit/explicit caching, models incl. Gemini 3
- Example to test TTL and cached token counts; logs to logs/caching

Co-authored-by: openhands <openhands@all-hands.dev>
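For reference, inspecting a cache's TTL with the google-genai SDK can look like the sketch below. This is an illustrative aside, not the probe added in this PR; the cache name and TTL value are placeholders.

```python
# Hedged sketch: inspect an explicit cache's lifetime and extend its TTL.
# The cache name below is a placeholder, not one created by this PR's example.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

cache = client.caches.get(name="cachedContents/your-cache-id")
print("expires at:", cache.expire_time)  # absolute expiry derived from the TTL

# Extend the TTL by another five minutes.
client.caches.update(
    name=cache.name,
    config=types.UpdateCachedContentConfig(ttl="300s"),
)
```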
[Automatic Post]: I have assigned @csmith49 as a reviewer based on git blame information. Thanks in advance for the help!
@OpenHands understand this PR. Run the example script. If it fails, try to fix it and rerun it, so we can see whether it demonstrates what it aims to. You have both a Gemini key in your env, for direct calls, and a LiteLLM proxy key (the second works on base URL https://llm-proxy.eval.all-hands.dev). The proxy is proxying to Vertex AI. Use model gemini-3-pro-preview (maybe do /v1/models on the proxy to see the exact name). Post a comment here with what was displayed in the console, so I can see the results.
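For reference, listing the proxy's models can look like the sketch below, assuming the LiteLLM proxy exposes the standard OpenAI-compatible /v1/models route; the env var name is a placeholder.

```python
# Hedged sketch: list model names exposed by the LiteLLM proxy.
# LITELLM_PROXY_API_KEY is a placeholder env var name.
import os
import requests

resp = requests.get(
    "https://llm-proxy.eval.all-hands.dev/v1/models",
    headers={"Authorization": f"Bearer {os.environ['LITELLM_PROXY_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])  # e.g. confirm the exact gemini-3-pro-preview identifier
```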
I'm on it! enyst can track my progress at all-hands.dev |
…fix LiteLLM provider selection by passing custom_llm_provider through LLM

- Switch default implicit model to gemini-3-pro-preview (per /v1/models)
- Add custom_llm_provider to LLM.transport call, so example can override to 'openai' when hitting the LiteLLM proxy base_url
- Verified implicit (proxy) and explicit (AI Studio) flows run and log usage

Co-authored-by: openhands <openhands@all-hands.dev>
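The provider override described in this commit corresponds roughly to the hedged sketch below: a direct litellm call, not the SDK's actual LLM/transport wiring. The env var name is a placeholder.

```python
# Hedged sketch: route a Gemini request through the LiteLLM proxy by forcing
# the OpenAI-compatible provider. Not the repo's real code path.
import os
import litellm

response = litellm.completion(
    model="gemini-3-pro-preview",
    custom_llm_provider="openai",  # treat the proxy as an OpenAI-compatible endpoint
    api_base="https://llm-proxy.eval.all-hands.dev",
    api_key=os.environ["LITELLM_PROXY_API_KEY"],
    messages=[{"role": "user", "content": "ping"}],
)
print(response.usage)
```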
Summary of work on PR branch openhands/gemini-caching-design
What I did
Changes made (minimal and focused)
Verification output (console)
Explicit mode (AI Studio)
Checklist against request
Commit pushed
Notes and suggested next step
… Greatly enlarge stable system prefix and first user message to exceed ~2,048 tokens
- Add a third call and 2s sleeps to increase hit likelihood
- Verified logs show usage_summary.cache_read_tokens and cached_tokens in vertex metadata

Co-authored-by: openhands <openhands@all-hands.dev>
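A rough sketch of the probe pattern this commit describes (large stable prefix, repeated calls, short sleeps, then checking cached-token counters). It is not the PR's exact example code: the model string assumes LiteLLM's direct Gemini (AI Studio) route rather than the proxy run, and the usage field names follow the OpenAI-style shape LiteLLM normalizes to.

```python
# Hedged sketch of the implicit-caching probe; not the PR's actual example script.
import time
import litellm

# A stable prefix well past the ~2,048-token implicit-caching threshold.
stable_prefix = "You are a meticulous reviewer of design documents. " * 500
messages = [
    {"role": "system", "content": stable_prefix},
    {"role": "user", "content": "Answer in one word: is caching active?"},
]

for attempt in range(3):
    response = litellm.completion(model="gemini/gemini-3-pro-preview", messages=messages)
    details = getattr(response.usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details else None
    print(f"call {attempt + 1}: cached_tokens={cached}")
    time.sleep(2)  # brief pause between calls to raise the hit likelihood
```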
What I changed (since last summary)
Verification: implicit and explicit
Implicit caching - Output:
Logs (under logs/caching/) confirm cache hits:
- Explicit (AI Studio via google-genai)
Explicit caching - Output:
Checklist against your request
Next steps (optional)
[Automatic Post]: It has been a while since there was any activity on this PR. @enyst, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.
This was an OpenHands agent thread to test implicit and explicit caching. Nothing to do here. |
Summary
This PR adds a design document and a runnable example to support prompt/context caching for Gemini models on Vertex AI and the Gemini API (AI Studio). It focuses on options beyond automatic implicit caching and provides verified guidance for Gemini 2.x and Gemini 3.x.
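As a hedged illustration of the explicit-caching path on AI Studio (not the PR's exact example code), the flow with the google-genai SDK looks roughly like this; the model, TTL, and contents are placeholders, and the cached contents must exceed the model's minimum token count for explicit caching.

```python
# Hedged sketch: create an explicit cache on AI Studio and reference it in a request.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Cache the large, reusable part of the prompt with a 5-minute TTL.
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        system_instruction="You are an expert on the attached design document.",
        contents=["<large design document text goes here>"],
        ttl="300s",
    ),
)

# Later requests reference the cache instead of resending the prefix.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="List the open questions in the document.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.usage_metadata.cached_content_token_count)  # tokens served from the cache
```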
Changes
Motivation / Background
Implementation Notes
Verification
Checklist
Closes #1427
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:2ed64ca-python

Run
All tags pushed for this build
About Multi-Architecture Support
- The tag (2ed64ca-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. 2ed64ca-python-amd64) are also available if needed