feat: allow disabling prompt caching per model via a0_explicit_caching kwarg by akshay-sood · Pull Request #1654 · agent0ai/agent-zero

akshay-sood · 2026-05-19T15:36:07Z

Problem

Models that don't support prompt caching (e.g. NVIDIA Nemotron on AWS Bedrock) fail with 403 Forbidden errors because Agent Zero unconditionally adds cache_control: {type: 'ephemeral'} markers to messages via explicit_caching=True in call_chat_model().

The Bedrock error message is:

"You invoked an unsupported model or your request did not allow prompt caching."

Solution

Add support for a new model kwarg a0_explicit_caching that, when set to false, disables prompt caching for that specific model. This allows users to configure it in their preset's Additional Settings:

a0_explicit_caching=false

Or in presets.yaml:

kwargs:
  a0_explicit_caching: false

Implementation

Read a0_explicit_caching from model kwargs before _convert_messages() is called (critical ordering — must happen before cache_control markers are injected)
If the value is False, override explicit_caching to False for that call
Strip the kwarg from call_kwargs before passing to LiteLLM

Example Use Case

NVIDIA Nemotron on Bedrock preset:

kwargs:
  aws_region_name: us-east-2
  fake_stream: true
  a0_explicit_caching: false

This follows the same pattern as existing A0-specific kwargs (a0_retry_attempts, a0_retry_delay_seconds) that are stripped before reaching LiteLLM.

Testing

Verified NVIDIA Nemotron (nvidia.nemotron-super-3-120b) responds successfully with this fix
Confirmed no impact on Claude models (which continue to use prompt caching by default)
The fix is backwards-compatible: existing presets without this kwarg behave identically

…g kwarg Models that don't support prompt caching (e.g. NVIDIA Nemotron on Bedrock) fail with 403 errors when cache_control headers are present in messages. This adds support for a new model kwarg `a0_explicit_caching: false` that can be set in preset additional settings to disable prompt caching for specific models. The check is placed before _convert_messages() so cache_control markers are never injected into the message payload.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: allow disabling prompt caching per model via a0_explicit_caching kwarg#1654

feat: allow disabling prompt caching per model via a0_explicit_caching kwarg#1654
akshay-sood wants to merge 1 commit into
agent0ai:mainfrom
akshay-sood:feat/disable-prompt-caching-per-model

akshay-sood commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

akshay-sood commented May 19, 2026

Problem

Solution

Implementation

Example Use Case

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant