Skip to content

feat: allow disabling prompt caching per model via a0_explicit_caching kwarg#1654

Open
akshay-sood wants to merge 1 commit into
agent0ai:mainfrom
akshay-sood:feat/disable-prompt-caching-per-model
Open

feat: allow disabling prompt caching per model via a0_explicit_caching kwarg#1654
akshay-sood wants to merge 1 commit into
agent0ai:mainfrom
akshay-sood:feat/disable-prompt-caching-per-model

Conversation

@akshay-sood
Copy link
Copy Markdown

Problem

Models that don't support prompt caching (e.g. NVIDIA Nemotron on AWS Bedrock) fail with 403 Forbidden errors because Agent Zero unconditionally adds cache_control: {type: 'ephemeral'} markers to messages via explicit_caching=True in call_chat_model().

The Bedrock error message is:

"You invoked an unsupported model or your request did not allow prompt caching."

Solution

Add support for a new model kwarg a0_explicit_caching that, when set to false, disables prompt caching for that specific model. This allows users to configure it in their preset's Additional Settings:

a0_explicit_caching=false

Or in presets.yaml:

kwargs:
  a0_explicit_caching: false

Implementation

  • Read a0_explicit_caching from model kwargs before _convert_messages() is called (critical ordering — must happen before cache_control markers are injected)
  • If the value is False, override explicit_caching to False for that call
  • Strip the kwarg from call_kwargs before passing to LiteLLM

Example Use Case

NVIDIA Nemotron on Bedrock preset:

kwargs:
  aws_region_name: us-east-2
  fake_stream: true
  a0_explicit_caching: false

This follows the same pattern as existing A0-specific kwargs (a0_retry_attempts, a0_retry_delay_seconds) that are stripped before reaching LiteLLM.

Testing

  • Verified NVIDIA Nemotron (nvidia.nemotron-super-3-120b) responds successfully with this fix
  • Confirmed no impact on Claude models (which continue to use prompt caching by default)
  • The fix is backwards-compatible: existing presets without this kwarg behave identically

…g kwarg

Models that don't support prompt caching (e.g. NVIDIA Nemotron on Bedrock)
fail with 403 errors when cache_control headers are present in messages.

This adds support for a new model kwarg `a0_explicit_caching: false` that
can be set in preset additional settings to disable prompt caching for
specific models.

The check is placed before _convert_messages() so cache_control markers
are never injected into the message payload.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant