RESOURCE_EXHAUSTED (429) errors when triggering ADK agents. #4323

@enesdemirag

RESOURCE_EXHAUSTED (429) errors when triggering ADK agents concurrently via Vertex AI Reasoning Engine


Issue Description

Describe the Bug:
When triggering an ADK agent multiple times in quick succession, the request fails with a streaming error that ultimately resolves to a 429 RESOURCE_EXHAUSTED error from Vertex AI. The error is surfaced by ADK as a 500 during response streaming.

Observed error:

{
  "error": "500: An error occurred while streaming the response: 429 Too Many Requests.",
  "details": {
    "message": "Resource exhausted. Please try again later.",
    "status": "RESOURCE_EXHAUSTED"
  }
}

The error message points to the ADK and Vertex AI 429 documentation, but it is unclear where the actual bottleneck is and how it should be handled when using ADK in production.
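Because the 429 arrives wrapped in a 500, a caller cannot rely on the outer status code alone to decide whether a failure is a quota problem. The following is a minimal sketch of one way to classify such a payload, assuming it arrives as valid JSON with the shape shown above. The function name `is_wrapped_quota_error` is hypothetical and is not part of ADK.

```python
import json


def is_wrapped_quota_error(payload: str) -> bool:
    """Return True if an error payload looks like a 429 wrapped in a 500.

    Assumption: the payload is a JSON object with an "error" string and an
    optional "details" object carrying a "status" field, as in the observed
    error. This is an illustrative helper, not an ADK API.
    """
    try:
        body = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not isinstance(body, dict):
        return False

    details = body.get("details")
    status = details.get("status") if isinstance(details, dict) else None
    message = str(body.get("error", ""))

    # Either signal is treated as enough: the structured status, or the
    # "429" that leaks into the outer 500 message.
    return status == "RESOURCE_EXHAUSTED" or "429" in message


observed = json.dumps({
    "error": "500: An error occurred while streaming the response: 429 Too Many Requests.",
    "details": {
        "message": "Resource exhausted. Please try again later.",
        "status": "RESOURCE_EXHAUSTED",
    },
})
quota_hit = is_wrapped_quota_error(observed)
```

A check like this only identifies the failure as quota-related; it does not say which quota was exceeded.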


Steps to Reproduce:

  1. Deploy a backend service on Google Cloud Run.
  2. Use Google ADK with Vertex AI Reasoning Engine (us-central1).
  3. Trigger the same agent multiple times within a short time window (concurrent or burst traffic).
  4. Observe 429 RESOURCE_EXHAUSTED errors surfaced as streaming failures.

Expected Behavior:
Requests may slow down or queue, but should not fail with a hard error during streaming. Ideally, retries or backoff would be handled gracefully.


Observed Behavior:
Requests fail with 429 RESOURCE_EXHAUSTED, wrapped as a 500 streaming error by ADK.


Environment Details:

  • ADK Library Version: latest
  • Python Version: 3.12

Model Information:

  • Are you using LiteLLM: No
  • Models used: gemini-2.5-pro, gemini-2.5-flash
  • Gemini tier: Tier 2
  • Reasoning Engine location: us-central1

❓ Questions / Clarification Needed

  1. Is this error strictly caused by the Vertex AI quota below?

    Query Reasoning Engine requests per minute per region = 30
    
  2. Will increasing this quota fully resolve the issue, or are there additional ADK-level or Reasoning Engine concurrency limits/bottlenecks?

  3. Does ADK provide any built-in retry, backoff, or queueing mechanism for 429 RESOURCE_EXHAUSTED errors?

  4. Are there recommended production patterns when using ADK + Reasoning Engine behind Cloud Run?

  5. Is it possible to self-host the Reasoning Engine locally inside my server and use ADK, so that I only need to worry about the Gemini LLM request quotas?
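If the 30 requests per minute quota from question 1 does turn out to be the limit being hit, one possible client-side mitigation is to throttle outgoing calls so they wait instead of failing. Below is a minimal sketch of a sliding-window limiter; the class name `SlidingWindowLimiter` and its parameters are hypothetical. It assumes a single process, so several Cloud Run instances sharing one regional quota would each need a fraction of the budget or a shared store.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls acquisitions in any rolling period (seconds)."""

    def __init__(self, max_calls=30, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()          # timestamps of recent acquisitions
        self._lock = threading.Lock()

    def acquire(self):
        """Block until a slot is free, then record the call."""
        while True:
            with self._lock:
                now = self._clock()
                # Drop timestamps that have left the rolling window.
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Time until the oldest call in the window expires.
                wait = self.period - (now - self._calls[0])
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(wait)


# Usage: call limiter.acquire() immediately before each agent request.
limiter = SlidingWindowLimiter(max_calls=30, period=60.0)
limiter.acquire()
```

Throttling of this kind trades errors for latency: under sustained burst traffic, requests queue up rather than fail.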

I’m planning to launch this service soon and want to ensure the setup is production-safe under burst traffic.

Labels: agent engine ([Component] This issue is related to Vertex AI Agent Engine)
