RESOURCE_EXHAUSTED (429) errors when triggering ADK agents.

### `RESOURCE_EXHAUSTED (429) errors when triggering ADK agents concurrently via Vertex AI Reasoning Engine`

---

### **Issue Description**

**Describe the Bug:**
When triggering an ADK agent multiple times in quick succession, the request fails with a streaming error that ultimately resolves to a `429 RESOURCE_EXHAUSTED` error from Vertex AI. The error is surfaced by ADK as a `500` during response streaming.

**Observed error:**

```json
{
  "error": "500: An error occurred while streaming the response: 429 Too Many Requests.",
  "details": {
    "message": "Resource exhausted. Please try again later.",
    "status": "RESOURCE_EXHAUSTED"
  }
}
```

The error message points to the ADK and Vertex AI 429 documentation, but it's unclear where the actual bottleneck is and how the error should be handled when using ADK in production.
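
Because the 429 arrives wrapped in a `500`, a caller that only looks at the outer status code cannot tell throttling apart from a genuine server failure. Below is a minimal sketch of how a client might classify the payload, assuming the error body always has the shape shown above. The helper name `is_resource_exhausted` is hypothetical and not part of ADK.

```python
import json


def is_resource_exhausted(payload: str) -> bool:
    """Return True if an error payload shaped like the one above wraps a 429.

    Assumption: the body is JSON with an "error" string and an optional
    "details" object carrying a "status" field.
    """
    try:
        body = json.loads(payload)
    except json.JSONDecodeError:
        # Not JSON: fall back to looking for the status name in the raw text.
        return "RESOURCE_EXHAUSTED" in payload
    if not isinstance(body, dict):
        return False
    details = body.get("details")
    status = details.get("status") if isinstance(details, dict) else None
    # Either the structured status or the wrapped "429" text marks throttling.
    return status == "RESOURCE_EXHAUSTED" or "429" in str(body.get("error", ""))
```

With a check like this, the caller can route throttling errors to a retry path and treat every other `500` as a real failure.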

---

**Steps to Reproduce:**

1. Deploy a backend service on Google Cloud Run.
2. Use Google ADK with Vertex AI Reasoning Engine (`us-central1`).
3. Trigger the same agent multiple times within a short time window (concurrent or burst traffic).
4. Observe `429 RESOURCE_EXHAUSTED` errors surfaced as streaming failures.

---

**Expected Behavior:**
Requests may slow down or queue, but should not fail with a hard error during streaming. Ideally, retries or backoff would be handled gracefully.
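
To make "retries or backoff" concrete, here is a rough sketch of the client-side behavior I have in mind: exponential backoff with full jitter around the agent call. `ResourceExhaustedError` and `call_with_backoff` are hypothetical names for illustration, not ADK APIs, and I am assuming the request is safe to resend from the start (a retry after a partially streamed response could duplicate output).

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for whatever exception carries the 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ResourceExhaustedError with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ResourceExhaustedError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the delay can be replaced in tests. The jitter is there so that a burst of failed requests does not retry in lockstep and hit the quota again at the same moment.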

---

**Observed Behavior:**
Requests fail with `429 RESOURCE_EXHAUSTED`, wrapped as a `500` streaming error by ADK.

---

**Environment Details:**

* ADK Library Version: latest
* Python Version: 3.12

---

**Model Information:**

* Are you using LiteLLM: No
* Models used: `gemini-2.5-pro`, `gemini-2.5-flash`
* Gemini tier: Tier 2
* Reasoning Engine location: `us-central1`

---

## ❓ Questions / Clarification Needed

1. Is this error strictly caused by the **Vertex AI quota** below?

   ```
   Query Reasoning Engine requests per minute per region = 30
   ```

2. Will increasing this quota fully resolve the issue, or are there **additional ADK-level or Reasoning Engine concurrency limits/bottlenecks**?
3. Does ADK provide any **built-in retry, backoff, or queueing mechanism** for `429 RESOURCE_EXHAUSTED` errors?
4. Are there recommended **production patterns** when using ADK + Reasoning Engine behind Cloud Run?
5. Is it possible to self-host the **Reasoning Engine** locally inside my server and use ADK, so that I only need to worry about the Gemini LLM request quotas?

I’m planning to launch this service soon and want to ensure the setup is production-safe under burst traffic.
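
As a stopgap, I am considering throttling calls on the client side so that bursts queue instead of failing. The sketch below is a sliding-window limiter sized to the 30 requests per minute quota from question 1. `SlidingWindowLimiter` is a hypothetical helper, not an ADK feature, and it only limits a single process: with several Cloud Run instances, each one would need a share of the quota or a shared store, which this sketch does not cover.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls acquisitions in any window of `period` seconds."""

    def __init__(self, max_calls=30, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # Timestamps of calls inside the current window.
        self._lock = threading.Lock()

    def acquire(self):
        """Block until a call slot is free, then record the call."""
        while True:
            with self._lock:
                now = self._clock()
                # Drop timestamps that have aged out of the window.
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Window is full: wait until the oldest call expires.
                wait = self.period - (now - self._calls[0])
            self._sleep(wait)
```

Usage would be a single shared `limiter = SlidingWindowLimiter()` with `limiter.acquire()` called before each agent invocation. Whether this is enough depends on the answer to question 2, since it only addresses the per-minute request quota.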


RESOURCE_EXHAUSTED (429) errors when triggering ADK agents. #4323

`RESOURCE_EXHAUSTED (429) errors when triggering ADK agents concurrently via Vertex AI Reasoning Engine`

Issue Description

❓ Questions / Clarification Needed

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

RESOURCE_EXHAUSTED (429) errors when triggering ADK agents. #4323

Description

RESOURCE_EXHAUSTED (429) errors when triggering ADK agents concurrently via Vertex AI Reasoning Engine

Issue Description

❓ Questions / Clarification Needed

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`RESOURCE_EXHAUSTED (429) errors when triggering ADK agents concurrently via Vertex AI Reasoning Engine`