Merged
243 changes: 94 additions & 149 deletions skills/a7-recipe-circuit-breaker/SKILL.md
@@ -2,9 +2,8 @@
name: a7-recipe-circuit-breaker
description: >-
Recipe skill for implementing circuit breaker patterns using the a7 CLI in API7 Enterprise Edition.
Covers the api-breaker plugin for automatic upstream circuit breaking,
configuring unhealthy thresholds, healthy recovery, response code
classification, and integration with health checks.
Covers the api-breaker plugin, unhealthy thresholds, healthy recovery,
response code classification, and integration with service health checks.
version: "1.0.0"
author: API7.ai Contributors
license: Apache-2.0
@@ -13,91 +12,68 @@ metadata:
apisix_version: ">=3.0.0"
plugin_name: api-breaker
a7_commands:
- a7 service create
- a7 route create
- a7 route update
- a7 route get
- a7 config sync
---

# a7-recipe-circuit-breaker

## Overview

A circuit breaker prevents cascading failures by detecting unhealthy upstream
services and temporarily stopping requests to them. When the upstream returns
too many errors, the circuit "opens" and API7 Enterprise Edition (API7 EE) returns errors immediately
without forwarding requests. After a cooldown period, it "half-opens" to test
if the upstream has recovered.
A circuit breaker prevents cascading failures by detecting unhealthy backend
responses and temporarily stopping requests to the failing service. API7 EE
implements this through the `api-breaker` plugin on routes.

API7 EE implements this via the `api-breaker` plugin, which tracks response
status codes and manages circuit state automatically across a gateway group.
Use the current service-backed route model:

## When to Use

- Protect your API from cascading failures when an upstream goes down.
- Automatically stop sending traffic to failing backends.
- Allow failing services time to recover before retrying.
- Return fast error responses instead of waiting for timeouts.

## Circuit Breaker States

```
        ┌───────────┐
        │  CLOSED   │  ← Normal operation: requests flow through
        │ (healthy) │
        └─────┬─────┘
              │ Error count exceeds threshold
        ┌───────────┐
        │   OPEN    │  ← Breaker tripped: returns configured status immediately
        │ (tripped) │
        └─────┬─────┘
              │ After cooldown period
        ┌───────────┐
        │ HALF-OPEN │  ← Test: allows one request through
        │ (testing) │
        └─────┬─────┘
      ┌───────┴───────┐
      │               │
   Success         Failure
      │               │
      ▼               ▼
   CLOSED     OPEN (longer cooldown)
```
1. Create a service that owns the upstream backend.
2. Create a route with `service_id`.
3. Enable `api-breaker` on the route.

## Plugin Configuration Reference

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `break_response_code` | integer | **Yes** | — | HTTP status code returned when circuit is open (e.g., 502, 503). |
| `break_response_body` | string | No | — | Response body returned when circuit is open. |
| `break_response_headers` | array[object] | No | — | Response headers when circuit is open. Format: `[{"key": "name", "value": "val"}]`. |
| `unhealthy.http_statuses` | array[integer] | No | `[500]` | HTTP status codes from upstream that count as unhealthy. |
| `unhealthy.failures` | integer | No | `3` | Number of consecutive unhealthy responses before opening the circuit. |
| `healthy.http_statuses` | array[integer] | No | `[200]` | HTTP status codes from upstream that count as healthy (for recovery). |
| `healthy.successes` | integer | No | `3` | Number of consecutive healthy responses to close the circuit. |
| `max_breaker_sec` | integer | No | `300` | Maximum circuit-open duration in seconds. Cooldown doubles each time but caps here. |

## Breaker Timing
| Field | Required | Description |
|-------|----------|-------------|
| `break_response_code` | Yes | HTTP status returned when the circuit is open |
| `break_response_body` | No | Response body returned when open |
| `break_response_headers` | No | Headers returned when open |
| `unhealthy.http_statuses` | No | Upstream status codes counted as unhealthy |
| `unhealthy.failures` | No | Consecutive unhealthy responses before opening |
| `healthy.http_statuses` | No | Status codes counted as healthy for recovery |
| `healthy.successes` | No | Consecutive healthy responses before closing |
| `max_breaker_sec` | No | Maximum circuit-open duration |

When the circuit opens:
1. First open: **2 seconds** cooldown.
2. If it opens again: **4 seconds** (doubles).
3. Next: **8 seconds**, **16 seconds**, ...
4. Caps at `max_breaker_sec` (default 300s = 5 minutes).
## Step-by-Step: Enable Circuit Breaker

During cooldown, all requests get the `break_response_code` immediately.
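
The doubling schedule above can be sketched as a small shell function. This is only an illustration of the arithmetic described in this section (2 seconds on the first trip, doubling on each later trip, capped at `max_breaker_sec`). The function name `breaker_cooldown` is hypothetical and is not part of the a7 CLI or the plugin.

```bash
# Hypothetical helper: print the cooldown in seconds for the Nth consecutive
# trip, assuming a 2s start that doubles per trip and is capped at max.
# Usage: breaker_cooldown <trip_number> [max_breaker_sec]
breaker_cooldown() (
  trip=$1
  max=${2:-300}   # assumed default, taken from the table above
  sec=2
  i=1
  # Double once per additional trip, stopping early once the cap is reached.
  while [ "$i" -lt "$trip" ] && [ "$sec" -lt "$max" ]; do
    sec=$((sec * 2))
    i=$((i + 1))
  done
  # Clamp to the configured maximum.
  if [ "$sec" -gt "$max" ]; then
    sec=$max
  fi
  echo "$sec"
)

for n in 1 2 3 4 9; do
  echo "trip $n: $(breaker_cooldown "$n")s"
done
```

With the default cap of 300, the ninth consecutive trip would otherwise reach 512 seconds, so it is clamped to 300.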
### 1. Create a protected service

## Step-by-Step: Enable Circuit Breaker
```bash
a7 service create --gateway-group default -f - <<'EOF'
{
  "id": "backend-service",
  "name": "backend-service",
  "upstream": {
    "type": "roundrobin",
    "nodes": [
      {"host": "backend", "port": 8080, "weight": 1}
    ]
  }
}
EOF
```

### 1. Basic circuit breaker
### 2. Create a route with `api-breaker`

```bash
a7 route create --gateway-group default -f - <<'EOF'
{
"id": "protected-api",
"uri": "/api/*",
"name": "protected-api",
"paths": ["/api/*"],
"service_id": "backend-service",
"plugins": {
"api-breaker": {
"break_response_code": 502,
@@ -111,28 +87,24 @@ a7 route create --gateway-group default -f - <<'EOF'
},
"max_breaker_sec": 300
}
},
"upstream": {
"type": "roundrobin",
"nodes": {
"backend:8080": 1
}
}
}
EOF
```

After 3 consecutive 500/502/503 responses, the circuit opens and returns 502
immediately. After cooldown, it tests with one request. If 3 consecutive 200s
come back, the circuit closes and normal operation resumes.
After three consecutive 500/502/503 responses, the circuit opens and returns
502 immediately. After cooldown, API7 EE tests recovery and closes the circuit
after enough healthy responses.
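
To make the "consecutive" threshold concrete, here is a minimal shell sketch that walks a sequence of upstream status codes and reports when the breaker would open. It is a simplified model, not the plugin's implementation: the function name `trip_index` is hypothetical, the unhealthy set is hard-coded to 500/502/503 to match the example, and any other status is assumed to reset the streak.

```bash
# Hypothetical simulation: print the 1-based position of the response that
# would open the circuit, or 0 if the threshold is never reached.
# Usage: trip_index <unhealthy.failures> <status> [<status> ...]
trip_index() (
  failures=$1
  shift
  streak=0
  n=0
  for code in "$@"; do
    n=$((n + 1))
    case "$code" in
      500|502|503) streak=$((streak + 1)) ;;  # counted as unhealthy
      *) streak=0 ;;                          # assumption: anything else breaks the streak
    esac
    if [ "$streak" -ge "$failures" ]; then
      echo "$n"
      return 0
    fi
  done
  echo 0
)

trip_index 3 200 500 502 503 200   # three unhealthy responses in a row
trip_index 3 500 200 500 502       # streak interrupted by a 200
```

In the first call the third consecutive unhealthy response is the fourth overall, so the breaker would open there. In the second call the 200 interrupts the streak and the threshold is never reached.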

### 2. Circuit breaker with custom error body
### 3. Custom error response

```bash
a7 route create --gateway-group default -f - <<'EOF'
a7 route update protected-api --gateway-group default -f - <<'EOF'
{
"id": "api-with-error-body",
"uri": "/api/*",
"id": "protected-api",
"name": "protected-api",
"paths": ["/api/*"],
"service_id": "backend-service",
"plugins": {
"api-breaker": {
"break_response_code": 503,
@@ -151,78 +123,46 @@ a7 route create --gateway-group default -f - <<'EOF'
},
"max_breaker_sec": 60
}
},
"upstream": {
"type": "roundrobin",
"nodes": {
"backend:8080": 1
}
}
}
EOF
Comment on lines 101 to 128

Copilot AI Apr 29, 2026

`a7 route update <id> -f` performs a full PUT replacement (not a merge/patch). This example only supplies `service_id` and `plugins`, so it will likely wipe fields like `paths`/`uri`/`name`, potentially making the route invalid or causing it to no longer match any requests. Update the example to send a complete route definition (including existing paths and other required fields), or show a workflow like `a7 route get ... --output json` → edit → `a7 route update ... -f`.
```

### 3. Sensitive circuit breaker (trips on first error)
### 4. Combine with health checks

```json
{
  "plugins": {
    "api-breaker": {
      "break_response_code": 503,
      "unhealthy": {
        "http_statuses": [500, 502, 503],
        "failures": 1
      },
      "healthy": {
        "http_statuses": [200],
        "successes": 1
      },
      "max_breaker_sec": 30
    }
  }
}
```

Trips on the very first 5xx error. Recovers after one successful response.

## Combining with Health Checks

For production, combine the circuit breaker with upstream health checks.
The circuit breaker handles per-route protection while health checks manage
per-node health at the upstream level.
For production, define health checks on the service upstream and keep
`api-breaker` on the route. Health checks manage node health; the circuit
breaker protects this route from repeated upstream failures.

```bash
# Create upstream with health checks
a7 upstream create --gateway-group default -f - <<'EOF'
a7 service create --gateway-group default -f - <<'EOF'
{
"id": "monitored-backend",
"type": "roundrobin",
"nodes": {
"backend-1:8080": 1,
"backend-2:8080": 1
},
"checks": {
"active": {
"type": "http",
"http_path": "/health",
"healthy": {
"interval": 5,
"successes": 2
},
"unhealthy": {
"interval": 3,
"http_failures": 3
"id": "monitored-backend-service",
"name": "monitored-backend-service",
"upstream": {
"type": "roundrobin",
"nodes": [
{"host": "backend-1", "port": 8080, "weight": 1},
{"host": "backend-2", "port": 8080, "weight": 1}
],
"checks": {
"active": {
"type": "http",
"http_path": "/health",
"healthy": {"interval": 5, "successes": 2},
"unhealthy": {"interval": 3, "http_failures": 3}
}
}
}
}
EOF

# Create route with circuit breaker
a7 route create --gateway-group default -f - <<'EOF'
{
"id": "api",
"uri": "/api/*",
"name": "api",
"paths": ["/api/*"],
"service_id": "monitored-backend-service",
"plugins": {
"api-breaker": {
"break_response_code": 503,
@@ -235,20 +175,30 @@ a7 route create --gateway-group default -f - <<'EOF'
"successes": 3
}
}
},
"upstream_id": "monitored-backend"
}
}
EOF
```

## Config Sync Example
## Config Sync

```yaml
version: "1"
gateway_group: default
services:
  - id: backend-service
    name: backend-service
    upstream:
      type: roundrobin
      nodes:
        - host: backend
          port: 8080
          weight: 1
routes:
  - id: protected-api
    uri: /api/*
    name: protected-api
    paths:
      - /api/*
    service_id: backend-service
    plugins:
      api-breaker:
        break_response_code: 503
@@ -265,21 +215,16 @@ routes:
          http_statuses: [200]
          successes: 3
        max_breaker_sec: 300
    upstream_id: backend
upstreams:
  - id: backend
    type: roundrobin
    nodes:
      "backend:8080": 1
```

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Circuit never opens | `unhealthy.http_statuses` doesn't include the error code | Add the actual error codes your upstream returns |
| Circuit stays open too long | `max_breaker_sec` too high | Lower `max_breaker_sec` for faster recovery |
| Circuit flaps open/closed | Threshold too low with intermittent errors | Increase `unhealthy.failures` threshold |
| 502 from API7 EE (not circuit breaker) | Upstream truly unreachable (connection refused) | Connection errors also count toward unhealthy threshold |
| Recovery too slow | `healthy.successes` too high | Lower `healthy.successes` for faster recovery |
| Command failed with 403 | RBAC permission issue | Ensure your token has permission to modify routes in the gateway group |
| Circuit never opens | `unhealthy.http_statuses` misses the real error code | Add the actual upstream error codes |
| Circuit stays open too long | `max_breaker_sec` too high | Lower `max_breaker_sec` |
| Circuit flaps | Threshold too low for intermittent errors | Increase `unhealthy.failures` |
| API7 returns 502 outside breaker response | Backend is unreachable | Connection errors also count toward unhealthy thresholds |
| Recovery too slow | `healthy.successes` too high | Lower `healthy.successes` |
| Route not using breaker | Plugin attached to the wrong route | Verify with `a7 route get <id> -o json` |
| Command failed with 403 | RBAC permission issue | Ensure your token can modify routes in the gateway group |