This document describes the circuit breaker and retry mechanisms implemented for Stellar Horizon network calls.
The Callora backend implements two key resilience patterns to handle transient failures and prevent cascading failures when interacting with the Stellar Horizon network:
- Bounded Retry with Exponential Backoff - Automatically retries failed operations with increasing delays
- Circuit Breaker Pattern - Prevents resource exhaustion by fast-failing when services are unavailable
The circuit breaker operates in three states:
┌─────────┐
│ CLOSED │ ◄─────────────────────────┐
│ (Normal)│ │
└────┬────┘ │
│ │
│ Failures ≥ Threshold │ Success in HALF_OPEN
│ │
▼ │
┌─────────┐ ┌────┴────────┐
│ OPEN │──────────────────────►│ HALF_OPEN │
│(Failing)│ After Cooldown │ (Testing) │
└─────────┘ └─────────────┘
│ │
│ │
└─────────────────────────────────┘
Failure in HALF_OPEN
CLOSED (Normal Operation)
- All requests pass through to Horizon
- Failures increment a counter; successes reset it
- Transitions to OPEN when consecutive failures exceed threshold
OPEN (Fast-Fail Mode)
- All requests immediately fail with
CircuitBreakerOpenError - No requests are sent to Horizon (protects downstream services)
- After cooldown period, transitions to HALF_OPEN
HALF_OPEN (Recovery Testing)
- Allows a single probe request through
- Success → transition back to CLOSED
- Failure → return to OPEN and reset cooldown timer
The retry mechanism implements exponential backoff with jitter:
Formula: delay = min(baseDelay × 2^attempt, maxDelay) × (1 ± jitter)
Example with defaults:
- Attempt 1: Immediate
- Attempt 2: ~1000ms (1s ± 30%)
- Attempt 3: ~2000ms (2s ± 30%)
Benefits:
- Exponential backoff reduces load on failing services
- Jitter prevents thundering herd problem
- Bounded delays prevent indefinite waiting
| Variable | Description | Default |
|---|---|---|
HORIZON_URL |
Stellar Horizon endpoint | https://horizon-testnet.stellar.org |
STELLAR_BASE_FEE |
Transaction base fee (stroops) | 100 |
STELLAR_TRANSACTION_TIMEOUT |
Transaction timeout (seconds) | 30 |
CIRCUIT_BREAKER_THRESHOLD |
Failures before opening circuit | 5 |
CIRCUIT_BREAKER_COOLDOWN_MS |
Cooldown period (milliseconds) | 30000 (30s) |
RETRY_MAX_ATTEMPTS |
Maximum retry attempts | 3 |
RETRY_BASE_DELAY_MS |
Initial retry delay (milliseconds) | 1000 (1s) |
Development (Fast Recovery):
CIRCUIT_BREAKER_THRESHOLD=3
CIRCUIT_BREAKER_COOLDOWN_MS=10000
RETRY_MAX_ATTEMPTS=2
RETRY_BASE_DELAY_MS=500Production (Conservative):
CIRCUIT_BREAKER_THRESHOLD=10
CIRCUIT_BREAKER_COOLDOWN_MS=60000
RETRY_MAX_ATTEMPTS=5
RETRY_BASE_DELAY_MS=2000Build a vault deposit transaction with resilience patterns.
Request:
{
"sourcePublicKey": "GSOURCE123...",
"vaultPublicKey": "GVAULT456...",
"amount": "100.5"
}Success Response (200):
{
"success": true,
"transactionXdr": "AAAAA...ZZZZZ"
}Error Responses:
400 Bad Request - Invalid input
{
"success": false,
"error": "Invalid request body. Required fields: sourcePublicKey, vaultPublicKey, amount"
}502 Bad Gateway - Circuit breaker open or retries exhausted
{
"success": false,
"error": "Stellar Horizon service is currently unavailable. Circuit breaker is open. Please try again later."
}500 Internal Server Error - Unexpected error
{
"success": false,
"error": "Internal server error"
}Get circuit breaker health metrics.
Response (200):
{
"success": true,
"circuitBreaker": {
"state": "CLOSED",
"consecutiveFailures": 0,
"consecutiveSuccesses": 5,
"totalFailures": 2,
"totalSuccesses": 10,
"lastFailureTime": null,
"lastStateChange": 1234567890
}
}CircuitBreakerOpenError
- Thrown when circuit breaker is in OPEN state
- Mapped to HTTP 502 Bad Gateway
- Indicates upstream service is unavailable
RetryExhaustedError
- Thrown when all retry attempts fail
- Mapped to HTTP 502 Bad Gateway
- Contains attempt count and last error
BadRequestError
- Thrown for invalid client input
- Mapped to HTTP 400 Bad Request
- Validation errors
Horizon Call
│
├─► Success ──────────────────────► Return Result
│
└─► Failure
│
├─► Retry (with backoff)
│ │
│ ├─► Success ─────────────► Return Result
│ │
│ └─► Max Retries ─────────► RetryExhaustedError → 502
│
└─► Circuit Breaker Check
│
├─► CLOSED ──────────────► Continue
│
├─► HALF_OPEN ───────────► Allow Probe
│
└─► OPEN ────────────────► CircuitBreakerOpenError → 502
-
Circuit Breaker State
- Alert when state transitions to OPEN
- Track time spent in each state
-
Failure Rates
totalFailures / (totalFailures + totalSuccesses)- Alert on sustained high failure rates
-
Consecutive Failures
- Early warning before circuit opens
- Alert at 50% of threshold
-
Retry Attempts
- Track average retries per request
- High retry counts indicate instability
Poll /api/deposits/health to monitor circuit breaker state:
curl http://localhost:3000/api/deposits/healthHealthy Response:
{
"circuitBreaker": {
"state": "CLOSED",
"consecutiveFailures": 0
}
}Degraded Response:
{
"circuitBreaker": {
"state": "OPEN",
"consecutiveFailures": 5,
"lastFailureTime": 1234567890
}
}# Run all tests
npm test
# Run with coverage
npm test -- --coverage
# Run specific test suite
npm test -- retry.test.ts
npm test -- circuitBreaker.test.ts
npm test -- transactionBuilder.test.ts
npm test -- depositController.test.tsThe implementation includes comprehensive tests covering:
- ✅ Successful operations on first attempt
- ✅ Transient failures with successful retry
- ✅ Persistent failures exhausting retries
- ✅ Circuit breaker state transitions
- ✅ Fast-fail behavior when circuit is open
- ✅ Recovery after cooldown period
- ✅ HTTP error mapping (400, 502, 500)
- ✅ Request validation
- ✅ Concurrent operations
Target Coverage: 90%+ line coverage
Test Circuit Breaker Trip:
-
Configure low threshold:
export CIRCUIT_BREAKER_THRESHOLD=2 export RETRY_MAX_ATTEMPTS=1
-
Make requests with invalid Horizon URL:
export HORIZON_URL=http://invalid-horizon.example.com -
Send multiple requests:
curl -X POST http://localhost:3000/api/deposits/build \ -H "Content-Type: application/json" \ -d '{ "sourcePublicKey": "GSOURCE...", "vaultPublicKey": "GVAULT...", "amount": "100" }'
-
Observe circuit breaker open after threshold failures
-
Check health endpoint:
curl http://localhost:3000/api/deposits/health
Increase Threshold when:
- Experiencing frequent false positives
- Network is inherently unstable but recovers quickly
- Cost of circuit opening is high
Decrease Threshold when:
- Failures cascade to other services
- Recovery time is long
- Want faster failure detection
Increase Cooldown when:
- Service takes long to recover
- Want to reduce probe frequency
- Avoiding premature recovery attempts
Decrease Cooldown when:
- Service recovers quickly
- Want faster recovery
- Acceptable to probe more frequently
-
Start Conservative
- Higher thresholds (8-10 failures)
- Longer cooldowns (60s)
- More retry attempts (4-5)
-
Monitor and Tune
- Collect metrics for 1-2 weeks
- Analyze failure patterns
- Adjust based on actual behavior
-
Alert Configuration
- Alert on circuit OPEN state
- Alert on sustained high failure rates
- Alert on retry exhaustion
-
Graceful Degradation
- Cache recent successful responses
- Provide fallback values when possible
- Clear user communication during outages
Symptoms: Circuit remains OPEN despite service recovery
Solutions:
- Check cooldown period hasn't elapsed
- Verify Horizon URL is correct
- Test Horizon connectivity directly
- Review logs for underlying errors
- Manually reset if necessary (restart service)
Symptoms: High latency, many retry attempts
Solutions:
- Reduce
RETRY_MAX_ATTEMPTS - Increase
RETRY_BASE_DELAY_MS - Lower
CIRCUIT_BREAKER_THRESHOLDto fail faster - Investigate root cause of failures
Symptoms: Circuit opens during normal operation
Solutions:
- Increase
CIRCUIT_BREAKER_THRESHOLD - Review failure patterns (are they truly transient?)
- Improve retry logic for specific error types
- Consider separate circuits for different operations
src/
├── lib/
│ ├── errors.ts # Custom error classes
│ ├── retry.ts # Retry mechanism
│ ├── retry.test.ts # Retry tests
│ ├── circuitBreaker.ts # Circuit breaker implementation
│ └── circuitBreaker.test.ts # Circuit breaker tests
├── services/
│ ├── transactionBuilder.ts # Stellar transaction builder
│ └── transactionBuilder.test.ts # Transaction builder tests
├── controllers/
│ ├── depositController.ts # Deposit API controller
│ └── depositController.test.ts # Controller tests
└── index.ts # Express app with routes
withRetry<T>(operation, config)
- Wraps async operations with retry logic
- Returns result or throws
RetryExhaustedError
CircuitBreaker.execute<T>(operation)
- Wraps operations with circuit breaker
- Manages state transitions
- Throws
CircuitBreakerOpenErrorwhen open
StellarTransactionBuilder.loadAccount(publicKey)
- Loads account from Horizon with resilience
- Combines retry + circuit breaker
buildDepositTransaction(req, res, next)
- Express controller for deposit endpoint
- Maps errors to appropriate HTTP status codes