A high-performance reverse proxy and API gateway for AI services (OpenAI, Anthropic, etc.) with built-in caching, rate limiting, cost tracking, and observability.
- ⚡ Smart Caching - Redis-backed response caching to reduce costs
- 🛡️ Rate Limiting - Distributed rate limiting with Redis (or in-memory fallback)
- 💰 Cost Tracking - Real-time token usage and cost estimation
- 🔄 Circuit Breaker - Automatic failure detection and recovery
- 📊 Prometheus Metrics - Built-in observability via the `/metrics` endpoint
- 🔥 Hot Reload - Configuration updates without restarts
- 🐳 Docker Ready - Multi-stage builds for minimal image size
- 🔌 Zero Dependencies - Works standalone or with Redis for advanced features
- Cost Optimization: Cache repeated queries to cut AI API costs by up to 80% (see the caching sketch after this list)
- Rate Limit Management: Prevent overages with smart request throttling
- Multi-Model Support: Route requests to different AI providers
- Observability: Track usage, costs, and performance in real-time
- Team Collaboration: Centralized AI gateway for multiple applications
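Conceptually, the caching that drives those savings reduces to keying responses by request content. A minimal sketch in Go, assuming a SHA-256 hash of the request body as the cache key and an in-memory map standing in for Redis (relay's actual key scheme may differ):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// cacheKey derives a stable key from the raw request body, so identical
// prompts map to the same cache entry. (Illustrative scheme: a real key
// would likely also fold in the endpoint path and model.)
func cacheKey(body []byte) string {
	sum := sha256.Sum256(body)
	return "relay:cache:" + hex.EncodeToString(sum[:])
}

// memoryCache stands in for Redis to keep the sketch self-contained.
var memoryCache sync.Map

// getOrCall returns the cached response for body, calling the upstream
// only on a miss.
func getOrCall(body []byte, upstream func() []byte) []byte {
	key := cacheKey(body)
	if cached, ok := memoryCache.Load(key); ok {
		return cached.([]byte) // hit: zero upstream cost
	}
	resp := upstream() // miss: pay for the API call once
	memoryCache.Store(key, resp)
	return resp
}

func main() {
	body := []byte(`{"model":"gpt-4","messages":[{"role":"user","content":"Hello!"}]}`)
	upstream := func() []byte { return []byte(`{"id":"chatcmpl-123"}`) }
	fmt.Println(string(getOrCall(body, upstream)))
	fmt.Println(string(getOrCall(body, upstream))) // served from cache
}
```

Hashing the body means byte-identical prompts share an entry; any change to the messages or model produces a new key.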
```bash
# Clone the repository
git clone https://github.com/ngoyal88/relay.git
cd relay

# Copy and edit configuration
cp configs/config.example.yaml configs/config.yaml
nano configs/config.yaml

# Start with Docker Compose (includes Redis)
docker-compose up -d

# Your relay is now running on http://localhost:8080
```

```bash
# Download the latest release
curl -sSL https://github.com/ngoyal88/relay/releases/latest/download/relay-linux-amd64 -o relay
chmod +x relay
# Create config
curl -sSL https://raw.githubusercontent.com/ngoyal88/relay/main/configs/config.example.yaml -o config.yaml
# Run
./relay
```

```bash
git clone https://github.com/ngoyal88/relay.git
cd relay
cp configs/config.example.yaml configs/config.yaml
go run cmd/main.go
```

```bash
# Replace OpenAI API calls with your relay endpoint
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_OPENAI_KEY" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```

```python
import openai
# Point to your relay instead of OpenAI directly
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "YOUR_OPENAI_KEY"
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
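Both the curl and Python examples work unchanged because the relay is, at its core, a reverse proxy in front of the configured `proxy.target`. A standard-library sketch of just that core (relay's real handler wraps it in the caching, rate-limiting, and metrics middleware described below):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Forward every request on :8080 to the upstream configured as
	// proxy.target, leaving the client's Authorization header intact.
	target, err := url.Parse("https://api.openai.com")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Rewrite the Host header so the TLS upstream accepts the request.
	director := proxy.Director
	proxy.Director = func(r *http.Request) {
		director(r)
		r.Host = target.Host
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```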
```bash
# View Prometheus metrics
curl http://localhost:8080/metrics
# Key metrics:
# - relay_cache_hits_total
# - relay_cache_misses_total
# - relay_request_tokens (histogram)
# - relay_upstream_latency_seconds (histogram)
```

Edit `configs/config.yaml`:
```yaml
server:
  port: ":8080"

proxy:
  target: "https://api.openai.com"  # Target API endpoint

ratelimit:
  enabled: true
  requests_per_second: 10.0         # Adjust based on your needs
  burst: 20                         # Allow bursts

redis:
  enabled: true                     # Disable for in-memory mode
  address: "localhost:6379"
  password: ""
  db: 0

# Pricing in USD per 1K tokens (for cost tracking)
models:
  gpt-4: 0.03
  gpt-4-32k: 0.06
  gpt-3.5-turbo: 0.002
  claude-3-opus: 0.015
  claude-3-sonnet: 0.003
```

Hot Reload: Changes to `config.yaml` are automatically detected and applied without a restart!
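The cost-tracking arithmetic behind that pricing table is simply tokens divided by 1,000, times the per-1K price. A small sketch (the `pricePer1K` map and `estimateCost` are illustrative names, not relay's internal API):

```go
package main

import "fmt"

// pricePer1K mirrors the models section of config.yaml: USD per 1K tokens.
var pricePer1K = map[string]float64{
	"gpt-4":         0.03,
	"gpt-4-32k":     0.06,
	"gpt-3.5-turbo": 0.002,
}

// estimateCost turns a token count into an estimated USD cost:
// tokens / 1000 * price-per-1K-tokens.
func estimateCost(model string, tokens int) float64 {
	return float64(tokens) / 1000.0 * pricePer1K[model]
}

func main() {
	// 1,500 gpt-4 tokens at $0.03/1K comes to $0.045.
	fmt.Printf("$%.4f\n", estimateCost("gpt-4", 1500))
}
```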
```
┌──────────┐      ┌─────────────────────────────┐      ┌──────────┐
│  Client  │─────▶│            Relay            │─────▶│  OpenAI  │
└──────────┘      │  ┌───────────────────────┐  │      │   API    │
                  │  │    Request Logger     │  │      └──────────┘
                  │  ├───────────────────────┤  │
                  │  │  Token Cost Tracker   │  │
                  │  ├───────────────────────┤  │      ┌──────────┐
                  │  │      Redis Cache      │──┼─────▶│  Redis   │
                  │  ├───────────────────────┤  │      └──────────┘
                  │  │     Rate Limiter      │  │
                  │  ├───────────────────────┤  │
                  │  │    Circuit Breaker    │  │
                  │  └───────────────────────┘  │
                  └──────────────┬──────────────┘
                                 │
                                 ▼
                         ┌──────────────┐
                         │  Prometheus  │
                         │   Metrics    │
                         └──────────────┘
```
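The circuit breaker in the stack above follows the common pattern: count consecutive upstream failures, fail fast during a cooldown once a threshold trips, then let a trial request probe the upstream again. A compressed sketch of that pattern (field names and thresholds are illustrative, not relay's implementation):

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrOpen is returned while the breaker is rejecting calls.
var ErrOpen = errors.New("circuit open: upstream failing")

// breaker trips after maxFails consecutive failures and fails fast
// until cooldown has elapsed, then lets a trial call through.
type breaker struct {
	mu       sync.Mutex
	fails    int
	maxFails int
	cooldown time.Duration
	openedAt time.Time
}

func (b *breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.fails >= b.maxFails && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // fail fast instead of hammering a dead upstream
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.fails++
		if b.fails >= b.maxFails {
			b.openedAt = time.Now() // (re)trip the breaker
		}
		return err
	}
	b.fails = 0 // a success closes the breaker
	return nil
}

func main() {
	b := &breaker{maxFails: 3, cooldown: 30 * time.Second}
	_ = b.Call(func() error { return nil }) // wrap each upstream request
}
```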
```yaml
# Per-second limits (smooth traffic)
ratelimit:
  requests_per_second: 10.0
  burst: 20
```

```yaml
# Low-frequency limits (e.g., 1 request per 5 seconds)
ratelimit:
  requests_per_second: 0.2   # 1/5 = 0.2
  burst: 1
```

Both forms configure the same token bucket; see the sketch after the table below.

| Feature | With Redis | Without Redis |
|---|---|---|
| Caching | ✅ Persistent | ❌ N/A |
| Rate Limiting | ✅ Distributed (multi-instance) | ✅ In-memory (single instance) |
| Scalability | ✅ Horizontal | ❌ Single instance |
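In a token bucket, `requests_per_second` is the refill rate and `burst` is the bucket size. A single-instance sketch using `golang.org/x/time/rate` (the Redis-backed distributed variant is not shown):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimit wraps a handler with a token bucket built from the same two
// numbers as the config above: requests_per_second and burst.
func rateLimit(rps float64, burst int, next http.Handler) http.Handler {
	limiter := rate.NewLimiter(rate.Limit(rps), burst)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	// requests_per_second: 0.2, burst: 1 -> at most one request every 5s.
	log.Fatal(http.ListenAndServe(":8080", rateLimit(0.2, 1, upstream)))
}
```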
Override config with environment variables:
```bash
export SERVER_PORT=":9090"
export REDIS_ADDRESS="redis.prod.example.com:6379"
export REDIS_PASSWORD="secret"
./relay
```

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'relay'
    static_configs:
      - targets: ['localhost:8080']
```

Import the included dashboard: `deploy/grafana/relay-dashboard.json`
Key Metrics:
- Cache hit rate
- Request latency (p50, p95, p99)
- Token usage by model
- Estimated costs
- Rate limit violations
- Circuit breaker state
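The `/metrics` endpoint speaks the standard Prometheus text format, which is why the scrape config above is all Prometheus needs. A minimal sketch of how a Go service exposes a counter like `relay_cache_hits_total` using the official client library (not relay's exact wiring):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// cacheHits is the kind of counter behind relay_cache_hits_total.
var cacheHits = promauto.NewCounter(prometheus.CounterOpts{
	Name: "relay_cache_hits_total",
	Help: "Number of responses served from the cache.",
})

func main() {
	cacheHits.Inc() // bumped wherever a cache hit happens
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```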
```bash
docker stack deploy -c docker-compose.yml relay-stack
```

```bash
kubectl apply -f deploy/kubernetes/
```

```bash
helm repo add relay https://yourusername.github.io/relay-helm
helm install my-relay relay/relay
```

```bash
# Install dependencies
go mod download
# Run tests
go test ./...
# Run with live reload (install air: go install github.com/cosmtrek/air@latest)
air
# Build
go build -o relay cmd/main.go
```

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Go
- Uses Redis for distributed caching
- Metrics powered by Prometheus
- Token counting via tiktoken-go