
# 🚀 Relay - Intelligent AI API Gateway


A high-performance reverse proxy and API gateway for AI services (OpenAI, Anthropic, etc.) with built-in caching, rate limiting, cost tracking, and observability.

## ✨ Features

- ⚡ **Smart Caching** - Redis-backed response caching to reduce costs
- 🛡️ **Rate Limiting** - Distributed rate limiting with Redis (or in-memory fallback)
- 💰 **Cost Tracking** - Real-time token usage and cost estimation
- 🔄 **Circuit Breaker** - Automatic failure detection and recovery
- 📊 **Prometheus Metrics** - Built-in observability via the /metrics endpoint
- 🔥 **Hot Reload** - Configuration updates without restarts
- 🐳 **Docker Ready** - Multi-stage builds for a minimal image size
- 🔌 **Minimal Dependencies** - Runs standalone; Redis is optional and only needed for distributed features

## 🎯 Use Cases

- **Cost Optimization**: Cache repeated queries so you never pay twice for identical requests
- **Rate Limit Management**: Prevent overages with smart request throttling
- **Multi-Model Support**: Route requests to different AI providers
- **Observability**: Track usage, costs, and performance in real time
- **Team Collaboration**: A centralized AI gateway shared by multiple applications

## 🚀 Quick Start

### Option 1: Docker (Recommended)

```bash
# Clone the repository
git clone https://github.com/ngoyal88/relay.git
cd relay

# Copy and edit the configuration
cp configs/config.example.yaml configs/config.yaml
nano configs/config.yaml

# Start with Docker Compose (includes Redis)
docker-compose up -d

# The relay is now listening on http://localhost:8080
```

### Option 2: Binary

```bash
# Download the latest release
curl -sSL https://github.com/ngoyal88/relay/releases/latest/download/relay-linux-amd64 -o relay
chmod +x relay

# Fetch the example config
curl -sSL https://raw.githubusercontent.com/ngoyal88/relay/main/configs/config.example.yaml -o config.yaml

# Run
./relay
```

### Option 3: From Source

```bash
git clone https://github.com/ngoyal88/relay.git
cd relay
cp configs/config.example.yaml configs/config.yaml
go run cmd/main.go
```

## 📖 Usage

### Basic Proxying

```bash
# Point your OpenAI API calls at the relay endpoint instead
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
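Caching works because logically identical requests can be reduced to the same cache key. The exact key scheme is internal to Relay; the sketch below only illustrates the usual approach of hashing a canonical form of the request:

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so logically
    # identical requests always hash to the same key.
    payload = json.dumps({"model": model, "messages": messages},
                         sort_keys=True, separators=(",", ":"))
    return "relay:cache:" + hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("gpt-4", [{"role": "user", "content": "Hello!"}])
k2 = cache_key("gpt-4", [{"content": "Hello!", "role": "user"}])
print(k1 == k2)  # True: dict key order does not change the cache key
```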

### Using with the OpenAI Python SDK

```python
from openai import OpenAI

# Point the client at the relay instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_OPENAI_KEY",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

(On `openai` < 1.0, set `openai.api_base = "http://localhost:8080/v1"` and use `openai.ChatCompletion.create` instead.)

### Monitoring

```bash
# View Prometheus metrics
curl http://localhost:8080/metrics
```

Key metrics:

- `relay_cache_hits_total`
- `relay_cache_misses_total`
- `relay_request_tokens` (histogram)
- `relay_upstream_latency_seconds` (histogram)
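The first two counters give you the cache hit rate directly. The helper below is plain arithmetic, not part of Relay, shown only to make the derivation explicit:

```python
def cache_hit_rate(hits: float, misses: float) -> float:
    """Fraction of requests served from cache (0.0 when there is no traffic)."""
    total = hits + misses
    return hits / total if total else 0.0

print(cache_hit_rate(800, 200))  # → 0.8
```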

βš™οΈ Configuration

Edit configs/config.yaml:

server:
  port: ":8080"

proxy:
  target: "https://api.openai.com"  # Target API endpoint

ratelimit:
  enabled: true
  requests_per_second: 10.0         # Adjust based on your needs
  burst: 20                          # Allow bursts

redis:
  enabled: true                      # Disable for in-memory mode
  address: "localhost:6379"
  password: ""
  db: 0

# Pricing in USD per 1K tokens (for cost tracking)
models:
  gpt-4: 0.03
  gpt-4-32k: 0.06
  gpt-3.5-turbo: 0.002
  claude-3-opus: 0.015
  claude-3-sonnet: 0.003

**Hot Reload**: Changes to `config.yaml` are automatically detected and applied without a restart.
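With per-1K-token pricing like the `models` table above, cost estimation is simple arithmetic. This sketch is illustrative only and not Relay's internal accounting code:

```python
# Per-1K-token prices (USD), mirroring the `models` section of the config.
PRICES = {"gpt-4": 0.03, "gpt-4-32k": 0.06, "gpt-3.5-turbo": 0.002}

def estimate_cost(model: str, tokens: int) -> float:
    """Estimated cost in USD for `tokens` tokens billed at the model's rate."""
    return PRICES[model] / 1000 * tokens

# 1,500 tokens of gpt-4 at $0.03 per 1K tokens
print(f"${estimate_cost('gpt-4', 1500):.4f}")  # prints $0.0450
```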

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client  │─────▢│         Relay               │─────▢│ OpenAI  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚                             β”‚      β”‚ API      β”‚
                 β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚  β”‚ Request Logger        β”‚  β”‚
                 β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚
                 β”‚  β”‚ Token Cost Tracker    β”‚  β”‚
                 β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                 β”‚  β”‚ Redis Cache           │◀─┼────▢│  Redis  β”‚
                 β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚  β”‚ Rate Limiter          β”‚  β”‚
                 β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  │─
                 β”‚  β”‚ Circuit Breaker       β”‚  β”‚
                 β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚ Prometheus   β”‚
                      β”‚ Metrics      β”‚
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
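Of the middleware layers in the diagram, the circuit breaker is the least self-explanatory: after repeated upstream failures it stops forwarding requests, then lets a probe through once a cooldown has elapsed. The sketch below shows that state machine in miniature; it is illustrative and does not reflect Relay's actual implementation or defaults:

```python
class CircuitBreaker:
    """Open after `threshold` consecutive failures; half-open after `cooldown` s."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker tripped, or None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: pass traffic through
        # Half-open: let a probe request through once the cooldown has elapsed.
        return now - self.opened_at >= self.cooldown

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # success closes the breaker
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # trip: start rejecting requests

cb = CircuitBreaker(threshold=2, cooldown=10.0)
cb.record(ok=False, now=0.0)
cb.record(ok=False, now=1.0)   # second failure trips the breaker
print(cb.allow(now=2.0))       # False: circuit is open
print(cb.allow(now=12.0))      # True: cooldown elapsed, probe allowed
```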

## 🔧 Advanced Features

### Rate Limiting Strategies

```yaml
# Per-second limits (smooth traffic)
ratelimit:
  requests_per_second: 10.0
  burst: 20
```

```yaml
# Low-frequency limits (e.g., 1 request per 5 seconds)
ratelimit:
  requests_per_second: 0.2  # 1/5 = 0.2
  burst: 1
```
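These two settings map onto the standard token-bucket model: tokens refill at `requests_per_second` and accumulate up to `burst`. Relay's limiter isn't shown here; this is a minimal simulation of the arithmetic so you can sanity-check a configuration:

```python
class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill at `rate` tokens/second, capped at `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# requests_per_second: 0.2, burst: 1 → one request every 5 seconds
tb = TokenBucket(rate=0.2, burst=1)
print(tb.allow(0.0))   # True: bucket starts full
print(tb.allow(1.0))   # False: only 0.2 tokens refilled so far
print(tb.allow(5.0))   # True: a full token has refilled after 5 s
```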

### Distributed vs. In-Memory Mode

| Feature       | With Redis                      | Without Redis        |
|---------------|---------------------------------|----------------------|
| Caching       | ✅ Persistent                   | ❌ N/A               |
| Rate Limiting | ✅ Distributed (multi-instance) | ⚠️ Per-instance only |
| Scalability   | ✅ Horizontal                   | ⚠️ Limited           |

### Environment Variables

Override config values with environment variables:

```bash
export SERVER_PORT=":9090"
export REDIS_ADDRESS="redis.prod.example.com:6379"
export REDIS_PASSWORD="secret"
./relay
```
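The names above follow the common `SECTION_KEY` convention, where the part before the first underscore selects a config section and the rest selects a key. Relay's exact mapping is internal; this hypothetical sketch just illustrates the convention:

```python
def apply_env_overrides(config: dict, environ: dict) -> dict:
    """Overlay FOO_BAR=value onto config["foo"]["bar"] (one nesting level)."""
    for name, value in environ.items():
        section, _, key = name.lower().partition("_")
        if section in config and key in config[section]:
            config[section][key] = value
    return config

cfg = {"server": {"port": ":8080"}, "redis": {"address": "localhost:6379"}}
apply_env_overrides(cfg, {"SERVER_PORT": ":9090"})
print(cfg["server"]["port"])  # → :9090
```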

## 📊 Monitoring & Observability

### Prometheus Integration

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'relay'
    static_configs:
      - targets: ['localhost:8080']
```

### Grafana Dashboard

Import the included dashboard: `deploy/grafana/relay-dashboard.json`

Key metrics:

- Cache hit rate
- Request latency (p50, p95, p99)
- Token usage by model
- Estimated costs
- Rate limit violations
- Circuit breaker state

## 🚢 Production Deployment

### Docker Swarm

```bash
docker stack deploy -c docker-compose.yml relay-stack
```

### Kubernetes

```bash
kubectl apply -f deploy/kubernetes/
```

### Helm

```bash
helm repo add relay https://yourusername.github.io/relay-helm
helm install my-relay relay/relay
```

πŸ› οΈ Development

# Install dependencies
go mod download

# Run tests
go test ./...

# Run with live reload (install air: go install github.com/cosmtrek/air@latest)
air

# Build
go build -o relay cmd/main.go


## 🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors