feat: Add /health and /ready endpoints for production deployments #31

@ashwing

Problem

Production deployments need to distinguish between:

  • Liveness: Is the gateway process responsive? (restart if not)
  • Readiness: Can the gateway serve traffic? (remove from load balancer if not)

This applies across orchestration platforms: Kubernetes liveness and readiness probes, ECS health checks, docker-compose healthcheck, Consul/Nomad service health, and any load balancer that routes based on HTTP health endpoints. Without such endpoints, operators must rely on TCP checks or wait for client timeouts to detect backend failures.

Currently, agentic-api has no health check endpoints.

Proposal

GET /health

  • Returns 200 OK whenever the gateway process is up and accepting requests; performs no backend or dependency checks
  • Use for liveness checks (detect crashed/hung gateway process)
  • ECS: container health check; K8s: liveness probe; ALB/NLB: target group health

GET /ready

  • Probes the configured LLM backend's /health endpoint (with 2s timeout)
  • Returns 200 OK if the backend responds as healthy within the timeout
  • Returns 503 Service Unavailable if the backend is unreachable, reports itself unhealthy, or times out
  • Use for readiness checks (prevent routing to gateway when backend is unavailable)
  • ECS: ALB target group health check path; K8s: readiness probe; Consul: service check

Deployment Examples

Kubernetes:

livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3  # above the 2s backend timeout; the Kubernetes default is 1s
  failureThreshold: 2
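
The probe settings above determine how long a failing backend can keep receiving traffic before the pod is marked unready. The following is a rough estimate that ignores scheduling jitter, and `detection_window` is a hypothetical helper. The `timeout_s` argument stands for whatever `timeoutSeconds` the probe ends up with; 3 is an assumed value here (Kubernetes falls back to 1 second when the field is unset).

```python
def detection_window(period_s: int, failure_threshold: int, timeout_s: int) -> tuple[int, int]:
    """Estimate (best, worst) seconds between a failure and the probe tripping.

    Best case: the failure starts just before a probe and each probe fails
    immediately, so only (threshold - 1) further periods are needed.
    Worst case: the failure starts just after a successful probe, so a full
    period passes before the first failure, and the last probe takes the
    whole timeout to fail.
    """
    best = period_s * (failure_threshold - 1)
    worst = period_s * failure_threshold + timeout_s
    return best, worst


# Readiness probe values from the manifest: periodSeconds=5, failureThreshold=2.
print(detection_window(period_s=5, failure_threshold=2, timeout_s=3))  # (5, 13)
```

Lowering `periodSeconds` or `failureThreshold` shortens this window, at the cost of more probe traffic to the backend and more flapping on transient errors.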

ECS (task definition):

{
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
    "interval": 10,
    "timeout": 3,
    "retries": 3,
    "startPeriod": 30
  }
}

docker-compose:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/ready"]
  interval: 10s
  timeout: 3s
  retries: 3
  start_period: 30s
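
The `curl -f` checks above assume that `curl` is installed in the container image. Where it is not, a small script can play the same role. The following is a sketch, assuming a Python interpreter is available in the image; the `probe` function and the `probe.py` file name are hypothetical.

```python
import http.client
import sys
import urllib.error
import urllib.request

# Ignore proxy environment variables: the probe targets localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 3.0) -> int:
    """Return exit code 0 if url answers with a 2xx status, otherwise 1."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connection, timeout, 4xx/5xx response, or malformed URL.
        return 1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python probe.py http://localhost:8000/ready
    sys.exit(probe(sys.argv[1]))
```

With the script saved as `probe.py`, the docker-compose `test` entry could become `["CMD", "python", "probe.py", "http://localhost:8000/ready"]`. The 3 second default matches the `timeout` used in the examples above.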

Background

This follows the design discussion in vllm-project/vllm#36960, where the community identified the need for separate liveness/readiness checks after observing that GPU page faults can leave processes alive but unable to serve inference requests. While vLLM is addressing backend health detection, the gateway layer needs its own checks to handle network failures, misconfigurations, and backend unavailability — regardless of the orchestration platform.

Dependencies

Implementation

I have a working implementation (~30 LOC + integration tests) ready to submit as a PR once #29 merges. Happy to open the PR now for early review if preferred.
