feat: add /health and /ready endpoints #37
Merged
franciscojavierarceo merged 3 commits on May 29, 2026
Add GET /health (liveness) and GET /ready (readiness) endpoints to the gateway. /health returns 200 unconditionally when the process is listening. /ready probes the LLM backend's /health and returns 503 if unreachable, enabling proper K8s/docker-compose orchestration.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
franciscojavierarceo
requested changes
May 28, 2026
Collaborator
franciscojavierarceo
left a comment
thanks a ton for this @ashwing!! can you update your PR? The CI is failing.
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Contributor
Author
Thanks @franciscojavierarceo! Fixed.
    let base = state.config.llm_api_base.trim_end_matches('/');
    let url = format!("{base}/health");

    let client = reqwest::Client::builder()
Collaborator
this reqwest is missing authorization headers. in readiness.rs we do insert the headers.
can you fix this asymmetry?
Contributor
Author
Good catch — fixed in 70c3849. The readiness probe now injects the Bearer token from OPENAI_API_KEY the same way readiness.rs does at startup.
Match the auth injection pattern from readiness.rs: forward the configured OPENAI_API_KEY as a Bearer token when probing the LLM backend's /health endpoint.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
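For readers following the thread, here is a minimal, std-only sketch of the two pieces discussed in this review: building the probe URL from the configured base, and deriving a Bearer value from OPENAI_API_KEY. It is not the code from 70c3849 or from readiness.rs. The helper names `health_url` and `bearer_header` are hypothetical, and skipping an empty or whitespace-only key is an assumption made for illustration.

```rust
/// Build the backend health URL, tolerating a trailing slash on the base
/// so that "http://host:8000/" does not become "http://host:8000//health".
/// (Hypothetical helper; mirrors the two lines quoted in the review.)
fn health_url(llm_api_base: &str) -> String {
    let base = llm_api_base.trim_end_matches('/');
    format!("{base}/health")
}

/// Return the Authorization header value for the probe, if a key is set.
/// Assumption: an empty or whitespace-only key means "send no header".
fn bearer_header(api_key: Option<&str>) -> Option<String> {
    api_key
        .map(str::trim)
        .filter(|key| !key.is_empty())
        .map(|key| format!("Bearer {key}"))
}

fn main() {
    // Port 8000 matches the vLLM server mentioned in the test plan;
    // the host is illustrative.
    let url = health_url("http://localhost:8000/");
    let key = std::env::var("OPENAI_API_KEY").ok();

    // Never print the token itself; only report whether one would be sent.
    match bearer_header(key.as_deref()) {
        Some(_) => println!("GET {url} with Authorization header"),
        None => println!("GET {url} without Authorization header"),
    }
}
```

Keeping the header logic in one helper that both the startup check and the /ready handler call would avoid the asymmetry raised above, though whether the PR shares code this way is not shown in the thread.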
franciscojavierarceo
approved these changes
May 29, 2026
Collaborator
franciscojavierarceo
left a comment
lgtm! thanks for the quick fixes
Summary
Add liveness (GET /health) and readiness (GET /ready) endpoints to the gateway, enabling proper orchestration under Kubernetes, ECS, and docker-compose.

- /health returns 200 OK unconditionally when the process is listening (liveness probe)
- /ready probes the LLM backend's /health with a 2-second timeout and returns 200 if healthy, 503 Service Unavailable otherwise (readiness probe)

This separates "is the gateway process alive?" from "is the system ready to serve traffic?", which is standard for any load-balanced deployment.
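The liveness/readiness split described above can be sketched as two pure status functions. This is an illustration only, not the handlers from handler.rs. The `BackendProbe` enum and both function names are hypothetical, and treating any 2xx answer from the backend as "healthy" is an assumption, since the summary does not define it.

```rust
use std::time::Duration;

/// Outcome of probing the LLM backend's /health endpoint.
/// (Hypothetical type for illustration; not taken from the PR.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum BackendProbe {
    /// The backend answered with this HTTP status code.
    Responded(u16),
    /// No answer arrived within the probe timeout.
    TimedOut,
    /// Connection refused, DNS failure, or similar.
    Unreachable,
}

/// Probe timeout stated in the PR summary.
const PROBE_TIMEOUT: Duration = Duration::from_secs(2);

/// Liveness: if this code runs at all, the process is up, so always 200.
fn health_status() -> u16 {
    200
}

/// Readiness: 200 only when the backend answered with a 2xx status
/// (assumed meaning of "healthy"); 503 for every other outcome.
fn ready_status(probe: BackendProbe) -> u16 {
    match probe {
        BackendProbe::Responded(code) if (200..300).contains(&code) => 200,
        _ => 503,
    }
}

fn main() {
    println!("probe timeout: {:?}", PROBE_TIMEOUT);
    println!("/health -> {}", health_status());
    for probe in [
        BackendProbe::Responded(200),
        BackendProbe::Responded(500),
        BackendProbe::TimedOut,
        BackendProbe::Unreachable,
    ] {
        println!("/ready with {:?} -> {}", probe, ready_status(probe));
    }
}
```

Because `health_status` takes no input, a backend outage can never fail the liveness probe. That keeps an orchestrator from restarting a gateway whose only problem is an unavailable backend.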
Closes #31
Related
Changes
- crates/agentic-server/src/handler.rs: health() and ready() handler functions
- crates/agentic-server/src/app.rs: /health and /ready
- crates/agentic-server/Cargo.toml: reqwest runtime dependency (for backend probe)
- crates/agentic-server/tests/health_test.rs

Test Plan
Automated (4 integration tests):
- test_health_returns_200: gateway up, LLM up → 200
- test_health_returns_200_even_when_llm_down: gateway up, LLM unreachable → 200 (liveness unaffected)
- test_ready_returns_200_when_llm_healthy: LLM responds on /health → 200
- test_ready_returns_503_when_llm_unreachable: LLM unreachable → 503

All pass via cargo test --workspace.

Manual (live vLLM backend):
Built the gateway on an EC2 g6e.48xlarge instance and pointed it at a running vLLM server (port 8000):
Lint/format:
- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt -- --check: clean
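The "LLM healthy" and "LLM unreachable" cases from the automated tests can be reproduced without a real model server. The sketch below is std-only and is not the contents of health_test.rs: it stands up a one-shot stub backend on an ephemeral port, and it gets an unreachable address by binding a port and releasing it. All function names are hypothetical. The PR itself probes with reqwest; the raw TcpStream request here is a stand-in so the example needs no dependencies, and treating 2xx as healthy is an assumption.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Send `GET /health` to `addr` and return the HTTP status code, or None
/// on any connection, timeout, or parse failure.
fn probe_health(addr: SocketAddr, timeout: Duration) -> Option<u16> {
    let mut stream = TcpStream::connect_timeout(&addr, timeout).ok()?;
    stream.set_read_timeout(Some(timeout)).ok()?;
    stream.set_write_timeout(Some(timeout)).ok()?;
    let request = format!("GET /health HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    stream.write_all(request.as_bytes()).ok()?;
    let mut response = String::new();
    stream.read_to_string(&mut response).ok()?;
    // The status line looks like "HTTP/1.1 200 OK"; take the second token.
    response.split_whitespace().nth(1)?.parse().ok()
}

/// Readiness status for a backend at `addr`, using the 2-second timeout
/// from the PR summary. Assumes any 2xx answer counts as healthy.
fn ready_status(addr: SocketAddr) -> u16 {
    match probe_health(addr, Duration::from_secs(2)) {
        Some(code) if (200..300).contains(&code) => 200,
        _ => 503,
    }
}

/// Start a stub backend that answers exactly one request with 200 OK.
fn spawn_stub_backend() -> SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind stub backend");
    let addr = listener.local_addr().expect("stub backend address");
    thread::spawn(move || {
        if let Ok((mut conn, _)) = listener.accept() {
            let mut buf = [0u8; 1024];
            let _ = conn.read(&mut buf); // consume the request bytes
            let _ = conn.write_all(
                b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n",
            );
        }
    });
    addr
}

/// Bind an ephemeral port, then release it, so that nothing listens there.
/// Another process could in principle grab the port in between.
fn unreachable_addr() -> SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind ephemeral port");
    let addr = listener.local_addr().expect("ephemeral address");
    drop(listener);
    addr
}

fn main() {
    println!("healthy backend -> {}", ready_status(spawn_stub_backend()));
    println!("unreachable backend -> {}", ready_status(unreachable_addr()));
}
```

Binding to port 0 lets the OS pick a free port, so tests written this way can run in parallel without colliding on a fixed port.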