feat(server): add graceful shutdown on SIGTERM/SIGINT #2787
Closed
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/mesh/src/api/app.ts">
<violation number="1" location="apps/mesh/src/api/app.ts:1156">
P2: Wrap cleanup callbacks in a deferred promise (`Promise.resolve().then(...)`) so synchronous throws are captured by `Promise.allSettled` instead of aborting shutdown early.</violation>
</file>
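A minimal sketch of the deferral this review comment suggests. The names `Cleanup` and `runCleanups` are hypothetical and are not taken from `apps/mesh/src/api/app.ts`. Calling each callback inside `.then(...)` turns a synchronous throw into a rejected promise, so `Promise.allSettled` still sees every callback.

```typescript
// Hypothetical type for a cleanup callback; it may be sync or async.
type Cleanup = () => void | Promise<void>;

// Run every cleanup callback and report each outcome.
// Promise.resolve().then(fn) invokes fn on a later microtask, so a
// synchronous throw inside fn becomes a rejection instead of escaping
// from the .map() call before allSettled has been reached.
async function runCleanups(
  cleanups: Cleanup[],
): Promise<PromiseSettledResult<void>[]> {
  return Promise.allSettled(
    cleanups.map((fn) => Promise.resolve().then(fn)),
  );
}
```

With a direct call such as `cleanups.map((fn) => fn())`, the first synchronous throw would abort the `map` and the remaining callbacks would never start.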
Clean up all resources in order when the process receives a termination signal: stop workers, drain NATS, flush telemetry, close database. Prevents orphaned connections and lost telemetry on K8s pod termination. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… shutdown
- Add /health/live (liveness) and /health/ready (readiness) endpoints
- /health kept for backwards compatibility
- /health/ready checks DB and NATS connectivity, returns 503 during shutdown
- Expose markShuttingDown() separately so readiness returns 503 before server.stop() is called, which gives K8s ~2s to drain traffic
- NATS_URL now accepts comma-separated URLs for cluster failover
- Bump helm chart to 0.1.41 with updated probe paths and terminationGracePeriodSeconds=60
- Add deploy/docker-compose/docker-compose.dev.yml with 3-node NATS cluster and PostgreSQL for local development
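The readiness behaviour in this commit message could be sketched roughly as follows. This is framework-agnostic and is not the PR's actual handler: `createReadiness`, `Check`, and the response shape are assumptions made for illustration, and only `markShuttingDown()` and the 503-during-shutdown rule come from the text above.

```typescript
// Hypothetical dependency check, e.g. a DB ping or a NATS connection test.
type Check = () => Promise<boolean>;

interface ReadinessResult {
  status: 200 | 503;
  body: { ready: boolean; reason?: string };
}

function createReadiness(checks: Record<string, Check>) {
  let shuttingDown = false;
  return {
    // Flip readiness to 503 before the server stops accepting
    // connections, so the orchestrator has time to drain traffic.
    markShuttingDown(): void {
      shuttingDown = true;
    },
    async ready(): Promise<ReadinessResult> {
      if (shuttingDown) {
        return { status: 503, body: { ready: false, reason: "shutting down" } };
      }
      for (const [name, check] of Object.entries(checks)) {
        // A check that throws or rejects counts as "not ready".
        const ok = await Promise.resolve().then(check).catch(() => false);
        if (!ok) {
          return { status: 503, body: { ready: false, reason: `${name} unavailable` } };
        }
      }
      return { status: 200, body: { ready: true } };
    },
  };
}
```

A liveness endpoint would skip these checks entirely, so a temporarily unreachable dependency takes the pod out of rotation without restarting it.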
480d49e to fee6fc4
merged by #2855
What is this contribution about?
Adds graceful shutdown handling when the MCP Mesh server receives SIGTERM (K8s pod termination) or SIGINT (Ctrl+C). Previously, all resources — database connections, NATS subscriptions, event bus workers, in-flight telemetry — were abandoned without cleanup.
The shutdown sequence follows a strict order: stop HTTP servers → stop workers in parallel (EventBus, SSE hub, cron, decopilot) → drain NATS (after all consumers stopped) → flush telemetry → close database. A 10-second force-exit timeout prevents the process from hanging indefinitely.
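The ordering described above can be sketched as a list of phases, where the steps inside a phase run in parallel and the phases run one after another. This is an illustration, not the code in this PR: `Step`, `runShutdown`, and the failure log text are hypothetical, and the helper returns an outcome instead of calling `process.exit` so the force-exit decision stays with the caller.

```typescript
// Hypothetical shape for one cleanup step.
type Step = { name: string; run: () => void | Promise<void> };

// Run phases strictly in order; steps within a phase run in parallel.
// Resolves to "timeout" if the whole sequence exceeds timeoutMs.
async function runShutdown(
  phases: Step[][],
  timeoutMs: number,
  log: (msg: string) => void = console.log,
): Promise<"clean" | "timeout"> {
  const sequence = (async (): Promise<"clean"> => {
    for (const phase of phases) {
      await Promise.all(
        phase.map(async (step) => {
          try {
            await step.run(); // sync throws and rejections both land in catch
          } catch (err) {
            log(`[shutdown] ${step.name} failed: ${String(err)}`);
          }
        }),
      );
    }
    return "clean";
  })();

  const timeout = new Promise<"timeout">((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), timeoutMs);
    // Clear the timer once cleanup finishes so it cannot keep the process alive.
    void sequence.finally(() => clearTimeout(timer));
  });

  return Promise.race([sequence, timeout]);
}

// Example wiring (all step names are placeholders):
// process.once("SIGTERM", async () => {
//   const outcome = await runShutdown(
//     [[stopHttp], [stopEventBus, stopSseHub, stopCron], [drainNats], [flushTelemetry], [closeDb]],
//     10_000,
//   );
//   process.exit(outcome === "clean" ? 0 : 1);
// });
```

Putting the NATS drain in its own phase after the workers reflects the stated constraint that all consumers must have stopped before the connection is drained.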
Two-file change using `Object.assign(app, { shutdown })` to preserve backward compatibility with existing tests.
How to Test
- `bun run dev`, then press Ctrl+C: `[shutdown] Received SIGINT...` → `[shutdown] Stopping workers...` → `[shutdown] Cleanup complete.`
- `bun run dev`, then in another terminal `kill -TERM <pid>`: same clean shutdown behavior
- `bun run check` passes, `bun test apps/mesh/src/api/` passes (317 tests)
Review Checklist
Summary by cubic
Add graceful shutdown on SIGTERM/SIGINT and separate liveness/readiness probes. Ensures clean shutdown, prevents lost data, and improves K8s draining during rollouts.
- `/health/live` and `/health/ready`; readiness checks DB and NATS, returns 503 during shutdown via `app.markShuttingDown()`. `/health` kept for compatibility.
- `app.shutdown()`.
- `NATS_URL` now accepts comma-separated URLs for cluster failover.
- `terminationGracePeriodSeconds=60`; added `deploy/docker-compose/docker-compose.dev.yml` for local Postgres and a 3-node NATS cluster.

Written for commit fee6fc4. Summary will update on new commits.
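For the comma-separated `NATS_URL` mentioned in the summary, one way to turn the variable into a server list is sketched below. `parseNatsServers` is a hypothetical helper and the PR's actual parsing may differ. The example hostnames are made up.

```typescript
// Split a comma-separated NATS_URL value into individual server URLs.
// Surrounding whitespace and empty entries (e.g. a trailing comma) are dropped.
function parseNatsServers(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);
}

// Example with made-up hosts:
// parseNatsServers("nats://nats-1:4222, nats://nats-2:4222")
// The resulting array could then be passed as the server list to
// whichever NATS client the project uses (an assumption, not shown here).
```

A single URL still works unchanged, because splitting a string with no comma yields a one-element list.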