Skip to content

fix: use signal.NotifyContext for container cleanup on SIGINT/SIGTERM#2808

Open
bfirsh wants to merge 1 commit intomainfrom
fix/signal-container-cleanup
Open

fix: use signal.NotifyContext for container cleanup on SIGINT/SIGTERM#2808
bfirsh wants to merge 1 commit intomainfrom
fix/signal-container-cleanup

Conversation

@bfirsh
Copy link
Member

@bfirsh bfirsh commented Mar 4, 2026

Summary

  • Replace manual goroutine-based signal handlers in cog predict and cog train with signal.NotifyContext, ensuring containers are always cleaned up on SIGINT/SIGTERM
  • Remove redundant signal-catching goroutines that used the parent context (which could already be cancelled)
  • Remove FIXME: will not run on signal comments since the issue is now resolved

Problem

Both pkg/cli/predict.go and pkg/cli/train.go had two competing signal-handling mechanisms, neither fully correct:

  1. A goroutine with signal.Notify that caught SIGINT and called predictor.Stop(ctx) — but used the parent context which could already be cancelled
  2. A deferred predictor.Stop(context.Background()) — but Go defers don't run when the process exits from a signal handler

This meant hitting Ctrl+C during cog predict or cog train could leave orphaned Docker containers running.

Solution

Use signal.NotifyContext to create a context that is automatically cancelled on SIGINT/SIGTERM. When the signal arrives, in-flight operations (e.g. predictor.Start(), predictor.Predict()) return with a context cancellation error, the function returns normally, and the deferred predictor.Stop(context.Background()) runs to clean up the container.

Fixes #2799

Replace manual goroutine-based signal handlers in predict and train
commands with signal.NotifyContext. This ensures the context is cancelled
on SIGINT/SIGTERM, causing in-flight operations to return and the
deferred container cleanup to run reliably.

Previously, the deferred predictor.Stop() would not execute when the
process received a signal, leaving orphaned Docker containers running.

Fixes #2799
@bfirsh bfirsh requested a review from a team as a code owner March 4, 2026 23:31
@michaeldwan michaeldwan added this to the 0.18.0 milestone Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: container cleanup defer does not run on signal, leaving orphaned containers

2 participants