Fix Docker scan timeout cancellation not working#1729
Fix Docker scan timeout cancellation not working#1729
Conversation
The cancellation token passed to CreateAndRunContainerAsync was not being honored when blocked on ReadOutputToEndAsync, causing scans to hang indefinitely despite the 10-minute timeout. Changes: - Add ReadContainerOutputAsync with Task.WhenAny pattern to race read against cancellation - Kill container when cancelled to unblock socket-level read - Add DockerServiceStepTelemetryRecord for per-step telemetry to identify hung operations
There was a problem hiding this comment.
Pull request overview
This PR addresses Docker image scan hangs by ensuring cancellation is honored even when Docker.DotNet’s MultiplexedStream.ReadOutputToEndAsync is blocked at the socket level, and adds more granular telemetry to pinpoint which Docker step is hanging.
Changes:
- Add a cancellation-safe container output read (
Task.WhenAnyvs. cancellation) and attempt to unblock reads by removing the container on cancellation. - Introduce per-step Docker service telemetry records to identify which Docker operation is slow/hung.
- Minor refactor to avoid re-serializing the command string in the summary telemetry record.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Microsoft.ComponentDetection.Common/Telemetry/Records/DockerServiceStepTelemetryRecord.cs | Adds a new telemetry record type for per-step Docker service operations. |
| src/Microsoft.ComponentDetection.Common/DockerService.cs | Implements cancellation-aware output reading and emits per-step telemetry for container lifecycle operations. |
Comments suppressed due to low confidence (4)
src/Microsoft.ComponentDetection.Common/DockerService.cs:218
- In the cancellation path, this method removes the container using CancellationToken.None. If RemoveContainerAsync hangs (e.g., daemon/socket issues), CreateAndRunContainerAsync can still block indefinitely even though the caller’s cancellation was requested. Consider using a separate bounded timeout token for cleanup (e.g., a short CancellationTokenSource) and ensuring cleanup failures don’t prevent propagating the OperationCanceledException.
catch (OperationCanceledException)
{
// Ensure container is removed on cancellation
await RemoveContainerAsync(container.ID, CancellationToken.None);
throw;
src/Microsoft.ComponentDetection.Common/DockerService.cs:218
- ReadContainerOutputAsync removes the container when cancellation wins; the outer catch(OperationCanceledException) also removes the same container. This can result in duplicate RemoveContainer calls and may throw (e.g., 404/not found) and mask the original cancellation. Consider centralizing container cleanup in a single finally block and making removal idempotent (ignore not-found).
catch (OperationCanceledException)
{
// Ensure container is removed on cancellation
await RemoveContainerAsync(container.ID, CancellationToken.None);
throw;
src/Microsoft.ComponentDetection.Common/DockerService.cs:294
- DockerServiceStepTelemetryRecord includes ExceptionMessage, but if CreateContainerAsync fails, the step record will be emitted without the error details. Consider wrapping the CreateContainerAsync call in try/catch to set record.ExceptionMessage before rethrowing.
var response = await Client.Containers.CreateContainerAsync(parameters, cancellationToken);
record.ContainerId = response.ID;
return response;
src/Microsoft.ComponentDetection.Common/DockerService.cs:303
- DockerServiceStepTelemetryRecord includes ExceptionMessage, but if AttachContainerAsync fails, the step record will be emitted without the error details. Consider wrapping the Docker client call in try/catch to set record.ExceptionMessage before rethrowing.
using var record = new DockerServiceStepTelemetryRecord
{
Step = "AttachContainer",
ContainerId = containerId,
};
You can also share your feedback on Copilot code review. Take the survey.
- Observe readTask without blocking using ContinueWith - Make RemoveContainerAsync idempotent (ignore DockerContainerNotFoundException) - Capture all exceptions in step telemetry records
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1729 +/- ##
============================
============================
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Addressed @JamieMagee's feedback - using |
|
👋 Hi! It looks like you modified some files in the
If none of the above scenarios apply, feel free to ignore this comment 🙂 |
There was a problem hiding this comment.
Pull request overview
This PR addresses Docker image scans hanging past the configured timeout by ensuring cancellation is honored even when Docker.DotNet output reads are blocked at the socket level, and adds step-level telemetry to identify which Docker operation is hanging.
Changes:
- Adds a cancellation-safe container output read (
ReadContainerOutputAsync) using aTask.WhenAnyrace against cancellation. - Ensures cleanup on cancellation by disposing the stream and forcing container removal.
- Introduces
DockerServiceStepTelemetryRecordto emit telemetry per Docker operation step.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Microsoft.ComponentDetection.Common/DockerService.cs | Implements cancellation-safe output reading and emits per-step telemetry around Docker operations. |
| src/Microsoft.ComponentDetection.Common/Telemetry/Records/DockerServiceStepTelemetryRecord.cs | Adds a new telemetry record type to capture per-step Docker operation details (step name, container ID, cancellation, error message). |
Comments suppressed due to low confidence (4)
src/Microsoft.ComponentDetection.Common/DockerService.cs:258
ReadContainerOutputAsyncreturnsawait readTaskwithout capturing failures in the step telemetry. IfReadOutputToEndAsyncthrows,DockerServiceStepTelemetryRecord.ExceptionMessageremains unset for the ReadOutput step. Wrap the final await in try/catch to setrecord.ExceptionMessagebefore rethrowing.
return await readTask;
src/Microsoft.ComponentDetection.Common/DockerService.cs:206
MultiplexedStreamreturned fromAttachContainerAsyncis not disposed on the successful path. Since it wraps the underlying HTTP/socket stream, this can leak connections until GC/finalization. Consider disposing it deterministically (e.g.,await using/usingaround the stream, or disposing in afinallyafter output read/removal).
var stream = await AttachContainerAsync(container.ID, cancellationToken);
await StartContainerAsync(container.ID, cancellationToken);
var (stdout, stderr) = await ReadContainerOutputAsync(stream, container.ID, image, cancellationToken);
src/Microsoft.ComponentDetection.Common/DockerService.cs:218
CreateAndRunContainerAsynconly guarantees container cleanup on the success path and when anOperationCanceledExceptionis thrown. IfAttachContainerAsync,StartContainerAsync, or the output read throws a non-cancellation exception, the created container can be left behind. Consider using afinallyblock (best-effort, non-throwing) to remove the container for all failure cases, and avoid redundant double-removal on cancellation sinceReadContainerOutputAsyncalready removes the container when cancellation wins.
catch (OperationCanceledException)
{
// Ensure container is removed on cancellation
await RemoveContainerAsync(container.ID, CancellationToken.None);
throw;
src/Microsoft.ComponentDetection.Common/DockerService.cs:246
- The new cancellation behavior in
ReadContainerOutputAsync(racing the read against a cancellation-aware task and performing cleanup) isn’t covered by existing tests (e.g., no test asserts thatCreateAndRunContainerAsynccancels promptly when output read blocks). Consider adding a unit/integration test that exercises the cancellation path so regressions don’t reintroduce hangs.
var readTask = stream.ReadOutputToEndAsync(CancellationToken.None);
var delayTask = Task.Delay(Timeout.Infinite, cancellationToken);
var completedTask = await Task.WhenAny(readTask, delayTask);
if (completedTask == delayTask)
{
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/Microsoft.ComponentDetection.Common/DockerService.cs:264
- ReadContainerOutputAsync disposes the stream and then awaits RemoveContainerAsync with CancellationToken.None while the caller is already cancelled. If Docker becomes unresponsive, this cleanup call can hang indefinitely and/or throw a different exception that masks the original cancellation. Consider making cancellation cleanup strictly best-effort (e.g., bounded timeout token + swallow/log failures) and then rethrow the OperationCanceledException.
// Remove the container for cleanup
await RemoveContainerAsync(containerId, CancellationToken.None);
cancellationToken.ThrowIfCancellationRequested();
}
src/Microsoft.ComponentDetection.Common/DockerService.cs:206
- The MultiplexedStream returned by AttachContainerAsync is not disposed on the success path (it's only disposed in the cancellation branch of ReadContainerOutputAsync). This can leave sockets/streams open longer than necessary. Consider disposing the stream in a finally/using scope in CreateAndRunContainerAsync (and/or within ReadContainerOutputAsync) so it is always released on success and on non-cancellation failures.
var stream = await AttachContainerAsync(container.ID, cancellationToken);
await StartContainerAsync(container.ID, cancellationToken);
var (stdout, stderr) = await ReadContainerOutputAsync(stream, container.ID, image, cancellationToken);
|
Re: finally block suggestion at line 218 - done! Moved cleanup to \inally\ and removed redundant removal from \ReadContainerOutputAsync. \RemoveContainerAsync\ is already idempotent (catches \DockerContainerNotFoundException). |
| finally | ||
| { | ||
| // Best-effort container cleanup; RemoveContainerAsync already handles not-found. | ||
| await RemoveContainerAsync(container.ID, CancellationToken.None); |
There was a problem hiding this comment.
The finally block always awaits RemoveContainerAsync with CancellationToken.None, so CreateAndRunContainerAsync may still hang indefinitely during cleanup even when the caller’s cancellation/timeout has fired. Consider using a bounded cleanup timeout (e.g., separate CTS with short CancelAfter), or honoring the caller token and swallowing OperationCanceledException during cleanup so cancellation can return promptly.
| await RemoveContainerAsync(container.ID, CancellationToken.None); | |
| // Use a bounded, cancellation-aware token so that cleanup does not hang indefinitely. | |
| using var cleanupCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken); | |
| cleanupCts.CancelAfter(TimeSpan.FromSeconds(5)); | |
| try | |
| { | |
| await RemoveContainerAsync(container.ID, cleanupCts.Token); | |
| } | |
| catch (OperationCanceledException) when (cleanupCts.IsCancellationRequested) | |
| { | |
| // Swallow cancellation during best-effort cleanup to allow the caller's | |
| // cancellation or timeout to propagate promptly. | |
| } |
| // Observe the readTask to prevent unobserved task exceptions. | ||
| // Running any continuation automatically marks the exception as observed. | ||
| _ = readTask.ContinueWith( | ||
| static _ => { }, |
There was a problem hiding this comment.
The continuation attached to readTask doesn’t actually observe the exception because it never reads t.Exception (or awaits the task). This can still allow UnobservedTaskException to surface later. Update the continuation to explicitly access/handle the exception (or otherwise observe it) instead of an empty delegate, and adjust the comment accordingly.
| // Observe the readTask to prevent unobserved task exceptions. | |
| // Running any continuation automatically marks the exception as observed. | |
| _ = readTask.ContinueWith( | |
| static _ => { }, | |
| // Observe the readTask to prevent unobserved task exceptions by | |
| // explicitly accessing the Exception property when the task faults. | |
| _ = readTask.ContinueWith( | |
| static t => { _ = t.Exception; }, |
| var readTask = stream.ReadOutputToEndAsync(CancellationToken.None); | ||
| var delayTask = Task.Delay(Timeout.Infinite, cancellationToken); | ||
|
|
||
| var completedTask = await Task.WhenAny(readTask, delayTask); | ||
|
|
||
| if (completedTask == delayTask) | ||
| { | ||
| record.WasCancelled = true; | ||
|
|
||
| // Dispose the stream to unblock any pending read operation | ||
| stream.Dispose(); | ||
|
|
||
| // Observe the readTask to prevent unobserved task exceptions. | ||
| // Running any continuation automatically marks the exception as observed. | ||
| _ = readTask.ContinueWith( | ||
| static _ => { }, | ||
| CancellationToken.None, | ||
| TaskContinuationOptions.OnlyOnFaulted, | ||
| TaskScheduler.Default); | ||
|
|
||
| // Caller is responsible for container cleanup via finally block | ||
| cancellationToken.ThrowIfCancellationRequested(); | ||
| } | ||
|
|
||
| return await readTask; |
There was a problem hiding this comment.
ReadContainerOutputAsync doesn’t capture failures into the step telemetry record: if ReadOutputToEndAsync faults, the exception will propagate but record.ExceptionMessage remains unset (unlike the other step wrappers). Consider wrapping the read/await in try/catch to set record.ExceptionMessage before rethrowing, so the emitted step record contains the error context.
| var readTask = stream.ReadOutputToEndAsync(CancellationToken.None); | |
| var delayTask = Task.Delay(Timeout.Infinite, cancellationToken); | |
| var completedTask = await Task.WhenAny(readTask, delayTask); | |
| if (completedTask == delayTask) | |
| { | |
| record.WasCancelled = true; | |
| // Dispose the stream to unblock any pending read operation | |
| stream.Dispose(); | |
| // Observe the readTask to prevent unobserved task exceptions. | |
| // Running any continuation automatically marks the exception as observed. | |
| _ = readTask.ContinueWith( | |
| static _ => { }, | |
| CancellationToken.None, | |
| TaskContinuationOptions.OnlyOnFaulted, | |
| TaskScheduler.Default); | |
| // Caller is responsible for container cleanup via finally block | |
| cancellationToken.ThrowIfCancellationRequested(); | |
| } | |
| return await readTask; | |
| try | |
| { | |
| var readTask = stream.ReadOutputToEndAsync(CancellationToken.None); | |
| var delayTask = Task.Delay(Timeout.Infinite, cancellationToken); | |
| var completedTask = await Task.WhenAny(readTask, delayTask); | |
| if (completedTask == delayTask) | |
| { | |
| record.WasCancelled = true; | |
| // Dispose the stream to unblock any pending read operation | |
| stream.Dispose(); | |
| // Observe the readTask to prevent unobserved task exceptions. | |
| // Running any continuation automatically marks the exception as observed. | |
| _ = readTask.ContinueWith( | |
| static _ => { }, | |
| CancellationToken.None, | |
| TaskContinuationOptions.OnlyOnFaulted, | |
| TaskScheduler.Default); | |
| // Caller is responsible for container cleanup via finally block | |
| cancellationToken.ThrowIfCancellationRequested(); | |
| } | |
| return await readTask; | |
| } | |
| catch (Exception ex) | |
| { | |
| record.ExceptionMessage = ex.Message; | |
| throw; | |
| } |
Problem
The cancellation token passed to \CreateAndRunContainerAsync\ was not being honored when blocked on \ReadOutputToEndAsync, causing Docker image scans to hang indefinitely despite the 10-minute timeout.
Root Cause
\MultiplexedStream.ReadOutputToEndAsync\ from Docker.DotNet doesn't honor cancellation tokens when blocked at the socket level. Cancellation tokens only work at async checkpoint boundaries, not during OS-level socket waits.
Solution
Testing