Skip to content

Fix Docker scan timeout cancellation not working#1729

Open
AMaini503 wants to merge 8 commits intomainfrom
fix/docker-scan-timeout-cancellation
Open

Fix Docker scan timeout cancellation not working#1729
AMaini503 wants to merge 8 commits intomainfrom
fix/docker-scan-timeout-cancellation

Conversation

@AMaini503
Copy link
Contributor

Problem

The cancellation token passed to \CreateAndRunContainerAsync\ was not being honored when blocked on \ReadOutputToEndAsync, causing Docker image scans to hang indefinitely despite the 10-minute timeout.

Root Cause

\MultiplexedStream.ReadOutputToEndAsync\ from Docker.DotNet doesn't honor cancellation tokens when blocked at the socket level. Cancellation tokens only work at async checkpoint boundaries, not during OS-level socket waits.

Solution

  • Add \ReadContainerOutputAsync\ with \Task.WhenAny\ pattern to race the read against a cancellation-aware delay
  • Kill the container when cancelled to unblock the socket-level read
  • Add \DockerServiceStepTelemetryRecord\ for per-step telemetry to identify which operation hangs

Testing

  • Build passes
  • Existing tests pass (159/167, the 1 failure is pre-existing env issue - Docker not running locally)
  • \SerializableProperties\ test validates new telemetry record

The cancellation token passed to CreateAndRunContainerAsync was not being honored when blocked on ReadOutputToEndAsync, causing scans to hang indefinitely despite the 10-minute timeout.

Changes:

- Add ReadContainerOutputAsync with Task.WhenAny pattern to race read against cancellation

- Kill container when cancelled to unblock socket-level read

- Add DockerServiceStepTelemetryRecord for per-step telemetry to identify hung operations
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Docker image scan hangs by ensuring cancellation is honored even when Docker.DotNet’s MultiplexedStream.ReadOutputToEndAsync is blocked at the socket level, and adds more granular telemetry to pinpoint which Docker step is hanging.

Changes:

  • Add a cancellation-safe container output read (Task.WhenAny vs. cancellation) and attempt to unblock reads by removing the container on cancellation.
  • Introduce per-step Docker service telemetry records to identify which Docker operation is slow/hung.
  • Minor refactor to avoid re-serializing the command string in the summary telemetry record.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/Microsoft.ComponentDetection.Common/Telemetry/Records/DockerServiceStepTelemetryRecord.cs Adds a new telemetry record type for per-step Docker service operations.
src/Microsoft.ComponentDetection.Common/DockerService.cs Implements cancellation-aware output reading and emits per-step telemetry for container lifecycle operations.
Comments suppressed due to low confidence (4)

src/Microsoft.ComponentDetection.Common/DockerService.cs:218

  • In the cancellation path, this method removes the container using CancellationToken.None. If RemoveContainerAsync hangs (e.g., daemon/socket issues), CreateAndRunContainerAsync can still block indefinitely even though the caller’s cancellation was requested. Consider using a separate bounded timeout token for cleanup (e.g., a short CancellationTokenSource) and ensuring cleanup failures don’t prevent propagating the OperationCanceledException.
        catch (OperationCanceledException)
        {
            // Ensure container is removed on cancellation
            await RemoveContainerAsync(container.ID, CancellationToken.None);
            throw;

src/Microsoft.ComponentDetection.Common/DockerService.cs:218

  • ReadContainerOutputAsync removes the container when cancellation wins; the outer catch(OperationCanceledException) also removes the same container. This can result in duplicate RemoveContainer calls and may throw (e.g., 404/not found) and mask the original cancellation. Consider centralizing container cleanup in a single finally block and making removal idempotent (ignore not-found).
        catch (OperationCanceledException)
        {
            // Ensure container is removed on cancellation
            await RemoveContainerAsync(container.ID, CancellationToken.None);
            throw;

src/Microsoft.ComponentDetection.Common/DockerService.cs:294

  • DockerServiceStepTelemetryRecord includes ExceptionMessage, but if CreateContainerAsync fails, the step record will be emitted without the error details. Consider wrapping the CreateContainerAsync call in try/catch to set record.ExceptionMessage before rethrowing.

        var response = await Client.Containers.CreateContainerAsync(parameters, cancellationToken);
        record.ContainerId = response.ID;
        return response;

src/Microsoft.ComponentDetection.Common/DockerService.cs:303

  • DockerServiceStepTelemetryRecord includes ExceptionMessage, but if AttachContainerAsync fails, the step record will be emitted without the error details. Consider wrapping the Docker client call in try/catch to set record.ExceptionMessage before rethrowing.
        using var record = new DockerServiceStepTelemetryRecord
        {
            Step = "AttachContainer",
            ContainerId = containerId,
        };

You can also share your feedback on Copilot code review. Take the survey.

- Observe readTask without blocking using ContinueWith

- Make RemoveContainerAsync idempotent (ignore DockerContainerNotFoundException)

- Capture all exceptions in step telemetry records
@codecov
Copy link

codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.0%. Comparing base (31c5047) to head (a75bdaa).

Additional details and impacted files
@@     Coverage Diff      @@
##   main   #1729   +/-   ##
============================
============================

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI review requested due to automatic review settings March 19, 2026 23:48
@AMaini503
Copy link
Contributor Author

Addressed @JamieMagee's feedback - using stream.Dispose() instead of the ContinueWith pattern. Much cleaner!

@github-actions
Copy link

👋 Hi! It looks like you modified some files in the Detectors folder.
You may need to bump the detector versions if any of the following scenarios apply:

  • The detector detects more or fewer components than before
  • The detector generates different parent/child graph relationships than before
  • The detector generates different devDependencies values than before

If none of the above scenarios apply, feel free to ignore this comment 🙂

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Docker image scans hanging past the configured timeout by ensuring cancellation is honored even when Docker.DotNet output reads are blocked at the socket level, and adds step-level telemetry to identify which Docker operation is hanging.

Changes:

  • Adds a cancellation-safe container output read (ReadContainerOutputAsync) using a Task.WhenAny race against cancellation.
  • Ensures cleanup on cancellation by disposing the stream and forcing container removal.
  • Introduces DockerServiceStepTelemetryRecord to emit telemetry per Docker operation step.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/Microsoft.ComponentDetection.Common/DockerService.cs Implements cancellation-safe output reading and emits per-step telemetry around Docker operations.
src/Microsoft.ComponentDetection.Common/Telemetry/Records/DockerServiceStepTelemetryRecord.cs Adds a new telemetry record type to capture per-step Docker operation details (step name, container ID, cancellation, error message).
Comments suppressed due to low confidence (4)

src/Microsoft.ComponentDetection.Common/DockerService.cs:258

  • ReadContainerOutputAsync returns await readTask without capturing failures in the step telemetry. If ReadOutputToEndAsync throws, DockerServiceStepTelemetryRecord.ExceptionMessage remains unset for the ReadOutput step. Wrap the final await in try/catch to set record.ExceptionMessage before rethrowing.
        return await readTask;

src/Microsoft.ComponentDetection.Common/DockerService.cs:206

  • MultiplexedStream returned from AttachContainerAsync is not disposed on the successful path. Since it wraps the underlying HTTP/socket stream, this can leak connections until GC/finalization. Consider disposing it deterministically (e.g., await using/using around the stream, or disposing in a finally after output read/removal).
            var stream = await AttachContainerAsync(container.ID, cancellationToken);
            await StartContainerAsync(container.ID, cancellationToken);

            var (stdout, stderr) = await ReadContainerOutputAsync(stream, container.ID, image, cancellationToken);

src/Microsoft.ComponentDetection.Common/DockerService.cs:218

  • CreateAndRunContainerAsync only guarantees container cleanup on the success path and when an OperationCanceledException is thrown. If AttachContainerAsync, StartContainerAsync, or the output read throws a non-cancellation exception, the created container can be left behind. Consider using a finally block (best-effort, non-throwing) to remove the container for all failure cases, and avoid redundant double-removal on cancellation since ReadContainerOutputAsync already removes the container when cancellation wins.
        catch (OperationCanceledException)
        {
            // Ensure container is removed on cancellation
            await RemoveContainerAsync(container.ID, CancellationToken.None);
            throw;

src/Microsoft.ComponentDetection.Common/DockerService.cs:246

  • The new cancellation behavior in ReadContainerOutputAsync (racing the read against a cancellation-aware task and performing cleanup) isn’t covered by existing tests (e.g., no test asserts that CreateAndRunContainerAsync cancels promptly when output read blocks). Consider adding a unit/integration test that exercises the cancellation path so regressions don’t reintroduce hangs.
        var readTask = stream.ReadOutputToEndAsync(CancellationToken.None);
        var delayTask = Task.Delay(Timeout.Infinite, cancellationToken);

        var completedTask = await Task.WhenAny(readTask, delayTask);

        if (completedTask == delayTask)
        {

Copilot AI review requested due to automatic review settings March 20, 2026 00:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

src/Microsoft.ComponentDetection.Common/DockerService.cs:264

  • ReadContainerOutputAsync disposes the stream and then awaits RemoveContainerAsync with CancellationToken.None while the caller is already cancelled. If Docker becomes unresponsive, this cleanup call can hang indefinitely and/or throw a different exception that masks the original cancellation. Consider making cancellation cleanup strictly best-effort (e.g., bounded timeout token + swallow/log failures) and then rethrow the OperationCanceledException.
            // Remove the container for cleanup
            await RemoveContainerAsync(containerId, CancellationToken.None);

            cancellationToken.ThrowIfCancellationRequested();
        }

src/Microsoft.ComponentDetection.Common/DockerService.cs:206

  • The MultiplexedStream returned by AttachContainerAsync is not disposed on the success path (it's only disposed in the cancellation branch of ReadContainerOutputAsync). This can leave sockets/streams open longer than necessary. Consider disposing the stream in a finally/using scope in CreateAndRunContainerAsync (and/or within ReadContainerOutputAsync) so it is always released on success and on non-cancellation failures.
            var stream = await AttachContainerAsync(container.ID, cancellationToken);
            await StartContainerAsync(container.ID, cancellationToken);

            var (stdout, stderr) = await ReadContainerOutputAsync(stream, container.ID, image, cancellationToken);

@AMaini503
Copy link
Contributor Author

Re: finally block suggestion at line 218 - done! Moved cleanup to \ inally\ and removed redundant removal from \ReadContainerOutputAsync. \RemoveContainerAsync\ is already idempotent (catches \DockerContainerNotFoundException).

Copilot AI review requested due to automatic review settings March 20, 2026 22:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

finally
{
// Best-effort container cleanup; RemoveContainerAsync already handles not-found.
await RemoveContainerAsync(container.ID, CancellationToken.None);
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The finally block always awaits RemoveContainerAsync with CancellationToken.None, so CreateAndRunContainerAsync may still hang indefinitely during cleanup even when the caller’s cancellation/timeout has fired. Consider using a bounded cleanup timeout (e.g., separate CTS with short CancelAfter), or honoring the caller token and swallowing OperationCanceledException during cleanup so cancellation can return promptly.

Suggested change
await RemoveContainerAsync(container.ID, CancellationToken.None);
// Use a bounded, cancellation-aware token so that cleanup does not hang indefinitely.
using var cleanupCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
cleanupCts.CancelAfter(TimeSpan.FromSeconds(5));
try
{
await RemoveContainerAsync(container.ID, cleanupCts.Token);
}
catch (OperationCanceledException) when (cleanupCts.IsCancellationRequested)
{
// Swallow cancellation during best-effort cleanup to allow the caller's
// cancellation or timeout to propagate promptly.
}

Copilot uses AI. Check for mistakes.
Comment on lines +249 to +252
// Observe the readTask to prevent unobserved task exceptions.
// Running any continuation automatically marks the exception as observed.
_ = readTask.ContinueWith(
static _ => { },
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The continuation attached to readTask doesn’t actually observe the exception because it never reads t.Exception (or awaits the task). This can still allow UnobservedTaskException to surface later. Update the continuation to explicitly access/handle the exception (or otherwise observe it) instead of an empty delegate, and adjust the comment accordingly.

Suggested change
// Observe the readTask to prevent unobserved task exceptions.
// Running any continuation automatically marks the exception as observed.
_ = readTask.ContinueWith(
static _ => { },
// Observe the readTask to prevent unobserved task exceptions by
// explicitly accessing the Exception property when the task faults.
_ = readTask.ContinueWith(
static t => { _ = t.Exception; },

Copilot uses AI. Check for mistakes.
Comment on lines +237 to +261
var readTask = stream.ReadOutputToEndAsync(CancellationToken.None);
var delayTask = Task.Delay(Timeout.Infinite, cancellationToken);

var completedTask = await Task.WhenAny(readTask, delayTask);

if (completedTask == delayTask)
{
record.WasCancelled = true;

// Dispose the stream to unblock any pending read operation
stream.Dispose();

// Observe the readTask to prevent unobserved task exceptions.
// Running any continuation automatically marks the exception as observed.
_ = readTask.ContinueWith(
static _ => { },
CancellationToken.None,
TaskContinuationOptions.OnlyOnFaulted,
TaskScheduler.Default);

// Caller is responsible for container cleanup via finally block
cancellationToken.ThrowIfCancellationRequested();
}

return await readTask;
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadContainerOutputAsync doesn’t capture failures into the step telemetry record: if ReadOutputToEndAsync faults, the exception will propagate but record.ExceptionMessage remains unset (unlike the other step wrappers). Consider wrapping the read/await in try/catch to set record.ExceptionMessage before rethrowing, so the emitted step record contains the error context.

Suggested change
var readTask = stream.ReadOutputToEndAsync(CancellationToken.None);
var delayTask = Task.Delay(Timeout.Infinite, cancellationToken);
var completedTask = await Task.WhenAny(readTask, delayTask);
if (completedTask == delayTask)
{
record.WasCancelled = true;
// Dispose the stream to unblock any pending read operation
stream.Dispose();
// Observe the readTask to prevent unobserved task exceptions.
// Running any continuation automatically marks the exception as observed.
_ = readTask.ContinueWith(
static _ => { },
CancellationToken.None,
TaskContinuationOptions.OnlyOnFaulted,
TaskScheduler.Default);
// Caller is responsible for container cleanup via finally block
cancellationToken.ThrowIfCancellationRequested();
}
return await readTask;
try
{
var readTask = stream.ReadOutputToEndAsync(CancellationToken.None);
var delayTask = Task.Delay(Timeout.Infinite, cancellationToken);
var completedTask = await Task.WhenAny(readTask, delayTask);
if (completedTask == delayTask)
{
record.WasCancelled = true;
// Dispose the stream to unblock any pending read operation
stream.Dispose();
// Observe the readTask to prevent unobserved task exceptions.
// Running any continuation automatically marks the exception as observed.
_ = readTask.ContinueWith(
static _ => { },
CancellationToken.None,
TaskContinuationOptions.OnlyOnFaulted,
TaskScheduler.Default);
// Caller is responsible for container cleanup via finally block
cancellationToken.ThrowIfCancellationRequested();
}
return await readTask;
}
catch (Exception ex)
{
record.ExceptionMessage = ex.Message;
throw;
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants