Skip to content

fix(k8s): thread caller ctx into DialContext exec goroutine#3772

Open
SarthakB11 wants to merge 1 commit into
knative:mainfrom
SarthakB11:sarth/k8s-dialer-ctx
Open

fix(k8s): thread caller ctx into DialContext exec goroutine#3772
SarthakB11 wants to merge 1 commit into
knative:mainfrom
SarthakB11:sarth/k8s-dialer-ctx

Conversation

@SarthakB11
Copy link
Copy Markdown

Changes

DialContext receives ctx as a parameter and the outer select watches <-ctx.Done() to return early when the caller cancels. The exec goroutine spawned at line 80, however, called c.exec(context.TODO(), ...) rather than passing ctx. On caller cancellation, DialContext returns immediately, but the goroutine keeps running and the exec stream against kube-apiserver stays open for the lifetime of the parent process.

Pass ctx to c.exec. The outer select still races connectFailure against ctx.Done() and conn.closeWithError is idempotent across both paths.

Dial at line 323 is unchanged. That attach goroutine is intentionally long-lived (held open by emptyBlockingReader waiting on the detach signal) and must outlive the Dial call.

/kind bug

fix: k8s dialer exec goroutine now exits on caller cancellation

The exec goroutine inside DialContext was started with context.TODO()
even though DialContext receives ctx as a parameter and the outer
select watches <-ctx.Done(). When the caller cancels ctx, DialContext
returns immediately on the Done case, but the spawned goroutine keeps
running because c.exec was given an empty context. The exec stream
against kube-apiserver stays open for the lifetime of the parent
process.

Pass ctx to c.exec so cancellation propagates and the goroutine exits
cleanly. closeWithError is idempotent so the existing race-free path
through connectFailure / ctx.Done remains intact.

Line 323 in Dial is unchanged: the attach goroutine there is
intentionally long-lived (held open by emptyBlockingReader on the
detach signal) and must outlive Dial.

Signed-off-by: SarthakB11 <sarthak.bhardwaj21b@iiitg.ac.in>
@knative-prow knative-prow Bot added the kind/bug Bugs label May 17, 2026
@knative-prow knative-prow Bot requested review from dsimansk and jrangelramos May 17, 2026 05:51
@knative-prow
Copy link
Copy Markdown

knative-prow Bot commented May 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SarthakB11
Once this PR has been reviewed and has the lgtm label, please assign jrangelramos for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow
Copy link
Copy Markdown

knative-prow Bot commented May 17, 2026

Hi @SarthakB11. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@knative-prow knative-prow Bot added size/XS 🤖 PR changes 0-9 lines, ignoring generated files. needs-ok-to-test 🤖 Needs an org member to approve testing labels May 17, 2026
@matejvasek
Copy link
Copy Markdown
Contributor

/ok-to-test

@knative-prow knative-prow Bot added ok-to-test 🤖 Non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test 🤖 Needs an org member to approve testing labels May 17, 2026
@matejvasek
Copy link
Copy Markdown
Contributor

https://pkg.go.dev/net#Dialer.DialContext

The provided Context must be non-nil. If the context expires before the connection is complete, an error is returned. Once successfully connected, any expiration of the context will not affect the connection.

Copy link
Copy Markdown
Contributor

@matejvasek matejvasek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot use that context.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.03%. Comparing base (584c11e) to head (adede99).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3772   +/-   ##
=======================================
  Coverage   57.03%   57.03%           
=======================================
  Files         182      182           
  Lines       21376    21376           
=======================================
  Hits        12191    12191           
  Misses       7953     7953           
  Partials     1232     1232           
Flag Coverage Δ
e2e 35.81% <100.00%> (ø)
e2e go 31.41% <0.00%> (ø)
e2e node 27.25% <0.00%> (ø)
e2e python 31.78% <0.00%> (ø)
e2e quarkus 27.37% <0.00%> (ø)
e2e rust 26.78% <0.00%> (+0.01%) ⬆️
e2e springboot 25.31% <0.00%> (ø)
e2e typescript 27.35% <0.00%> (ø)
e2e-config-ci 28.16% <0.00%> (ø)
integration 17.17% <100.00%> (ø)
unit macos-14 45.09% <0.00%> (ø)
unit macos-latest 45.09% <0.00%> (ø)
unit ubuntu-24.04-arm 45.33% <0.00%> (ø)
unit ubuntu-latest 46.04% <0.00%> (ø)
unit windows-latest 45.14% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@matejvasek
Copy link
Copy Markdown
Contributor

matejvasek commented May 17, 2026

On caller cancellation, DialContext returns immediately, but the goroutine keeps running and the exec stream against kube-apiserver stays open for the lifetime of the parent process.

Not really as we close the connection. You might replace context.TODO() with different context, but definitely not with the ctx parameter of the DialContext() function. You need to create new context for this.

@matejvasek matejvasek requested review from gauron99 and lkingland and removed request for dsimansk and jrangelramos May 17, 2026 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Bugs ok-to-test 🤖 Non-member PR verified by an org member that is safe to test. size/XS 🤖 PR changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants