
guest/stdio: ConnSlot for container stdio survival across live migration #2721

Open
shreyanshjain7174 wants to merge 1 commit into microsoft:main from shreyanshjain7174:connslot-autoreconn-upstream
@shreyanshjain7174
Contributor

What

ConnSlot is a transport.Connection wrapper that lets the underlying
vsock conn be replaced at runtime, so container stdio survives a bridge
disconnect during live migration.

Why

When the bridge died during live migration (LM), the stdio relays were
still holding raw vsock fds that the host had just torn down. The relays
errored out, in-flight bytes were lost, and azcrictl attach hung after
migration.

How

  • ConnSlot.Read/Write delegate under a mutex; if the slot is empty
    they park on a sync.Cond. The relay parks before consuming the next
    byte from the upstream pipe, the kernel pipe (~64 KiB) fills, and the
    producer process blocks on its next write(2) — natural back-pressure
    with no user-space buffering.
  • Disconnect clears the conn and kicks runRedial in a goroutine.
    runRedial re-dials the same vsock port via a transport-agnostic
    redialer; on success it calls Set, which broadcasts the cond and
    wakes blocked relays against the new fd.
  • Bounded: 60 attempts × 100 ms ≈ 6 s wall-clock max so a permanently
    broken peer can't pin a goroutine.
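
The park-and-wake behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `connSlot`, `newConnSlot`, `errSlotClosed`, and the `recordConn` test double are hypothetical names, the sketch uses `io.ReadWriteCloser` in place of `transport.Connection`, and it releases the mutex before doing I/O, which may differ from the real implementation.

```go
package main

import (
	"errors"
	"io"
	"sync"
)

// errSlotClosed is a hypothetical sentinel; the PR may report closure differently.
var errSlotClosed = errors.New("connslot: closed")

// connSlot is an illustrative stand-in for ConnSlot: a replaceable connection.
type connSlot struct {
	mu     sync.Mutex
	cond   *sync.Cond
	conn   io.ReadWriteCloser // nil while disconnected
	closed bool
}

func newConnSlot(c io.ReadWriteCloser) *connSlot {
	s := &connSlot{conn: c}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// acquire parks on the cond until a connection is installed or the slot is closed.
func (s *connSlot) acquire() (io.ReadWriteCloser, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for s.conn == nil && !s.closed {
		s.cond.Wait()
	}
	if s.closed {
		return nil, errSlotClosed
	}
	return s.conn, nil
}

// Write blocks while the slot is empty, then forwards to the current conn.
func (s *connSlot) Write(p []byte) (int, error) {
	c, err := s.acquire()
	if err != nil {
		return 0, err
	}
	return c.Write(p)
}

// Read blocks while the slot is empty, then forwards to the current conn.
func (s *connSlot) Read(p []byte) (int, error) {
	c, err := s.acquire()
	if err != nil {
		return 0, err
	}
	return c.Read(p)
}

// Set installs a replacement connection and wakes every parked caller.
func (s *connSlot) Set(c io.ReadWriteCloser) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return errSlotClosed
	}
	s.conn = c
	s.cond.Broadcast()
	return nil
}

// Disconnect empties the slot and closes the old conn; later calls park until Set.
func (s *connSlot) Disconnect() {
	s.mu.Lock()
	c := s.conn
	s.conn = nil
	s.mu.Unlock()
	if c != nil {
		_ = c.Close()
	}
}

// Close shuts the slot permanently and releases any parked callers with an error.
func (s *connSlot) Close() error {
	s.mu.Lock()
	c := s.conn
	s.conn, s.closed = nil, true
	s.cond.Broadcast()
	s.mu.Unlock()
	if c != nil {
		return c.Close()
	}
	return nil
}

// recordConn is a test double that records writes in memory.
type recordConn struct {
	data   []byte
	closed bool
}

func (c *recordConn) Write(p []byte) (int, error) { c.data = append(c.data, p...); return len(p), nil }
func (c *recordConn) Read(p []byte) (int, error)  { return 0, io.EOF }
func (c *recordConn) Close() error                { c.closed = true; return nil }
```

In this sketch a relay that calls `Write` while the slot is empty stops consuming from its upstream pipe until `Set` runs, and that pause is what produces the back-pressure described in the first bullet.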

Wiring

  • Container holds a one-method slotRegistry interface (narrow seam,
    not a *Host back-pointer). Container.Start, ExecProcess, and
    Host.runExternalProcess register their stdio set after
    stdio.Connect.
  • Host.stdioSlots is a slice guarded by containersMutex. Closed
    slots are compacted on every register call so the slice is bounded by
    the live-process count.
  • cmd/gcs/main.go calls h.DisconnectAllStdio() once per bridge
    disconnect cycle, after b.ListenAndServe returns.
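
For reviewers who want a concrete picture of the registry wiring, here is a rough sketch under stated assumptions. The names `slot`, `Closed`, `registerStdio`, `registry`, and `fakeSlot` are hypothetical and chosen for illustration; the PR's actual method names and the way it detects a closed slot may differ. The real slice lives on Host and is guarded by containersMutex, whereas this sketch uses its own mutex.

```go
package main

import "sync"

// slot is the hypothetical subset of ConnSlot behaviour the registry needs.
type slot interface {
	Disconnect()
	Closed() bool
}

// slotRegistry mirrors the idea of a one-method seam held by Container.
type slotRegistry interface {
	registerStdio(slots ...slot)
}

// registry is an illustrative stand-in for the Host-side bookkeeping.
type registry struct {
	mu    sync.Mutex
	slots []slot
}

// Compile-time check that registry satisfies the narrow interface.
var _ slotRegistry = (*registry)(nil)

// registerStdio drops closed slots first, then tracks the new non-nil ones,
// so the slice stays proportional to the number of live processes.
func (r *registry) registerStdio(slots ...slot) {
	r.mu.Lock()
	defer r.mu.Unlock()
	live := r.slots[:0]
	for _, s := range r.slots {
		if !s.Closed() {
			live = append(live, s)
		}
	}
	// Clear the unused tail so compacted-away slots can be collected.
	for i := len(live); i < len(r.slots); i++ {
		r.slots[i] = nil
	}
	for _, s := range slots {
		if s != nil {
			live = append(live, s)
		}
	}
	r.slots = live
}

// disconnectAll snapshots the slice under the lock and disconnects outside it.
func (r *registry) disconnectAll() {
	r.mu.Lock()
	snapshot := append([]slot(nil), r.slots...)
	r.mu.Unlock()
	for _, s := range snapshot {
		s.Disconnect()
	}
}

// count reports how many slots are currently tracked.
func (r *registry) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.slots)
}

// fakeSlot is a test double that counts Disconnect calls.
type fakeSlot struct {
	closed      bool
	disconnects int
}

func (f *fakeSlot) Disconnect()  { f.disconnects++ }
func (f *fakeSlot) Closed() bool { return f.closed }
```

Compacting on registration rather than on close keeps the close path free of registry locking, at the cost of closed slots lingering until the next process starts.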

Tests

Unit (go test -race -count=1, all 21 pass):

  • 15 ConnSlot tests in internal/guest/stdio/connslot_test.go
  • 6 Host registry tests in internal/guest/runtime/hcsv2/stdio_slots_test.go

End-to-end on the two-node LM bench:

  • Post-migration azcrictl exec produces a heartbeat counter that
    increments monotonically and arrives at the host (5+ counters
    captured on the verified runs).
  • Post-migration azcrictl attach connects, streams bytes, and exits
    cleanly when killed (was hanging before).

…gration

Wrap every container stdio vsock connection in a ConnSlot so the
underlying conn can be replaced when the bridge reconnects after live
migration.

When the bridge dies, cmd/gcs/main.go calls Host.DisconnectAllStdio,
which Disconnects every tracked slot. The relays (PipeRelay / TtyRelay)
park inside ConnSlot.acquire on a sync.Cond. The producing process keeps
writing into its kernel pipe (~64 KiB buffer) and back-pressures
naturally on its next write syscall when the buffer fills, so no bytes
are lost.

A background runRedial re-dials the same vsock port via a transport-
agnostic redialer callback. On success it calls Set, which broadcasts
the cond and wakes the parked relays against the fresh connection.
runRedial is bounded (60 attempts at 100 ms = ~6 s) so a permanently
broken peer cannot pin a goroutine forever.

Wiring:
  - Container holds a slotRegistry interface (one method, narrow seam),
    populated with the parent Host. Container.Start, ExecProcess, and
    Host.runExternalProcess register their stdio set after stdio.Connect.
  - Host.stdioSlots is a slice guarded by containersMutex; closed slots
    are compacted on every register call so the slice doesn't grow with
    container churn.
  - stdio.Connect uses tport.Dial (not DialReconn) so dial errors
    propagate to runRedial instead of being silently retried forever.

Coverage:
  - 15 ConnSlot unit tests (block/resume on Set, idempotent Disconnect /
    Close, Set-after-Close, redial bounded + Disconnect re-triggers,
    concurrent Disconnect with Writes under -race, pipe-relay back-
    pressure integration).
  - 6 Host registry unit tests (tracking, nil/non-slot ignore, nil set,
    DisconnectAll closes underlying conns, closed-slot compaction).
  - End-to-end PowerShell test on the two-node LM bench: post-migration
    azcrictl exec stdout streams a monotonic heartbeat counter to the
    host, and azcrictl attach is responsive (was hanging before).

Signed-off-by: Shreyansh Sancheti <shsancheti@microsoft.com>
@shreyanshjain7174 shreyanshjain7174 requested a review from a team as a code owner May 6, 2026 13:57
@shreyanshjain7174 shreyanshjain7174 requested a review from rawahars May 6, 2026 14:01