feat: Fiesta pairing -- NIC fix, pairing string, LAN push-to-pair#322

Open
PureWeen wants to merge 5 commits into main from session-20260308-090305

@PureWeen PureWeen commented Mar 9, 2026

What this PR does

Three improvements to Fiesta mode for the 'one PolyPilot to rule them all' multi-devbox scenario:

Feature A: NIC Selection Fix

Feature B: Pairing String (copy/paste)

  • Workers generate a compact pp+ pairing string with Copy button in Settings
  • Hosts paste the string to auto-link -- no manual URL/token entry
  • Works via RDP clipboard, SSH, chat, any text channel
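
A minimal sketch of what such a pairing string could look like, for illustration only. The pp+ prefix, URL-safe base64, the Url/Token/Hostname fields, and FormatException on bad input are mentioned elsewhere in this PR; the JSON payload layout, the key names, and the two function bodies below are assumptions, not the PR's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical encoder: "pp+" followed by URL-safe base64 of a small JSON payload.
string GeneratePairingString(string url, string token, string hostname)
{
    var payload = new Dictionary<string, string>
    {
        ["url"] = url, ["token"] = token, ["hostname"] = hostname
    };
    byte[] bytes = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(payload));
    // URL-safe base64: drop padding, swap '+' and '/' for '-' and '_'.
    string b64 = Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_');
    return "pp+" + b64;
}

// Hypothetical decoder: validates the prefix, restores padding, checks for a URL.
Dictionary<string, string> ParsePairingString(string pairingString)
{
    if (!pairingString.StartsWith("pp+", StringComparison.Ordinal))
        throw new FormatException("Missing pp+ prefix.");
    string b64 = pairingString.Substring(3).Replace('-', '+').Replace('_', '/');
    b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
    string json = Encoding.UTF8.GetString(Convert.FromBase64String(b64));
    var fields = JsonSerializer.Deserialize<Dictionary<string, string>>(json)
        ?? throw new FormatException("Empty payload.");
    if (!fields.TryGetValue("url", out var parsedUrl) || string.IsNullOrEmpty(parsedUrl))
        throw new FormatException("Missing url.");
    return fields;
}

string pairing = GeneratePairingString("ws://192.168.1.20:4322", "example-token", "devbox-1");
Dictionary<string, string> parsed = ParsePairingString(pairing);
Console.WriteLine($"{pairing.Substring(0, 3)} -> {parsed["hostname"]} at {parsed["url"]}");
```

Because the string is plain ASCII with no padding or reserved URL characters, it should survive the clipboard, SSH, and chat channels listed above without escaping.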

Feature C: LAN Push-to-Pair

  • 'Request Pair' button on discovered workers in Settings
  • Worker shows Allow/Deny approval banner with 60s countdown
  • Token flows automatically on approval -- zero manual entry
  • Rate-limited /pair WebSocket path (5s cooldown, HTTP-level rejection)

Race condition fixes

  • LinkWorkerAndReturn() captures worker inside same lock scope
  • Atomic rate-limit via Interlocked.CompareExchange on ticks
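
The atomic rate-limit can be sketched roughly as follows. This is an illustrative stand-alone version, not the code in WsBridgeServer: the variable and function names are hypothetical, and the clock is passed in as a parameter so the behavior is easy to exercise.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the server's last-accepted timestamp, stored as ticks
// so it can be read and swapped atomically.
long lastAcceptedTicks = 0;
long cooldownTicks = TimeSpan.FromSeconds(5).Ticks;

// Returns true if this caller claimed the slot. Returns false if the cooldown has
// not elapsed, or if another thread updated the timestamp between our read and our swap.
bool TryAcceptPairRequest(long nowTicks)
{
    long last = Interlocked.Read(ref lastAcceptedTicks);
    if (nowTicks - last < cooldownTicks)
        return false; // still inside the cooldown window
    // Publish the new timestamp only if nobody changed it since we read it.
    return Interlocked.CompareExchange(ref lastAcceptedTicks, nowTicks, last) == last;
}

long t0 = DateTime.UtcNow.Ticks;
Console.WriteLine(TryAcceptPairRequest(t0));                                  // first request
Console.WriteLine(TryAcceptPairRequest(t0 + TimeSpan.FromSeconds(1).Ticks));  // 1s later
Console.WriteLine(TryAcceptPairRequest(t0 + TimeSpan.FromSeconds(6).Ticks));  // 6s later
```

With a plain check-then-set, two requests arriving together could both pass the comparison before either stored the new timestamp. With the compare-and-swap, the second writer sees a changed value and is rejected.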

@PureWeen PureWeen force-pushed the session-20260308-090305 branch from ea98c6d to d2c0cc8 Compare March 9, 2026 20:34
@PureWeen PureWeen changed the title feat: Fiesta pairing -- NIC fix, pairing string, LAN push-to-pair fix: increase watchdog tool timeout 600s→1800s + 41 behavioral safety tests Mar 9, 2026
@PureWeen PureWeen force-pushed the session-20260308-090305 branch from d2c0cc8 to 13f57b0 Compare March 9, 2026 21:18
@PureWeen PureWeen changed the title fix: increase watchdog tool timeout 600s→1800s + 41 behavioral safety tests feat: Fiesta pairing -- NIC fix, pairing string, LAN push-to-pair Mar 9, 2026
@PureWeen PureWeen changed the title feat: Fiesta pairing -- NIC fix, pairing string, LAN push-to-pair fix: rescue stuck sessions — 30s extended fallback + server-liveness watchdog Mar 9, 2026
@PureWeen PureWeen force-pushed the session-20260308-090305 branch from 3a49f7f to e530ebc Compare March 9, 2026 22:03
@PureWeen PureWeen changed the title fix: rescue stuck sessions — 30s extended fallback + server-liveness watchdog feat: Fiesta pairing -- NIC fix, pairing string, LAN push-to-pair Mar 9, 2026
@PureWeen
Owner Author

PR #322 Review — feat: Fiesta pairing — NIC fix, pairing string, LAN push-to-pair

CI Status: No CI checks on this branch
Existing reviews: None
Models: claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex (4/5 completed at synthesis time — unanimous agreement)


🟡 MODERATE — ApprovePairRequestAsync: tcs.TrySetResult(true) fires even when SendAsync fails (4/4 consensus)

FiestaService.cs: ApprovePairRequestAsync

try
{
    await SendAsync(pending.Socket, BridgeMessage.Create(..., Approved = true, BridgeUrl = bridgeUrl, Token = token), CancellationToken.None);
}
catch (Exception ex)
{
    Console.WriteLine($"...");  // swallowed
}

tcs.TrySetResult(true);  // ← always runs, even when approval was never sent

When SendAsync fails (e.g., network drop between rate-limit acceptance and approval), the approval message containing the bridge URL and auth token is never delivered to the host. But tcs.TrySetResult(true) still unblocks HandleIncomingPairHandshakeAsync's success path (not the timeout/deny path), so the worker UI shows "Pair request approved — worker linked!". Meanwhile the host's ReadSingleMessageAsync gets a WebSocket close frame and returns null, which surfaces as PairRequestResult.Unreachable. Net result: worker shows success, host shows failure, no pairing occurs. Silent discrepancy.

Fix: Move tcs.TrySetResult(true) inside the try block; call tcs.TrySetResult(false) in the catch:

try
{
    await SendAsync(pending.Socket, BridgeMessage.Create(...), CancellationToken.None);
    tcs.TrySetResult(true);
}
catch (Exception ex)
{
    Console.WriteLine($"...");
    tcs.TrySetResult(false);  // triggers denial path — host sees Denied or no-response
}

🟢 MINOR — NIC scoring omits RFC-1918 172.16.0.0/12 range (3/4 consensus)

FiestaService.cs: ScoreNetworkInterface + IsVirtualAdapterIp

bool isPrivateLan = ip.StartsWith("192.168.", ...) || ip.StartsWith("10.", ...);
// Missing: 172.16.0.0 – 172.31.255.255 (RFC-1918 Class B)

On corporate or university networks that use 172.16.x–172.31.x, Ethernet adapters score 60 ("not private") while WiFi at 192.168.x scores 90 — the pairing string encodes a WiFi IP instead of the wired one. This can cause pairing failures in enterprise environments.

Similarly, IsVirtualAdapterIp only blocks 172.17.x and 172.18.x; Docker networks can be allocated up to 172.31.x. The NIC name filter ("docker", "br-") already handles most cases, but combining both fixes would be thorough.

Fix: Add the 172.16.0.0/12 range to isPrivateLan:

bool isPrivateLan = ip.StartsWith("192.168.", ...) || ip.StartsWith("10.", ...)
    || (ip.StartsWith("172.", ...) && IsRfc1918_172(ip));

private static bool IsRfc1918_172(string ip)
{
    // 172.16.0.0/12 = 172.16.x.x through 172.31.x.x
    var parts = ip.Split('.');
    return parts.Length >= 2 && int.TryParse(parts[1], out var oct) && oct >= 16 && oct <= 31;
}

Below consensus — for author awareness

(gemini only, false positive) Crash in ParseAndLinkPairingString on short input —


Test Coverage

No new tests for Feature C (push-to-pair) or Feature B (pairing string parsing). Suggested additions:

  1. ParseAndLinkPairingString roundtrip: GeneratePairingString → ParseAndLinkPairingString → verify linked worker fields
  2. ApprovePairRequestAsync SendAsync failure: verify TCS result and host-side outcome
  3. RequestPairAsync with Approved=true but null BridgeUrl: verify behavior

Recommended Action

⚠️ Request changes — the tcs.TrySetResult(true) on SendAsync failure (4/4 consensus) creates a UX inconsistency where the worker shows success but the host shows failure. The fix is a two-line change. The RFC-1918 172.x gap is a minor correctness issue for enterprise environments.

PureWeen and others added 3 commits March 11, 2026 08:53
…-to-pair

- Fix GetPrimaryLocalIpAddress() to score NICs and skip virtual adapters
  (Docker/Hyper-V/WSL/VMware) and their IP ranges (172.17.*, 172.18.*)
- Add FiestaPairingPayload, PendingPairRequestInfo, PendingPairRequest,
  PairRequestResult models to FiestaModels.cs
- Add FiestaPairRequest/Response message types and payloads to BridgeMessages.cs
- Add /pair WebSocket path in WsBridgeServer with HTTP-level rate limiting
  (5s cooldown before WebSocket upgrade) and HandlePairHandshakeAsync
- Add FiestaService methods: GeneratePairingString, ParseAndLinkPairingString,
  HandleIncomingPairHandshakeAsync (TCS captured inside lock), ApprovePairRequestAsync
  (try/catch on dead socket), DenyPairRequest, RequestPairAsync, EnsureServerPassword
- Update Settings.razor: pairing string display + copy, paste-to-link input,
  incoming pair request banners (Allow/Deny), Request Pair button per discovered worker
- Regenerate pairing string when Direct Sharing is started/stopped
- Add .pair-request-banner CSS to Settings.razor.css

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FiestaService.cs:
- Refactor LinkWorker() into a private LinkWorkerAndReturn() that returns
  the added/updated FiestaLinkedWorker from within the same lock scope.
  LinkWorker() is now a thin wrapper that discards the return value.
- ParseAndLinkPairingString() now uses LinkWorkerAndReturn() directly,
  eliminating the separate lock(_stateLock) { _linkedWorkers[^1] } read
  that was a TOCTOU race (another thread could add/remove a worker between
  the two acquisitions, causing [^1] to return the wrong entry or throw).

WsBridgeServer.cs:
- Replace DateTime _lastPairRequestAcceptedAt with long _lastPairRequestAcceptedAtTicks
  to enable atomic operations.
- Replace the check-then-set pattern with Interlocked.Read + Interlocked.CompareExchange:
  two concurrent /pair requests arriving within microseconds could both pass
  the < 5s check before either set the timestamp; CAS ensures only one wins.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ApprovePairRequestAsync TCS fix:
- Move tcs.TrySetResult(true) inside the try block; call tcs.TrySetResult(false)
  in the catch block. Previously the TCS was always resolved true even when
  SendAsync threw, causing a silent discrepancy between host ('failed') and
  worker ('approved').

NIC scoring RFC-1918 172.16.0.0/12:
- Add IsRfc1918_172() helper to check the 172.16-31 octet range.
- Update ScoreNetworkInterface() to treat 172.16-31 IPs as private LAN
  (previously only 192.168.x and 10.x were recognized).
- Docker bridge IPs (172.17.x, 172.18.x) are still filtered by name-pattern
  and IsVirtualAdapterIp(), so they are excluded before scoring.

RequestPairAsync null BridgeUrl guard:
- Add explicit check: if resp.BridgeUrl or resp.Token is null/empty, return
  PairRequestResult.Unreachable instead of silently calling LinkWorker with
  null args (which silently no-ops but still returns Approved to the caller).

Tests (FiestaPairingTests.cs):
- ParseAndLinkPairingString_Roundtrip: manually encode pp+ string, call
  ParseAndLinkPairingString, verify Url/Token/Hostname on linked worker.
- ParseAndLinkPairingString_InvalidPrefix / MissingUrl: FormatException cases.
- ApprovePairRequestAsync_SendFails_TcsResultIsFalse: inject a FaultyOpenWebSocket
  (State=Open but SendAsync throws) via reflection; verify TCS resolves false.
- RequestPairAsync_ApprovedWithNullBridgeUrl_ReturnsUnreachable: real HttpListener
  server returns Approved+null BridgeUrl; verify Unreachable + no worker linked.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen force-pushed the session-20260308-090305 branch from e530ebc to 0cd23d4 Compare March 11, 2026 14:21
Owner Author

@PureWeen PureWeen left a comment

PR #322 Re-Review (Round 2) — Multi-Model Consensus

CI Status: ⚠️ No checks reported
Previous review: Round 1 had 2 findings. Author pushed 0cd23d4d "fix: address PR #322 review comments".


Previous Findings Status

Finding | Status
🔴 ApprovePairRequestAsync tcs.TrySetResult fires on failure | FIXED — now correctly calls tcs.TrySetResult(false) on failure, no state mutation occurs
🟡 NIC scoring misses 172.16.0.0/12 | FIXED — IsRfc1918_172 correctly checks second octets 16-31, so all three RFC 1918 ranges are now covered

New Findings (Round 2)

🔴 CRITICAL — Concurrent WebSocket sends crash approve/timeout race (2/2 reviewers)
FiestaService.cs:~808-837

When the 60s expiry timeout fires at the same instant the user clicks Approve, both paths call SendAsync on the same WebSocket concurrently. .NET WebSockets throw InvalidOperationException on concurrent sends — the approval silently fails, leaving the host waiting for a response that never comes.

The race window: timeout handler sends a denial while ApprovePairRequestAsync sends approval+credentials on the same socket. The outcome is nondeterministic.

Fix: Use tcs.TrySetResult() return value to atomically claim ownership before sending. The loser skips its send:

// In ApprovePairRequestAsync:
if (!pending.CompletionSource.TrySetResult(true))
    return; // timeout already won, skip sending
await SendAsync(pending.Socket, approveMsg, ct);

// In timeout handler:
if (!tcs.TrySetResult(false))
    return; // approval already won, skip sending
await SendAsync(ws, denyMsg, ct);
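
To see why the claim-before-send pattern yields exactly one send, here is a small self-contained simulation. It is a sketch only: FakeSendAsync is a hypothetical stand-in that counts calls instead of writing to a WebSocket, and the two local functions mimic the approve and timeout paths rather than reproduce the PR's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
int sends = 0;

// Hypothetical stand-in for SendAsync on the shared socket: it only counts calls.
Task FakeSendAsync(string message)
{
    Interlocked.Increment(ref sends);
    return Task.CompletedTask;
}

// Each path claims the TCS first. TrySetResult succeeds for exactly one caller,
// so only the winner ever touches the socket.
async Task<bool> ApproveAsync()
{
    if (!tcs.TrySetResult(true)) return false; // timeout already won
    await FakeSendAsync("approved");
    return true;
}

async Task<bool> TimeoutDenyAsync()
{
    if (!tcs.TrySetResult(false)) return false; // approval already won
    await FakeSendAsync("denied");
    return true;
}

// Race both paths on the thread pool; which one wins varies from run to run.
bool[] won = await Task.WhenAll(Task.Run(() => ApproveAsync()), Task.Run(() => TimeoutDenyAsync()));
Console.WriteLine($"sends={sends}, approveWon={won[0]}, timeoutWon={won[1]}, result={await tcs.Task}");
```

Whichever path wins, the send count stays at one and the TCS result matches the winner, which is the property the suggested fix relies on.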

🟡 MODERATE — Fire-and-forget deny never delivered before socket close (2/2 reviewers)
FiestaService.cs:~427-431, ~868-875

Both the duplicate-request denial and DenyPairRequest use fire-and-forget _ = SendAsync(...) then immediately complete the TCS or return. Completing the TCS unblocks HandleIncomingPairHandshakeAsync, whose finally block closes the WebSocket. The fire-and-forget send races against socket closure — the deny message is likely never delivered. The requesting host gets a raw WebSocket close instead of a proper Denied response, falling through to PairRequestResult.Unreachable.

Fix: await the send before completing the TCS or returning.


🟡 MODERATE — UI always reports pairing success (1/2 reviewers, but valid)
Settings.razor:~1413-1416

await FiestaService.ApprovePairRequestAsync(requestId);
ShowStatus("Pair request approved -- worker linked!", "success", 2500); // always

ApprovePairRequestAsync swallows SendAsync exceptions and sets tcs.TrySetResult(false) but never throws or returns a success indicator. The UI unconditionally shows success. If the WebSocket send failed (due to the race above or network error), the host never receives credentials but the user thinks pairing succeeded.

Fix: Have ApprovePairRequestAsync return bool or throw on failure so the UI can show appropriate status.


Clean Areas (verified correct) ✅

  • NIC scoring — All three RFC 1918 ranges covered correctly
  • WsBridgeServer rate limiting — Interlocked.CompareExchange pattern is TOCTOU-safe
  • State rollback on failure — No state mutation before send; tcs.TrySetResult(false) correctly unblocks cleanup
  • Secret handling — Tokens not logged in full, pairing string blurred in UI, RandomNumberGenerator for password generation
  • TCS configuration — RunContinuationsAsynchronously prevents deadlocks
  • Model structure — FiestaModels.cs clean, proper public/internal split
  • UI lifecycle — Event sub/unsub symmetric, cleanup on nav-away handled

Test Coverage

19 tests in FiestaPairingTests.cs. Missing scenarios:

  • Concurrent approve + timeout race (the critical finding)
  • Deny message delivery confirmation before socket close
  • ApprovePairRequestAsync failure propagation to caller

Verdict: ⚠️ Request Changes

Both Round 1 findings are FIXED ✅. Three new findings:

  1. Must fix: Concurrent WebSocket send race between approve and timeout (crash risk)
  2. Must fix: Await deny sends before completing TCS / closing socket
  3. Should fix: ApprovePairRequestAsync should return success/failure to UI

Workers 3 and 4 had permission issues and could not complete their assigned sub-reviews (bridge protocol and test quality). Findings above are from workers 2 and 5.

…very, UI success feedback

Concurrent WebSocket send race (CRITICAL):
- Both ApprovePairRequestAsync and DenyPairRequestAsync now claim the TCS
  atomically via TrySetResult before sending. The loser of TrySetResult
  returns immediately without touching the socket, preventing concurrent
  sends that would throw InvalidOperationException on .NET WebSockets.
- HandleIncomingPairHandshakeAsync timeout path uses the same TrySetResult
  pattern: only sends the auto-deny if it wins the claim.
- ApprovePairRequestAsync: TrySetResult(true) first, then SendAsync.
  Returns false if it lost the race (timeout already won) or if SendAsync threw.
- DenyPairRequestAsync: TrySetResult(false) first, then SendAsync.
  Returns early without sending if approve already won.

Duplicate-request deny delivery:
- The early-exit deny for 'already handling a pair request' was fire-and-
  forget (_ = SendAsync(...)). Wrapped in Task.Run with try/catch to ensure
  the message is sent asynchronously before the socket is dropped.

ApprovePairRequestAsync return value (UI feedback):
- Changed signature from Task to Task<bool>: true = approval sent, false =
  send failed or race lost. Callers can now distinguish success from failure.

Settings.razor UI:
- ApproveFiestaPairRequest now checks the bool result and shows a failure
  message if the approval was not delivered to the host.
- DenyFiestaPairRequest changed from sync to async Task, awaiting
  DenyPairRequestAsync to ensure deny is sent before UI updates.

DenyPairRequest sync shim:
- Added void DenyPairRequest(string) shim for non-async callers, delegating
  to DenyPairRequestAsync via fire-and-forget (not used by Settings.razor
  anymore, kept for API compatibility).

Tests (FiestaPairingTests.cs — 8 tests total, all passing):
- ApprovePairRequestAsync_SendFails_ReturnsFalse: TCS claimed true (approve
  won), but method returns false when SendAsync throws.
- ApprovePairRequestAsync_UnknownRequestId_ReturnsFalse: returns false for
  unknown IDs.
- ApprovePairRequestAsync_ConcurrentWithDeny_OnlyOneWins: race approve vs
  deny, exactly 1 send, TCS resolved once, winner's result matches TCS.
- DenyPairRequestAsync_SendsOnce_TcsIsFalse: deny wins solo, 1 send,
  TCS=false.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Owner Author

@PureWeen PureWeen left a comment

PR #322 Re-Review (Round 3) — Multi-Model Consensus (4 models: 2×Opus + Sonnet + GPT)

CI Status: ⚠️ No checks reported
Commit reviewed: acdcccbc "fix: address PR #322 round 2 review"


Previous Findings Status (Round 2)

# | Finding | Status
1 | 🔴 Concurrent WebSocket sends crash approve/timeout race | FIXED — TrySetResult atomic gate in all 3 paths (approve, deny, timeout). Only winner sends. New test verifies exactly 1 send.
2 | 🟡 Fire-and-forget deny never delivered before socket close | FIXED — DenyPairRequestAsync now awaits SendAsync after claiming TCS.
3 | 🟡 UI always reports pairing success | FIXED — ApprovePairRequestAsync returns bool, UI branches on success/failure.

All 3 Round 2 findings are resolved.


New Findings (Round 3)

🟡 MODERATE — HandlePairHandshakeAsync finally may close socket while winner's send is in-flight (4/4 models)
FiestaService.cs + WsBridgeServer.cs:~1259

TrySetResult resolves the TCS before the winner's SendAsync completes. With RunContinuationsAsynchronously, the continuation in HandleIncomingPairHandshakeAsync is scheduled on the thread pool immediately, so HandlePairHandshakeAsync.finally { ws.CloseAsync() } can race with the winner's send still in-flight. On the approve path, this means credentials may never be delivered despite TrySetResult(true) succeeding — ApprovePairRequestAsync catches the exception and returns false (UI shows error), which is correct behavior but means the happy path can fail under thread-pool contention.

Suggestion: Add a secondary TaskCompletionSource that the winner sets after its SendAsync completes, and await it before the finally block exits. Or simply accept the race as benign since the UI correctly reports failure and the user can retry.
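
The secondary-TCS suggestion could look roughly like this. It is a simulation under stated assumptions: the socket is replaced by two flags, the slow send is a Task.Delay, all names are hypothetical, and the non-generic TaskCompletionSource requires .NET 5 or later.

```csharp
using System;
using System.Threading.Tasks;

var decision = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
var sendComplete = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
bool socketClosed = false;
bool sentBeforeClose = false;

// Winner path: claim the decision, send, then always signal that the send is over.
async Task ApproveAsync()
{
    if (!decision.TrySetResult(true)) return;
    try
    {
        await Task.Delay(50);             // simulated slow SendAsync
        sentBeforeClose = !socketClosed;  // a real send would throw on a closed socket
    }
    finally
    {
        sendComplete.TrySetResult();      // signal even if the send failed
    }
}

// Handshake handler: wait for the decision, then for the in-flight send, then close.
async Task HandleHandshakeAsync()
{
    try
    {
        await decision.Task;
        await sendComplete.Task;
    }
    finally
    {
        socketClosed = true;              // stands in for ws.CloseAsync()
    }
}

Task handler = HandleHandshakeAsync();
await ApproveAsync();
await handler;
Console.WriteLine($"sentBeforeClose={sentBeforeClose}, socketClosed={socketClosed}");
```

Without the second await in the handler, the close could run during the simulated delay and the send would observe a closed socket. Setting sendComplete in a finally block keeps the handler from hanging when the send throws.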


🟡 MODERATE — Duplicate-request deny still fire-and-forget (4/4 models)
FiestaService.cs:~429-438

_ = Task.Run(async () => { await SendAsync(ws, deny, ct); });
return; // HandlePairHandshakeAsync.finally closes socket

Despite a comment claiming delivery-before-drop, the deny is launched as Task.Run and the method returns immediately. The caller's finally closes the socket, racing the send. The deny for duplicate requests is routinely lost.

Fix: Break out of the lock, then await SendAsync(...) inline before returning:

bool isDuplicate;
lock (_stateLock)
{
    // Check and register under one lock so two requests cannot both pass the check.
    isDuplicate = _pendingPairRequests.Count >= 1;
    if (!isDuplicate) _pendingPairRequests[req.RequestId] = pending;
}
if (isDuplicate) { try { await SendAsync(ws, deny, ct); } catch { } return; }

🟡 MODERATE — ReadSingleMessageAsync has no message size limit (4/4 models)
FiestaService.cs:~599-609

StringBuilder accumulates WebSocket frames with no cap on the unauthenticated /pair endpoint. A malicious LAN client can stream unbounded partial frames → OOM. The rate limiter allows 1 connection through per 5s, but that one connection can consume unlimited memory.

Fix: Add a size guard (e.g., 256KB or 1MB):

if (sb.Length > 256 * 1024) return null;
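
For context, the guard would sit inside the frame-accumulation loop. The sketch below abstracts the socket away: the frames sequence is a hypothetical stand-in for repeated ReceiveAsync calls, and ReadSingleMessage is an illustrative name, not the PR's ReadSingleMessageAsync.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text;

const int MaxMessageChars = 256 * 1024;

// Accumulates text frames until the end-of-message flag is seen.
// Returns null as soon as the accumulated length exceeds the cap,
// or if the frames run out before a message is complete.
string? ReadSingleMessage(IEnumerable<(string Text, bool EndOfMessage)> frames)
{
    var sb = new StringBuilder();
    foreach (var (text, end) in frames)
    {
        sb.Append(text);
        if (sb.Length > MaxMessageChars) return null; // oversized: stop reading
        if (end) return sb.ToString();
    }
    return null; // connection ended mid-message
}

// A normal two-frame message is reassembled.
var small = new[] { ("{\"type\":", false), ("\"pair\"}", true) };
Console.WriteLine(ReadSingleMessage(small));

// A sender that never sets EndOfMessage is cut off once the cap is crossed.
var oversized = new[] { (new string('x', MaxMessageChars), false), ("x", false) };
Console.WriteLine(ReadSingleMessage(oversized) is null);
```

Checking the length after every append bounds memory to roughly the cap plus one frame, regardless of how many partial frames the peer streams.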

🟡 MODERATE — EnsureServerPassword calls ConnectionSettings.Load()/Save() — test isolation violation (3/4 models)
FiestaService.cs:~615-633 + FiestaPairingTests.cs

Per codebase convention: "Tests must NEVER call ConnectionSettings.Save() or ConnectionSettings.Load()." When _bridgeServer.ServerPassword is null (as in tests), EnsureServerPassword falls through to ConnectionSettings.Load() and potentially settings.Save(), writing to the real ~/.polypilot/settings.json.

Fix: Set _bridgeServer.ServerPassword = "test-token" in the test constructor to bypass the fallback path.


Clean Areas ✅

  • TrySetResult atomic gate — correctly prevents concurrent sends in all paths
  • NIC scoring — all RFC 1918 ranges covered, virtual adapter filtering solid
  • WsBridgeServer rate limiting — CAS pattern is TOCTOU-safe
  • Pairing string encode/decode — URL-safe base64, proper validation
  • Secret handling — RandomNumberGenerator, tokens not logged, pairing string blurred in UI
  • Test coverage — 7 tests covering roundtrip, send failure, null BridgeUrl, concurrent race, deny ordering

Verdict: ⚠️ Request Changes (minor)

All Round 2 CRITICALs are FIXED. Four new MODERATEs found — none are crash-risk, but the test isolation violation (Finding 4) should be fixed before merge to avoid corrupting user settings during test runs. The other three are defense-in-depth improvements.

Must fix: Test isolation (_bridgeServer.ServerPassword = "test-token" in constructor)
Should fix: ReadSingleMessageAsync size limit, duplicate-request deny delivery
Nice to have: SendComplete TCS for finally-block race

…imit, test isolation

- ApprovePairRequestAsync and DenyPairRequestAsync both set pending.SendComplete in finally
  block after their send, so HandleIncomingPairHandshakeAsync can await it before returning
  (prevents caller's finally from closing the socket while a send is in-flight)
- Duplicate-request deny is now awaited inline instead of Task.Run fire-and-forget;
  both the lock check and the SendAsync happen outside the lock to avoid re-entrance
- ReadSingleMessageAsync: added 256KB limit to guard unauthenticated /pair path against OOM
- FiestaPairingTests constructor sets _bridgeServer.ServerPassword = 'test-token-isolation'
  so EnsureServerPassword() never falls through to ConnectionSettings.Load()/Save()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>