Skip to content

Improve bridge startup reliability and token validation#341

Open
PureWeen wants to merge 3 commits intomainfrom
fix/bridge-hardening
Open

Improve bridge startup reliability and token validation#341
PureWeen wants to merge 3 commits intomainfrom
fix/bridge-hardening

Conversation

@PureWeen
Copy link
Owner

Changes

WsBridgeServer

  • Removed unconditional loopback trust when an access token is configured. Previously, any local process could connect without a token. Now the token is required when set; unauthenticated local connections are only permitted when no token is configured (local-only mode).
  • Replaced plain string comparison for token validation with a constant-time comparison using `CryptographicOperations.FixedTimeEquals`.

DevTunnelService

  • The bridge socket was previously open without authentication during the ~30-60 second window while the tunnel handshake runs. Now a random ephemeral token is set on the bridge before `Start()`, so the listening socket always requires auth from the moment it opens. The real access token replaces it once issued.

PluginLoader

  • Plugin integrity verification previously only covered the entry-point DLL. A replacement dependency assembly would pass the hash check. `ComputeDirectoryHash` now hashes all `.dll` files in the plugin directory (sorted by filename, filename bytes included) so any substitution or addition is detected.
  • Note: This changes the hash scheme. Existing approved plugins will show as needing re-approval on first run after this update.

ConnectionSettings

  • On iOS, Android, and Mac Catalyst, `RemoteToken`, `LanToken`, and `ServerPassword` are now stored in the platform SecureStorage (Keychain on Apple) rather than plain JSON. On first load, any values already in `settings.json` are migrated to SecureStorage and scrubbed from the file.
  • Desktop (non-catalyst) behaviour is unchanged.

Tests

  • Updated `WsBridgeIntegrationTests` to match the new auth behaviour: added a test asserting loopback-without-token is rejected when a token is configured, and a test confirming loopback-with-correct-token still works.

@PureWeen
Copy link
Owner Author

PR #341 — 5-Model Consensus Review

Title: Improve bridge startup reliability and token validation
CI: ⚠️ No checks configured on fix/bridge-hardening
Prior reviews: None
Models: claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex


🔴 CRITICAL — DevTunnel loopback removal breaks all mobile connections (2/5 models)

PolyPilot/Services/WsBridgeServer.csValidateClientToken

The PR removes the loopback bypass and simultaneously sets a random ephemeral token before _bridge.Start(). DevTunnel proxies remote connections to localhost — they arrive as loopback requests. The project docs explicitly note DevTunnel strips X-Tunnel-Authorization before proxying. With no loopback bypass and an always-set AccessToken, DevTunnel-proxied connections carry no token → ValidateClientToken returns false → all remote mobile clients are rejected.

Before this PR, the flow was: remote client → DevTunnel → localhost → loopback bypass → accepted. After: remote client → DevTunnel (strips header) → localhost → no loopback bypass, no token → rejected.

Fix: The random guard token covers the window before the real token is known. Once the real token is set, DevTunnel connections carry it. But the loopback bypass was load-bearing for DevTunnel-proxied connections whose token header gets stripped. Either restore the loopback bypass for DevTunnel connections specifically, or confirm the header is preserved by the DevTunnel infrastructure for this deployment.


🔴 CRITICAL — Silent secret loss during migration (4/5 models)

PolyPilot/Models/ConnectionSettings.csMigrateAndLoadMobileSecrets

if (string.IsNullOrEmpty(storedRemote) && !string.IsNullOrEmpty(legacyRemote))
{ SetSecureStorageSync(RemoteTokenKey, legacyRemote); storedRemote = legacyRemote; }
// ...
_secretsDirty = false;
if (hadLegacySecrets)
    Save(); // scrubs JSON — [JsonIgnore] means RemoteToken won't be serialized

SetSecureStorageSync swallows all exceptions internally (catch { }). If the Keychain/EncryptedSharedPreferences write fails (locked keychain, missing entitlements, storage full), storedRemote = legacyRemote still sets the in-memory value, _secretsDirty = false is set, and Save() writes JSON with secrets omitted ([JsonIgnore]). On the next app restart, the secret is absent from both Keychain (write failed) and JSON (scrubbed) — permanently lost.

Fix: Only call Save() to scrub JSON after confirming each SecureStorage write succeeded (e.g., verify by reading back, or return a success bool from SetSecureStorageSync).


🔴 CRITICAL — Deadlock on iOS main thread (4/5 models)

PolyPilot/Models/ConnectionSettings.csGetSecureStorageSync / SetSecureStorageSync

private static string? GetSecureStorageSync(string key)
{
    try { return Task.Run(() => SecureStorage.Default.GetAsync(key)).GetAwaiter().GetResult(); }
    catch { return null; }
}

ConnectionSettings.Load() is called during MAUI app startup on the main thread. On iOS, SecureStorage.GetAsync accesses Keychain, which in some MAUI implementations requires marshaling back to the main thread. Pattern: main thread blocks on GetResult() → thread pool thread calls GetAsync → Keychain marshals back to main thread → deadlock. App hangs on launch.

Fix: Make Load() async and await SecureStorage calls, or check whether MAUI's iOS SecureStorage implementation needs the main thread (it uses Security.SecKeyChain which is thread-safe on modern iOS — but verify).


🟡 MODERATE — TokenEquals doesn't achieve constant-time on length mismatch (3/5 models)

PolyPilot/Services/WsBridgeServer.csTokenEquals

if (a.Length != b.Length)
{
    CryptographicOperations.FixedTimeEquals(a, a); // O(|a|), not O(|b|)
    return false;
}
return CryptographicOperations.FixedTimeEquals(a, b); // O(|b|)

The integer comparison a.Length != b.Length itself is a timing oracle (returns immediately vs. runs full comparison). The dummy FixedTimeEquals(a, a) runs in O(|provided|) time, not O(|expected|) time — timing still reveals whether the provided token length matched the expected length. For the random base64 AccessToken (fixed 44 chars) this is low risk, but ServerPassword is user-defined and variable-length.

Fix: HMAC both inputs against a random key and compare the fixed-length HMACs, or pad both byte arrays to Math.Max(a.Length, b.Length) before comparing.


🟡 MODERATE — Plugin hash scope mismatch between discovery and load (3/5 models)

PolyPilot/Services/PluginLoader.cs:~171

  • DiscoverPlugins: ComputeDirectoryHash(dir) — always hashes the plugin root directory ✅
  • LoadEnabledProviders: ComputeDirectoryHash(Path.GetDirectoryName(fullPath) ?? pluginDir) — when plugin.EntryDll contains a subdirectory path (e.g., bin/Release/plugin.dll), Path.GetDirectoryName returns pluginDir/bin/Release/, not pluginDir/

The discovery hash covers all DLLs in the root. The load-time hash covers only a subdirectory. These will never match → plugin is permanently blocked from loading after first approval.

Fix: Use pluginDir directly in LoadEnabledProviders instead of re-deriving from the entry DLL path.


🟢 MINOR — Test cleanup not in finally (3/5 models)

PolyPilot.Tests/WsBridgeIntegrationTests.cs:~998

// Both new tests:
_server.AccessToken = "test-secret-token-12345";
// ...test body...
_server.AccessToken = null; // only runs on success path

If an assertion fails or exception is thrown, _server.AccessToken remains set, corrupting the shared server state for subsequent tests in the same fixture.

Fix: Wrap with try/finally { _server.AccessToken = null; }.


Test Coverage

The two new WsBridgeIntegrationTests tests cover the happy path and rejection path correctly. Missing:

  • Test for TokenEquals with equal-length tokens (both matching and non-matching)
  • Test for migration path: verify SecureStorage write failure does NOT scrub JSON (data safety test)
  • Test for DevTunnel proxied connection flow (header stripping behavior)

Summary

The security goals are sound — removing loopback bypass, constant-time comparison, directory-wide plugin hash, SecureStorage migration. The execution has three critical issues: the loopback removal likely breaks DevTunnel connectivity, the sync-over-async SecureStorage access is a deadlock risk on iOS, and the migration scrubs JSON without confirming Keychain writes succeeded.

⚠️ Request changes — address the three 🔴 CRITICAL findings before merge.


5-model review: claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex

@PureWeen
Copy link
Owner Author

PR #341 Re-Review (Round 2) — New commit 068b070 addresses feedback

Previous Findings Status

# Finding Prior Severity Status
1 GetSecureStorageSync deadlock on iOS 🔴 CRITICAL ⚠️ STILL PRESENT (downgraded to 🟡 MODERATE — see below)
2 Silent secret loss during migration 🔴 CRITICAL 🟡 PARTIALLY FIXED — per-field check added, but partial migration bug remains
3 DevTunnel loopback removal + mobile rejected 🔴 CRITICAL FIXEDX-Bridge-Authorization header survives DevTunnel proxying
4 TokenEquals length oracle 🟡 MODERATE FIXED — SHA256 + CryptographicOperations.FixedTimeEquals
5 Plugin hash scope mismatch 🟡 MODERATE FIXED — both paths use ComputeDirectoryHash(Path.GetFullPath(...))
6 Test cleanup not in finally 🟢 MINOR FIXED — both tests use try/finally

Remaining Issues

🟡 MODERATE — GetSecureStorageSync sync-over-async pattern (Finding #1)

The code still uses:

try { return Task.Run(() => SecureStorage.Default.GetAsync(key)).GetAwaiter().GetResult(); }

Task.Run mitigates the classic SynchronizationContext deadlock, and iOS Keychain APIs are generally thread-safe (no main-thread dispatch). So this works in practice on current MAUI/iOS. Downgrading from CRITICAL to MODERATE because:

  • The pattern is architecturally brittle (would break if SecureStorage internals ever change)
  • .GetAwaiter().GetResult() still blocks a thread
  • An async Load() overload would be cleaner long-term

Not a ship-blocker, but worth a follow-up.

🟡 MODERATE — Partial migration can still lose secrets (Finding #2, partially fixed)

The per-field SetSecureStorageSync check is good — if a single field fails to migrate, it stays as null in memory. However, the migratedAny flag is shared across all fields:

if (SetSecureStorageSync(RemoteTokenKey, legacyRemote))
{ storedRemote = legacyRemote; migratedAny = true; }
// ...
if (migratedAny)
    Save();  // Serializes with [JsonIgnore] → scrubs ALL legacy tokens from JSON

Scenario: RemoteToken migrates ✅, but LanToken fails ❌ (Keychain full / entitlement issue):

  1. migratedAny = trueSave() called
  2. Save() serializes with [JsonIgnore] on all three token properties
  3. JSON file no longer contains "LanToken": "..."
  4. LanToken is not in SecureStorage (SetAsync failed) and not in JSON → lost

Suggested fix: Only scrub from JSON after verifying all attempted migrations succeeded, or track per-field migration state:

bool allMigrationsSucceeded = true;
// ... set to false if any SetSecureStorageSync fails
if (migratedAny && allMigrationsSucceeded)
    Save();

Verdict: ⚠️ Request changes

4 of 6 original findings fixed — excellent progress. The two remaining issues are:

  1. Sync-over-async pattern (MODERATE, follow-up OK)
  2. Partial migration secret loss (MODERATE, should fix before merge — low probability but real data loss)

Fix #2 and this is ✅ ready to merge.

@PureWeen
Copy link
Owner Author

PR #341 — Round 3 Re-Review (5-Model Consensus)

Models: claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex
CI: ⚠️ No checks configured on fix/bridge-hardening
Commit reviewed: 068b070


Previous Findings Status

# Finding Round 2 Severity Round 3 Status
1 GetSecureStorageSync sync-over-async 🟡 MODERATE ⚠️ STILL PRESENT (5/5 models)
2 Silent secret loss / migratedAny 🟡 MODERATE ⚠️ STILL PRESENT (5/5 models)
3 DevTunnel loopback / X-Bridge-Authorization 🔴 CRITICAL FIXED
4 TokenEquals timing oracle 🟡 MODERATE FIXED
5 Plugin hash scope mismatch 🟡 MODERATE FIXED
6 Test cleanup not in finally 🟢 MINOR FIXED

🟡 MODERATE — Finding #2: Silent secret loss on partial migration (5/5 models — STILL PRESENT)

PolyPilot/Models/ConnectionSettings.csMigrateAndLoadMobileSecrets

The migratedAny shared flag was not fixed. The per-field SetSecureStorageSync return-value check prevents setting migratedAny for a failed field, but the shared flag still causes Save() to fire when any field succeeds — scrubbing all [JsonIgnore] fields from JSON even the ones whose SecureStorage write failed.

Scenario (unchanged from Round 2):

  1. Legacy JSON has RemoteToken=A, LanToken=B
  2. SetSecureStorageSync(RemoteTokenKey, "A") → succeeds → migratedAny = true
  3. SetSecureStorageSync(LanTokenKey, "B")failsstoredLan = null, _lanToken = null
  4. migratedAny == trueSave() called
  5. [JsonIgnore] means JSON is written without LanToken
  6. Result: LanToken lost from both SecureStorage (write failed) and JSON (scrubbed) — permanently

Fix: Use per-field migration flags; only call Save() when every legacy value that existed was successfully migrated:

bool allSucceeded = true;
if (!string.IsNullOrEmpty(legacyLan) && !SetSecureStorageSync(...))
    allSucceeded = false;
// ...
if (migratedAny && allSucceeded)
    Save();

🟡 MODERATE — Finding #1: GetSecureStorageSync sync-over-async (5/5 models — STILL PRESENT)

PolyPilot/Models/ConnectionSettings.csGetSecureStorageSync / SetSecureStorageSync

The Task.Run(() => SecureStorage.Default.GetAsync(key)).GetAwaiter().GetResult() pattern is unchanged. Downgrade from CRITICAL maintained: Task.Run avoids the SynchronizationContext capture deadlock in the common case. The residual risk — a platform SecureStorage implementation dispatching internally to the main thread — is low but non-zero, and the pattern is architecturally fragile. Not a ship-blocker on its own, but it should be tracked as follow-up debt.


🟡 MODERATE — NEW: Orphaned random token locks out all clients when IssueAccessTokenAsync fails (2/5 models)

PolyPilot/Services/DevTunnelService.cs:~213

_bridge.AccessToken = Convert.ToBase64String(RandomNumberGenerator.GetBytes(32)); // placeholder
_bridge.Start(BridgePort, copilotPort);
// ... later ...
// if IssueAccessTokenAsync returns null → real token never set
// _bridge.AccessToken remains the random placeholder

If IssueAccessTokenAsync() returns null, the bridge starts and continues running with the opaque random placeholder as its AccessToken. No client ever receives this value, so no connection is possible. Pre-PR behaviour degraded to fail-open (loopback bypass); post-PR behaviour is fail-closed with no user-visible error. The UI appears operational but the bridge is permanently unreachable.

Fix: On IssueAccessTokenAsync failure, either clear _bridge.AccessToken = null (restoring fail-open for the error path) or propagate the failure and stop the bridge.


🟡 MODERATE — NEW: SaveMobileSecretsIfDirty ignores write failures, clears dirty flag unconditionally (2/5 models)

PolyPilot/Models/ConnectionSettings.csSaveMobileSecretsIfDirty

SetSecureStorageSync(RemoteTokenKey, _remoteToken);  // return value ignored
SetSecureStorageSync(LanTokenKey, _lanToken);         // return value ignored
SetSecureStorageSync(ServerPasswordKey, _serverPassword); // return value ignored
_secretsDirty = false;  // always cleared

All three SetSecureStorageSync calls return a bool indicating success, but their return values are discarded. _secretsDirty is unconditionally cleared. If any write fails, the in-memory value survives the session but is not retried on the next Save() — the token will be lost on the next app restart because JSON has it as [JsonIgnore] and SecureStorage never received it.

Fix: Only clear _secretsDirty = false after all writes succeed; leave it true on partial failure to allow a retry.


🟢 MINOR — ComputeDirectoryHash lacks content/filename framing (2/5 models)

PolyPilot/Services/PluginLoader.csComputeDirectoryHash

Filename bytes and file-content bytes are concatenated without a length prefix or delimiter. Theoretical: file named "ab" with content "cd" hashes identically to file named "abc" with content "d". Negligible in practice with real .dll filenames, but a null-byte separator or length prefix between filename and content would make the scheme provably collision-free.


Test Coverage Assessment

The two new integration tests correctly cover the rejection and success paths for the updated auth logic. Missing:

  • Test asserting that partial SecureStorage migration failure does not scrub the non-migrated token from JSON
  • Test asserting that SaveMobileSecretsIfDirty retains _secretsDirty = true when a write fails
  • Test for DevTunnelService: bridge remains accessible after IssueAccessTokenAsync fails (or correctly errors)

Summary

Excellent progress over three rounds — 4 of 6 original findings are fully resolved and the security posture is materially improved. Two issues remain unresolved and one new MODERATE was introduced:

Issue Severity Ship-blocker?
migratedAny partial migration data loss 🟡 MODERATE Yes — real data loss on any Keychain error
SaveMobileSecretsIfDirty ignores write failures 🟡 MODERATE Yes — same data-loss class
DevTunnel failure leaves bridge locked 🟡 MODERATE Yes — silent breakage with no recovery path
GetSecureStorageSync sync-over-async 🟡 MODERATE No — acceptable follow-up
ComputeDirectoryHash framing 🟢 MINOR No

⚠️ Request changes — three MODERATE issues must be addressed before merge. The data-loss scenario in MigrateAndLoadMobileSecrets/SaveMobileSecretsIfDirty and the DevTunnel failure lockout are all straightforward fixes.


Round 3 — 5-model review: claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex. Consensus filter: issues flagged by 2+ models only.

PureWeen and others added 3 commits March 11, 2026 00:23
- WsBridgeServer: remove unconditional loopback trust when a token is
  configured; loopback-only mode (no token set) still allows unauthenticated
  local connections as before
- WsBridgeServer: replace string.Equals token comparison with constant-time
  comparison using CryptographicOperations.FixedTimeEquals
- DevTunnelService: set a random ephemeral token on the bridge before Start()
  so the listening socket is never open without auth while the tunnel handshake
  is in progress; replaced by the real access token once issued
- PluginLoader: extend integrity verification to cover all DLL files in the
  plugin directory rather than only the entry-point assembly; filenames are
  included in the hash so renames are detected; existing plugin approvals will
  require re-confirmation
- ConnectionSettings: on iOS/Android/MacCatalyst, store RemoteToken, LanToken,
  and ServerPassword in the platform SecureStorage (Keychain on Apple) rather
  than plain JSON; existing values are migrated and scrubbed from settings.json
  on first load
- Update WsBridgeIntegrationTests to reflect new auth behaviour

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n safety

- WsBridgeServer: fix TokenEquals length oracle by hashing both sides to
  SHA-256 before FixedTimeEquals, so comparison time is always constant
  regardless of input length
- WsBridgeClient + WsBridgeServer: add X-Bridge-Authorization header alongside
  X-Tunnel-Authorization; DevTunnel strips X-Tunnel-Authorization before
  proxying to localhost but passes custom headers through, so mobile clients
  now reach the server's token validator even when connecting via DevTunnel
- ConnectionSettings: SetSecureStorageSync now returns bool; migration only
  scrubs legacy JSON field after confirming SecureStorage write succeeded,
  preventing silent token loss if Keychain write fails
- PluginLoader: canonicalize plugin directory path via Path.GetFullPath before
  hashing to avoid hash mismatches when discovery and load paths differ under
  symlinks or relative path segments
- WsBridgeIntegrationTests: wrap AccessToken teardown in finally blocks so
  subsequent tests are not affected if an assertion throws

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ConnectionSettings: per-field SecureStorage migration tracking; JSON is
  only scrubbed when every attempted field was written successfully so a
  partial Keychain failure cannot silently lose any token
- ConnectionSettings: SaveMobileSecretsIfDirty checks SetSecureStorageSync
  return values and keeps _secretsDirty=true on partial write failure so
  the next Save() will retry
- DevTunnelService: if IssueAccessTokenAsync returns null, clear the random
  placeholder token so the bridge reverts to local-only mode rather than
  remaining permanently locked with an opaque token no client can obtain
- PluginLoader: add null-byte framing between filename and content bytes in
  ComputeDirectoryHash to make the hash scheme provably collision-free
- WsBridgeServerAuthTests: update HTTP probe test to reflect new behaviour
  (loopback no longer bypasses auth when token is configured; correct token
  via X-Bridge-Authorization is accepted from any origin)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen force-pushed the fix/bridge-hardening branch from 068b070 to b25cd66 Compare March 11, 2026 05:23
Copy link
Owner Author

@PureWeen PureWeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #341 Round 4 Review (Multi-Model AI Consensus)

Round 3 Findings Status

# Finding Status
1 Partial migration data loss (MigrateAndLoadMobileSecrets) ✅ FIXED -- per-field tracking + atomic scrub-on-all-success
2 SaveMobileSecrets ignores failures ✅ FIXED -- retains _secretsDirty=true on failure for retry
3 DevTunnel permanent lockout ✅ FIXED -- clears AccessToken=null on failure, local-only fallback

New Findings

🟡 MODERATE -- Crash-path migration residual (ConnectionSettings.cs MigrateAndLoadMobileSecrets)
allSucceeded uses migratedRemote (write-this-launch flag) instead of !string.IsNullOrEmpty(storedRemote) (value-in-SecureStorage). If app crashes between SecureStorage.SetAsync and Save(), next launch sees storedRemote populated, skips migration, migratedRemote=false, allSucceeded=false → plaintext stays in JSON permanently. One-line fix: check stored values not migration flags.

🟡 MODERATE -- Stale test helper (DevTunnelServiceTests.cs:~893)
InvokeValidateClientToken helper still contains old if (isLoopback) return true logic that was removed from production code. Tests pass but validate stale behavior, not the new token-based validation.

🟢 MINOR -- Plugin hash invalidation (PluginLoader.cs)
ComputeDirectoryHash changes hash scheme, invalidating all existing plugin approvals on upgrade with no migration path. Consider a HashVersion field.

🟢 MINOR -- Missing test coverage (PluginLoader.cs)
ComputeDirectoryHash has no unit tests. Existing ComputeHash tests exercise dead code path.

Verdict: ⚠️ Request Changes (minor)

Two moderate findings need one-line fixes each. Overall the security hardening is excellent.

5-model AI consensus review (Round 4)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant