feat(ne): enable Background Transport + Phase B suspended-notifications + dual-interface tunnel #57
torlando-tech wants to merge 40 commits into
Phase 1.6-1.9 of the staged plan plus the onboarding step:

- TunnelManager.disable() now sets isEnabled=false and saveToPreferences() after stopVPNTunnel(); calling stopVPNTunnel() alone leaves the profile partially-active in iOS routing, which was the root cause of the "toggle off but TCP stays broken" report.
- AppServices auto-restarts the tunnel from the App Group's tunnel_enabled preference at initialize() time so users don't have to re-toggle on every launch.
- SettingsView toggle uses do/catch with DiagLog and an inline error label so install / start failures (entitlement issues, declined VPN-profile prompts) are visible instead of silently bouncing the toggle off. Toggle persists tunnel_enabled on success.
- New onboarding step (page 4 of 6) "Stay Connected in the Background" with a pre-checked toggle. completeOnboarding() writes the value to the App Group so AppServices can auto-start on first launch and trigger the VPN-profile prompt at the right moment.
- ENABLE_NETWORK_EXTENSION compilation flag is now set on ColumbaApp's Debug + Release configs alongside CODE_SIGN_ENTITLEMENTS pointing at ColumbaApp.entitlements. The app target depends on the extension target and embeds it via a PBXCopyFilesBuildPhase (Foundation Extensions, dstSubfolderSpec=13).

Verified with xcodebuild: the Debug iphonesimulator build succeeds and copies ColumbaNetworkExtension.appex into ColumbaApp.app/PlugIns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile Summary

This PR introduces three tightly coupled features: the dual-interface tunnel architecture (separate
Confidence Score: 3/5

Merging carries risk: several open issues from prior review rounds directly affect core paths that this PR extends. The new dual-interface architecture and Phase B notification logic are well-structured and the key edge cases are explicitly handled. However, shutdown() leaves tunnelTCPInterface registered against the dead transport (identity switches silently drop all tunnel-path inbound frames), requiredInterfaceType = .other blocks the extension TCP connection on cellular-only devices, and the object: nil VPN-status observer fires on any system VPN change.

Sources/ColumbaApp/Services/AppServices.swift (shutdown gap), Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift (requiredInterfaceType), Sources/ColumbaApp/Services/TunnelManager.swift (third-party VPN observer scope)

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant App as ColumbaApp (Foreground)
    participant TunnelMgr as TunnelManager
    participant EXT as PacketTunnelProvider (NE)
    participant RNSD as rnsd (relay)
    App->>TunnelMgr: start()
    TunnelMgr-->>App: onStatusChange(.connected)
    App->>App: registerTunnelInterface()
    App->>RNSD: sendAllAnnounces [foreground TCP + tunnel]
    Note over App,RNSD: 100ms delay
    App->>RNSD: sendAnnounceViaTunnel [tunnel only]
    Note over RNSD: path-table pins to tunnel socket
    Note over App: App suspends
    App--xRNSD: foreground TCP socket dies
    RNSD->>EXT: inbound DATA or LINKREQUEST
    EXT->>EXT: maybeScheduleNotification()
    EXT->>EXT: UNUserNotificationCenter.add(ext-linkreq-hash)
    EXT->>App: SharedFrameQueue.append + Darwin packetReady
    Note over App: App foregrounds
    App->>App: ExtensionFrameReader drains queue
    App->>App: transport.handleReceivedData(TUNNEL_TCP_INTERFACE_ID)
    App->>App: removeExtensionPlaceholders + postMessageNotification
```
Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
Sources/ColumbaApp/Services/AppServices.swift:194-201
**Dead code retained with misleading justification**
The comment says `isTunnelModeActive` is kept so `applyTunnelModeToInterfaces`'s idempotency guards "still compile cleanly while the function itself is no-longer called." But since the function is never invoked, its guards have no effect at all. The dead function and its state variable currently confuse the picture of which paths are live: future contributors reading the `onStatusChange` handler will see the `isTunnelModeActive` field alongside `tunnelTCPInterface` and may not realise the tunnel-mode-flip architecture is fully replaced. The entire `applyTunnelModeToInterfaces` function and `isTunnelModeActive` variable can be deleted now that the dual-interface refactor is complete.
Reviews (34): Last reviewed commit: "fix(tunnel): re-announce on background t..."
P1 (TunnelManager.start()): set self.isEnabled = true up-front so the re-enable path after disable() doesn't leave the observable stale.

P1 (SettingsView toggle): add tunnelPending @State that overrides the binding's get during VPN start/disable transitions, with a 30s settle loop that waits for tunnel.isRunning to match the user's intent before clearing the override. Without this, .connecting / .disconnecting re-renders snap the toggle back across the user-facing transition.

P2 (TunnelManager.disable()): move isEnabled = false before any throwing call so a thrown saveToPreferences leaves observers seeing the user's intent rather than the stale pre-call value.

P2 (OnboardingViewModel): gate the tunnel_enabled write with ENABLE_NETWORK_EXTENSION so non-extension builds don't write a stale true that nothing reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
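The ordering described for `disable()` can be illustrated without NetworkExtension. This is a minimal sketch, assuming a hypothetical `ProfileStore` protocol in place of the real VPN manager; `TunnelManagerSketch`, `FailingStore`, and `SaveFailed` are illustrative names and not the PR's code.

```swift
import Foundation

/// Hypothetical stand-in for the VPN profile; not the real NetworkExtension API.
protocol ProfileStore {
    func stopTunnel()
    func save(enabled: Bool) throws
}

struct SaveFailed: Error {}

final class TunnelManagerSketch {
    private(set) var isEnabled = true
    private let store: ProfileStore

    init(store: ProfileStore) { self.store = store }

    func disable() throws {
        // Record the user's intent first, so observers see OFF even if a later call throws.
        isEnabled = false
        store.stopTunnel()
        // May throw; isEnabled already reflects the user's intent at this point.
        try store.save(enabled: false)
    }
}

/// Test double whose save always fails.
final class FailingStore: ProfileStore {
    private(set) var stopCalls = 0
    func stopTunnel() { stopCalls += 1 }
    func save(enabled: Bool) throws { throw SaveFailed() }
}
```

With `FailingStore`, `disable()` throws but `isEnabled` still reads `false` afterwards, which is the behaviour the P2 item asks for.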
P1: Cancel the in-flight Background Transport Task before spawning a new one so a rapid ON→OFF tap can't race the previous start()'s install() flow. Without this, an older Task_ON would silently finish install() and call startVPNTunnel() after the user's last intent was OFF — leaving the toggle visually on a state opposite the actual VPN. Adds a checkCancellation() in TunnelManager.start() right before startVPNTunnel() so a cancelled caller can't fire iOS's VPN bring-up after the await. Cancellation is treated as supersession (silent return) rather than an error — the new Task already owns the next state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
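The cancel-then-spawn pattern might look roughly like the following. It is a sketch under stated assumptions: `startSketch` and `ToggleSketch` are hypothetical names, a `Task.sleep` stands in for the awaited `install()` flow, and the real `startVPNTunnel()` call is only marked by a comment. `Task.checkCancellation()` is the standard Swift concurrency call.

```swift
import Foundation

/// Returns true if the VPN bring-up step was reached, false if the caller was superseded.
func startSketch(installNanos: UInt64) async -> Bool {
    // Stands in for the awaited install() flow.
    try? await Task.sleep(nanoseconds: installNanos)
    do {
        // A cancelled caller must not reach the bring-up step after the await.
        try Task.checkCancellation()
    } catch {
        return false  // superseded: silent return, not an error
    }
    // startVPNTunnel() would be called here.
    return true
}

/// Holds the single in-flight toggle task.
final class ToggleSketch {
    private var inFlight: Task<Bool, Never>?

    func userTapped(installNanos: UInt64) -> Task<Bool, Never> {
        inFlight?.cancel()  // cancel the previous intent before spawning a new one
        let task = Task { await startSketch(installNanos: installNanos) }
        inFlight = task
        return task
    }
}
```

Two rapid taps leave only the second task reaching the bring-up step; the first returns quietly because its cancellation is observed after the await.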
P1: Surface async tunnel-connection failures. After startVPNTunnel() returns successfully but iOS later fails to bring the VPN up (airplane mode, routing failure, extension crash), the toggle's 30s settle loop times out without setting tunnelErrorMessage, which is exactly the silent bounce the PR description claims to replace. After the loop, if newValue==true but tunnel.isRunning==false, fetch the disconnect reason via NEVPNConnection.fetchLastDisconnectError and show it inline.

P2: Gate the Background Transport onboarding step on ENABLE_NETWORK_EXTENSION. pageCount is 6 with the flag and 5 without; the page-3 case in OnboardingView is wrapped in an #if/#else, with extracted `permissionsPageView()` / `completePageView()` helpers so both branches stay readable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2: Status row now mirrors tunnelPending ?? tunnel.isRunning so the indicator dot and label match the toggle's visual state during .connecting / .disconnecting. Adds "Starting…" / "Stopping…" during the transitional window, replacing the previous "Stopped" label that contradicted the ON-position toggle.

P2: Persist tunnel_enabled to the App Group only after the actual VPN status matches the user's intent. Writing it before the status is confirmed would auto-restart the same failing tunnel on every relaunch when start() succeeds at launch but iOS later rejects the connection (airplane mode, routing failure, extension crash).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
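The status-row rule can be written as a small pure function. This is a sketch only: `effectiveState` and `statusLabel` are hypothetical helpers, and the "Running" label is an assumption, since the commit message names only "Starting…", "Stopping…", and "Stopped".

```swift
/// Effective toggle state: the pending user intent wins over the live VPN state.
func effectiveState(pending: Bool?, isRunning: Bool) -> Bool {
    pending ?? isRunning
}

/// Label for the status row during and after a transition.
func statusLabel(pending: Bool?, isRunning: Bool) -> String {
    switch (pending, isRunning) {
    case (.some(true), false):
        return "Starting…"   // user asked for ON, VPN not up yet
    case (.some(false), true):
        return "Stopping…"   // user asked for OFF, VPN still up
    default:
        // "Running" is an assumed label; the source only names "Stopped".
        return effectiveState(pending: pending, isRunning: isRunning) ? "Running" : "Stopped"
    }
}
```

Keeping the label derived from the same `pending ?? isRunning` expression as the toggle is what prevents the two from contradicting each other mid-transition.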
P2: Annotate the Background Transport toggle's Task as `@MainActor` so the @State mutations (tunnelPending, tunnelErrorMessage) are guaranteed to run on the main actor instead of relying on the inherited-but-undefined SwiftUI Task isolation.

P2: AppServices auto-start now clears the tunnel_enabled pref on failure so persistent issues (revoked profile, missing entitlement, OS-level VPN restriction) don't silently retry every launch. The user re-enables from Settings, where the toggle's error label can show the actual failure reason instead of dying silently in DiagLog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2: When `disable()` throws after the synchronous `stopVPNTunnel()` ran (e.g. an unusual OS-level `saveToPreferences()` failure), persist the user's OFF intent to the App Group anyway so a relaunch doesn't auto-restart the tunnel against their wishes. Start errors still leave the pref alone — committing to a failing start would loop the same failure on every launch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
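For the Settings toggle path, the persistence rules from this commit and the previous ones reduce to a three-way decision. The sketch below assumes a hypothetical `ToggleOutcome` enum and `prefToPersist` helper; it does not model the separate auto-start path, which clears the preference on failure.

```swift
/// Hypothetical outcome of a toggle attempt.
enum ToggleOutcome { case settled, failed }

/// Returns the value to write for tunnel_enabled, or nil to leave the stored preference untouched.
func prefToPersist(intent: Bool, outcome: ToggleOutcome) -> Bool? {
    switch (intent, outcome) {
    case (_, .settled):
        return intent   // VPN status matched the user's intent
    case (false, .failed):
        return false    // honour OFF even if disable() threw
    case (true, .failed):
        return nil      // don't commit to a failing start
    }
}
```

Returning `nil` for a failed start keeps whatever was stored before, so a relaunch does not loop on the same failure.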
P1: AppServices auto-start now polls for `tunnel.isRunning` after calling `tunnel.start()` (mirroring the Settings toggle's settle window) so async failures — airplane mode, routing failure, extension crash — clear the pref instead of looping silently on every cold-launch. Wraps the auto-start in a detached Task so the 30-second wait doesn't block the rest of `initialize()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
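A settle window of this kind could be sketched as a polling helper. `waitForSettle` and its parameters are hypothetical; only the 30-second window comes from the commit messages, and the poll interval is an arbitrary choice.

```swift
import Foundation

/// Polls until `isRunning()` matches `intent` or the window elapses.
/// Returns true if the state settled in time.
func waitForSettle(intent: Bool,
                   timeout: TimeInterval = 30,
                   pollInterval: TimeInterval = 0.25,
                   isRunning: @Sendable () -> Bool) async -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    while Date() < deadline {
        if isRunning() == intent { return true }
        // Stop early if the surrounding Task was cancelled.
        if Task.isCancelled { return false }
        try? await Task.sleep(nanoseconds: UInt64(pollInterval * 1_000_000_000))
    }
    // One last check at the deadline.
    return isRunning() == intent
}
```

A caller in the auto-start path would clear the stored preference when this returns `false`, and would run it inside its own Task so the wait does not block the rest of initialization.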
Lets an existing user walk through the onboarding pages again without losing chats / identities — useful when verifying a newly-added onboarding step (e.g. Background Transport). The OnboardingView is presented with `isRestart = true` and the view-model's `completeOnboarding()` skips identity / interface / display-name creation in that mode, only committing the values that the new steps drive. Gated behind `#if DEBUG` so it doesn't ship to production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported regression: enabling Background Transport killed AutoInterface peer discovery — no announces went out, no peers spawned, even in foreground. Root cause: the extension's `NWConnectionGroup` is hard-coded to `ff02::1` on a single port, but reticulum-swift's AutoInterface derives its multicast group per groupId (`ff12:0:...` from `multicastAddress(for:)`) and runs per-peer unicast on a separate data port (42671). Putting Auto into tunnel mode tore down the local NWConnectionGroup and replaced it with a non-functional one in the extension — hence no peer discovery and no traffic. Fix: skip Auto in `applyTunnelModeToInterfaces`. AutoInterface is intrinsically local-Wi-Fi only — iOS suspending multicast in the background is an OS-level limit, not something the tunnel can paper over. TCP keeps delivering messages while backgrounded, which is the whole point of Phase 1. A future change can reimplement the Auto protocol (groupId-derived multicast + per-peer unicast) inside the extension if we want background Auto too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous attempt used a hard-coded `ff02::1` `NWConnectionGroup` and a single port, but reticulum-swift's AutoInterface derives its multicast address from the group id (`ff12:0:…` via `multicastAddress(for:)`) and runs per-peer unicast on the data port (42671). Tunneling Auto through that broken listener killed peer discovery and silently dropped data.

This change links `ReticulumSwift` into the extension target and runs an actual `AutoInterface` instance inside the Network Extension via a new `ExtensionAutoBridge`:

- `ExtensionAutoBridge` instantiates `AutoInterface` with the configured group id, sets a delegate that funnels every received packet (parent AutoInterface + every spawned `AutoInterfacePeer` sub-interface) into `SharedFrameQueue` with the Auto tag, and exposes a `send(_:)` that hands outbound bytes off to `AutoInterface.send(_:)` for the regular per-peer fan-out.
- `PacketTunnelProvider` now drives the bridge from `applyConfigsLocked` (start / stop on group-id diff) and routes app outbound (the `auto` tag in `handleAppMessage`) through `autoBridge.send(_:)`.
- `applyTunnelModeToInterfaces` puts the app's AutoInterface back into tunnel mode when the VPN is up (this reverts the temporary "Auto stays local" stop-gap).

Net effect: once the tunnel is connected, Auto peer discovery and data delivery happen entirely inside the extension, so they keep working when the app is backgrounded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an `ExtensionDiagLog` writing to `ext_diag.log` in the App Group container so both the extension and the app can append diagnostic lines (Network Extensions don't have a clean equivalent of `DiagLog`'s file-backed log). `AppServices.initialize()` snapshots the file into the app's `Documents/ext_diag.log` on every launch so it's pullable via `xcrun devicectl device copy from`. Hooks logging into ExtensionAutoBridge (start / stop / peer add / peer remove / RX bytes / TX bytes / TX failures / TX dropped because autoInterface is nil) and a couple of breadcrumbs in `PacketTunnelProvider` (`startTunnel` / Auto config (re)applying). Lets us see whether the extension's AutoInterface is actually firing on real devices. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two attempts to put AutoInterface into the extension hit the same NEPacketTunnelProvider sandbox limitation:

1. reticulum-swift's `AutoInterface` (POSIX sockets bound to link-local IPv6 + per-peer `sendto`): bind on the data port succeeds but iOS routes inbound unicast UDP to the system networking stack, not the extension's socket. Multicast loopback works, real LAN packets never arrive.
2. From-scratch implementation on Apple's Network framework (`NWMulticastGroup` for HELLO discovery + `NWListener` for inbound unicast data + per-peer `NWConnection` for outbound): same outcome. `NWListener.newConnectionHandler` never fires even with no `requiredInterfaceType`. Confirms the limitation isn't the API choice; it's the extension sandbox.

Reverts both bridge implementations and the `onWillStart` / "release UDP sockets before extension launch" plumbing. Phase 1 ships TCP-only background, which is the win that actually solves issue #54 (messages-while-locked over TCP). AutoInterface keeps working locally for foreground use, the same behaviour the user had before this PR. Background AutoInterface needs a different architecture (e.g. configuring the tunnel's `includedRoutes` to capture the multicast group + dataPort and reading them via `packetFlow`) and is left for a future PR.

Keeps the `ExtensionDiagLog` plumbing in place for future debugging and the diff logic in `applyConfigsLocked` so the re-enable path is short.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tools/auto-test/` runs the full loop without manual UI taps:

- `send_test_traffic.py` mirrors reticulum-swift's `AutoInterfaceConstants` (`ff12:0:…` group derivation, SHA-256 discovery token, 29716/42671 ports) so a Mac on the same Wi-Fi can stand in for a Sideband peer: it sends multicast HELLOs + one unicast announce-shaped UDP packet.
- `run_test.sh` builds, installs, relaunches, sends test traffic, pulls `ext_diag.log` + `diag.log` via `xcrun devicectl device copy from`, and greps for expected entries. Exit code 0 when the expected entries are present.

The current revision asserts the basic "tunnel reached enabled state" path because auto-in-extension is reverted in this PR. Verifier comments mark where to re-enable the `NWListener accepted inbound` assertion when we revisit background AutoInterface with a different architecture.

Known gap: iOS keeps the running extension instance across app re-deploys, so the harness still needs the user to delete and re-add the VPN profile in iOS Settings once per build to load the new extension binary. Plan is to bake a `/debug/reload-extension` `handleAppMessage` command into the extension that calls `cancelTunnelWithError` so the harness can force-reload without any UI taps; see TODO in `run_test.sh`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After on-device testing surfaced three regressions:

1. Re-run Onboarding orphaned existing chats. CompletePage's onAppear raced the outer view's isRestart propagation, so prepareIdentity created a fresh identity and switched to it before isRestart reached the view model. Move isRestart into the view model's init so prepareIdentity / completeOnboarding both see the correct flag from the first call.
2. switchIdentity created a duplicate "tcp-server" TCPInterface. The legacy initialize(identity:identityHash:tcpServerAddress:) path parsed the supplied address and created a TCPInterface with id "tcp-server" alongside the InterfaceRepository-owned UUID entity that Step 7 connects on identity-switch. Both ended up in tunnel mode, splitting outbound. Drop the parameter and the synthesized interface; Step 7's repository iteration is now the only source of TCP interfaces.
3. AutoInterface in the extension is non-functional. Empirical testing on device confirmed iOS sandboxes UDP outbound from NEPacketTunnelProvider for both Network framework primitives (NWConnection / NWConnectionGroup silently drops) and POSIX sendto (ENETUNREACH). Inbound works (NWListener accepts unicast, POSIX IPV6_JOIN_GROUP receives multicast) but the sandbox blocks the reply path, so Auto cannot peer from the extension at all. Revert applyTunnelModeToInterfaces to TCP-only and update onboarding + Settings copy to make Auto's foreground-only behaviour explicit. BLE / RNode keep their own background-mode plumbing (Phase 2) and aren't covered by the same caveat.

Plus race fix in connectTCPInterface: when the tunnel auto-starts during cold launch and reaches .connected before Step 7 has populated tcpInterfaces, the late-added interface stays on its local NWConnection. Apply tunnel mode at the late-add site too.
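The late-add race fix at the end of this commit can be pictured with a toy registry. Everything here is hypothetical (`InterfaceRegistrySketch`, `Mode`, the method names); it only illustrates applying tunnel mode both when the tunnel connects and when an interface is added afterwards.

```swift
/// Toy model of the interface bookkeeping; not the PR's AppServices code.
final class InterfaceRegistrySketch {
    enum Mode { case local, tunnel }

    private(set) var tunnelConnected = false
    private(set) var modes: [String: Mode] = [:]

    /// Called when the VPN reaches the connected state.
    func tunnelDidConnect() {
        tunnelConnected = true
        // Flip every interface that already exists.
        modes = modes.mapValues { _ in Mode.tunnel }
    }

    /// Called when a TCP interface is created, possibly after the tunnel is already up.
    func add(interfaceId: String) {
        // The late-add site: without this check a late interface would stay local.
        modes[interfaceId] = tunnelConnected ? .tunnel : .local
    }
}
```

An interface added before the tunnel connects is flipped by `tunnelDidConnect()`, and one added afterwards starts in tunnel mode directly, so ordering during cold launch no longer matters.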
- ExtensionDiagLog: add 1 MiB tail-keep rotation so the always-on extension's log can't exhaust the App Group container (which would silently break SharedFrameQueue.append). Drops the oldest ~half on cap-exceed, aligned to a newline so we don't truncate mid-line.
- run_test.sh: derive DERIVED from xcodebuild's BUILD_DIR rather than hardcoding the DerivedData hash (which Xcode regenerates on rename / fresh checkout). DEVELOPMENT_TEAM is now an env override with the same default; DEVICE_UDID was already overridable.

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
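The tail-keep rotation could be sketched as a pure function over the log bytes. `tailKeep` is a hypothetical helper; the 1 MiB cap and the newline alignment come from the commit message, while keeping exactly half the cap is an assumption about what "the oldest ~half" means.

```swift
import Foundation

/// Keeps roughly the newest half of the cap once `log` exceeds `cap`,
/// starting the kept region on a line boundary.
func tailKeep(_ log: Data, cap: Int = 1_048_576) -> Data {
    guard log.count > cap else { return log }
    // Start of the newest cap/2 bytes.
    let cut = log.endIndex - cap / 2
    let tail = log.subdata(in: cut..<log.endIndex)
    // Skip the partial first line so the kept tail starts after a newline.
    if let newline = tail.firstIndex(of: 0x0A) {
        return tail.subdata(in: (newline + 1)..<tail.count)
    }
    // No newline in the tail: keep it as-is rather than dropping everything.
    return tail
}
```

For example, with a 20-byte cap and the 24-byte input `"line1\nline2\nline3\nline4\n"`, the function keeps `"line4\n"`: the last ten bytes minus the partial `ne3` line.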
Gate two extension-only diagnostic probes behind `#if DEBUG`:

- PacketTunnelProvider.startTunnel(): startDiagListener() and sendDiagOutboundProbe() were called unconditionally. The outbound probe targets a hard-coded developer link-local IPv6 (fe80::c2d:e309:eb09:6343) on every user's device on every tunnel start, leaking the dev's address; the listener also bound port 9999 in production. Both belong only in builds the test harness drives.
- ExtensionAutoBridge.receiveLoop(): the synthetic "ext-rx-ack-…" echo back to every Auto peer is a one-shot probe to test whether iOS allows replies on accepted UDP flows; in production it floods every peer with non-protocol ASCII payloads and muddies on-wire debugging. Same #if DEBUG gate.

Verified with `xcodebuild -configuration Debug` and `xcodebuild -configuration Release` (both succeed; no new warnings on the changed files in Release).

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
Drop personally-identifying defaults from `tools/auto-test/run_test.sh`:

- DEVICE_UDID and DEVELOPMENT_TEAM no longer have hardcoded defaults baked into the script. Both are unique identifiers (a specific physical device's UDID; an Apple Developer Team ID) and shouldn't live in source control even with the override path. Greptile flagged the security risk of the UDID staying in HEAD.
- The script now errors out early with a clear message if either is unset (DEVELOPMENT_TEAM is only required when not using --skip-build).
- Top-of-file prereqs block updated to document the env-var contract.

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
Gate `DiagLog.snapshotExtensionLog()` behind `#if DEBUG`. The call mirrors the extension's diag log into `Documents/ext_diag.log` on every cold launch — useful for `xcrun devicectl device copy from` during development, but in production it surfaces connection diagnostics into the user's File-Sharing-visible Documents folder on every app start. Greptile flagged this as the last 4-to-5 ceiling-keeper. Verified Debug + Release builds. Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
* fix: hot-swap TCP interfaces without disturbing the others

Toggling/editing any TCP interface in Interfaces settings was tearing down every other healthy TCP connection alongside the one the user actually changed. Each reconnect triggered the relay to redeliver its full announce table, swamping the app for ~90s per change (90k+ announces in one minute, observed on rmap.world).

Two layers of fix:

1. `AppServices.connectTCPInterface(entityId:host:port:)` is now idempotent. It tracks the last-applied host:port per entity and returns immediately when called with the same endpoint as the currently-running interface. Calling it with a different endpoint still disconnects-and-recreates as before.
2. `InterfaceManagementViewModel.applyChanges` loops over every enabled TCP entity (not just the one that changed). It now skips entities whose endpoint hasn't moved, avoiding both the connect call AND the brief `.connecting` UI flicker.

Stop and shutdown paths clear the endpoint dictionary alongside `tcpInterfaces` so a future re-add doesn't short-circuit against a stale entry.

Auto/BLE/RNode/Multipeer sections of `applyChanges` already gate on existence checks and don't trigger this. Config changes for those types still don't take effect without a manual disable/re-enable; that is a separate issue with a smaller blast radius and is not addressed here.

* feat: multi-TCP tunnel - extension manages a connection per entity

Previously the Network Extension kept a single `tcpConnection` and a single `currentTCP` endpoint, so enabling two TCP relays in the app silently dropped one: the extension's config loader overwrote `result.tcp` on every iteration and only the last enabled tcpClient in the JSON array got a socket. The other relay was unreachable through the tunnel and inbound from the wrong relay was routed back to whichever `TCPInterface` happened to be first in the app's dictionary.

This commit lifts the entire tunnel TCP layer to per-entity:

- `SharedFrameQueue` frame format gains a 1-byte entityId-length field and a length-prefixed UTF-8 entity id between the interface tag and the frame payload. Old format frames in flight at the upgrade are lost on first read; the queue is append-and-clear so the lifetime is short.
- `TunnelManager.sendFrame` adds an `entityId` parameter and writes it into the IPC envelope sent via `sendProviderMessage`. `connectTCPInterface` and `applyTunnelModeToInterfaces` now capture the entity id in the per-interface tunnel-mode hook so outbound frames from each `TCPInterface` carry their own id.
- `ExtensionFrameReader.onTCPFrameReceived` is now `(entityId, data)` and the AppServices handler routes inbound frames to the matching `TCPInterface` by id, with safe fallbacks for empty/legacy ids. - `PacketTunnelProvider` replaces `tcpConnection` / `tcpReceiveBuffer` / `currentTCP` with per-entity dicts. Each `NWConnection` has its own HDLC receive buffer (sharing one buffer between two streams would corrupt frame boundaries), its own state-update handler that only tears down its own entry, and its own `receiveTCPData` recursion so inbound frames are tagged with the right id when appended to the queue. - `applyConfigsLocked` diffs per-entity: an entry whose endpoint is unchanged keeps its connection, a removed entry tears down only its own socket, an edited entry restarts only that socket. Adding a second relay no longer disturbs the first. - `loadInterfaceConfigs` returns `tcps: [String: (host, port)]` keyed by `InterfaceEntity.id` instead of a single optional. `handleAppMessage` parses the new wire format (entityId-length + entityId in front of frame data) and looks up the connection by id, falling back to the sole connection when the id is empty so a hypothetical legacy single-TCP build still routes correctly. * chore: extension diag logs for TCP config/state changes Lifecycle events only — config (re)apply, config removal, state transitions, failure. Per-frame and per-drain logging is omitted to keep the file small. Per-entity tagging in the messages makes multi-TCP behaviour observable without needing syslog access. Used to diagnose the silent-inbound regression that turned out to be the SharedFrameQueue wire-format roll-out interacting with a not-yet-relaunched extension; left in place for future debugging. * feat(InterfaceManagement): add TCP client community-server wizard Mirrors Android Columba's 2-step TCP client wizard at the post-onboarding add-interface surface: server selection (bootstrap/community/custom) → review & configure. 
Routes Settings → Network Interfaces → + → TCP Client through the wizard instead of the blank manual entry sheet, and reroutes edit-existing for TCP entries to the same flow with pre-filled values. Scoped to the fields TCPClientConfig already supports (host, port, networkName, passphrase). Bootstrap-only flag and SOCKS proxy are deferred. Closes #51 Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> * fix(MicronParser): persist formatting state across lines (#63) * fix(MicronParser): persist formatting state across lines The line-by-line parse loop hardcoded `currentStyle: .plain` on every parseInline call, so a `Fxxx`Bxxx preamble line consumed its colors into an empty span and the following ASCII art rendered with no fg/bg. Match python NomadNet's MicronParser by promoting currentStyle to a parser-loop local that threads through every parseInline call, with parseInline returning the terminal style so the caller can carry it forward. `< at line-start additionally resets currentStyle to .plain, matching python's `<` semantics. Repro: the index.mu at github.com/fr33n0w/thechatroom uses the preamble shape `F0ff`B52f then ASCII art then `f`b — before this fix the colors were silently dropped. Closes #31 Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> * fix(NodeDetailsView): allow tapping action buttons on stale-path contacts Browse Site / Start Chat / Set as My Relay were `.disabled(!isOnline)` on a contact's NodeDetailsView, where `isOnline` is just `Date() < entry.expires` from the path table. After cleanupLinks runs `expirePath` on a failed-link destination, the contact's path becomes "expired" until a new announce arrives — but Reticulum's path discovery is exactly designed for that case (issue a path-request, any peer with a recent announce will respond). Greying the button blocks the user from the very operation that would heal the path. 
Drops the `.disabled` and `.opacity` modifiers from `actionButton(...)` and the relay-toggle button. The underlying flow (`NomadNetBrowserService.resolveValidPath`) already does `pathTable.remove` + `transport.requestPath` + 10s poll, so taps now flow through to the working recovery path. Also reword the expired-hint copy from "Ask them to send an announce from their app, or wait for one to arrive automatically" to "Tap an action to issue a path request — any node on the network with a recent announce will respond." — the original copy is wrong about how Reticulum path discovery works and discourages users from doing the right thing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(MicronDocumentView): render the chat-room ASCII art correctly Three bugs surfaced once the parser carried `Bxxx background colors forward across lines (faf17e4): 1. Centering broke against the document, not the screen. A wide row (e.g. fr33n0w/thechatroom's 550-char trailing-whitespace line) pushed the VStack out to ~4600pt; centered shorter rows landed at the middle of *that* width — way past the viewport. Fixed by capturing the actual screen viewport via GeometryReader in MonospaceScrollContainer (mirrors Android's `Modifier.widthIn(min = viewportLineWidth)` from NomadNetBrowserScreen.kt:474) and wrapping each scroll-mode row in `.frame(minWidth: viewportWidth, alignment: alignment.swiftUI)`. 2. Row-to-row column alignment drifted by half a cell because Core Text's `textAlignment = .center` strips trailing whitespace when computing the centered offset. Lines with a trailing space centered as if one cell narrower than lines without — visible as the letter "T" of "the chat room" wandering in the ASCII art. UILabel now always renders left-aligned (paragraphStyle and textAlignment) and visual centering is the SwiftUI .frame's job. 3. SF Mono renders Block-Elements (▗▄▖▝▀▘▙▟ etc.) 
at slightly different pixel widths than ASCII spaces, so 85-char rows of mixed content didn't end up the same width. Bundled JetBrains Mono (Apache 2.0/OFL, Regular + Bold, ~270KB each) for the monospace renderer — every glyph in the file has advance=600 confirmed via fontTools, matching what Android already uses (MicronComposables.kt's `JetBrainsMonoFamily`). Falls back to the system font if the bundled one fails to load. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com> Co-authored-by: Claude claude-opus-4-7 <noreply@anthropic.com> * fix(TCPClientWizard): mirror android server list, drop bootstrap split Addresses PR review comments: #64 (comment) #64 (comment) Replace the iOS community-server directory with the canonical Android list at app/src/main/java/network/columba/app/data/model/TcpCommunityServer.kt. Removes decommissioned / non-existent entries (RNS Amsterdam, RNS BetweenTheBorders, RNS Frankfurt, i2p Reticulum, Reticulum Ireland, TheHub, Kosciuszko, Reticulum Ireland v2, RNS Roaming) and adds the servers that are actually present on the network. i2p is dropped entirely because iOS has no i2p transport. Also collapse the "Bootstrap Servers" / "Community Servers" split in TCPClientWizard into a single "Community Servers" section, since Reticulum-Swift does not yet implement bootstrap-interface mode and splitting them would mislead users into expecting bootstrap behavior. The isBootstrap flag on the data model is preserved so the Android table stays mirrorable. 
Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> * feat(auto-announce): granular trigger toggles + new wiring Splits the auto-announce path into three independently-toggleable triggers, all gated behind the existing `auto_announce_enabled` master: - `auto_announce_on_interval` — periodic timer (existing) - `auto_announce_on_tcp_reconnect` — fires on TCP / RNode reconnect - `auto_announce_on_peer_spawned` — fires when AutoInterface / BLE / MPC accepts a new peer All three default true to preserve the previous "all triggers active when master is on" behaviour. Wiring: - `AppServices.configureTransportCallbacks` now uses reticulum-swift's split callbacks (`setOnInterfaceConnected` / `setOnInterfacePeerSpawned`), each with its own user-setting gate. The polled state-observer's connect-trigger is gated to match. - `AutoAnnounceManager.start` (and the in-loop re-check) honour the `auto_announce_on_interval` toggle in addition to master. - `autoAnnounce()` itself bails on master-off as defense in depth. - SettingsView's Auto Announce card grows three sub-toggles + interval picker hides when the on-interval trigger is off. Pairs with reticulum-swift's onInterfaceAdded → onInterfacePeerSpawned / onInterfaceConnected split (see that repo). Ship-ready behaviour change on its own; no diagnostic logging in this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump reticulum-swift pin to 0.2.4 Picks up the onInterfaceAdded → onInterfacePeerSpawned/onInterfaceConnected split (reticulum-swift PR #14) that this PR's wiring requires. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(AppServices): only resetTimer when announce was actually sent The polled state-observer's connect path was calling `autoAnnounceManager.resetTimer()` unconditionally — even when the TCP-reconnect gate had blocked the announce. 
Because `resetTimer()` restarts the periodic loop with a fresh `Next auto-announce in 3h (±1h)` schedule, every TCP reconnect on a flap-y network (mobile data ↔ WiFi, RNode in poor RF) would push the next interval-announce a full interval into the future without ever emitting one. The periodic schedule could be perpetually starved even though the user left "On interval" enabled and only disabled the reconnect trigger. Move the `resetTimer()` call inside the gate so it only fires when an announce actually went out. Greptile review feedback on PR #70. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(auto-announce): extract AutoAnnouncePolicy + cover trigger gates The auto-announce trigger gates were inlined as `defaults.bool(forKey: ...)` calls at seven sites across AppServices and AutoAnnounceManager, which made them impractical to unit-test without bringing up the full AppServices stack (transport, identity, router, …). Extract the gating decision into a pure value type, AutoAnnouncePolicy, that snapshots the four UserDefaults keys and exposes: - shouldFireOnInterval - shouldFireOnTcpReconnect - shouldFireOnPeerSpawned …all derived from the master enable plus the corresponding granular toggle. Routes the seven existing call sites through the policy so the inline string-key reads no longer appear in service code (which makes a typo-rename harder and gives every gate the same code path). Tests in AutoAnnouncePolicyTests cover: - Direct init stores all four flags. - Master off suppresses all three triggers regardless of granulars. - Each granular toggle gates its own trigger independently. - All-on / all-off boundary cases. - Empty defaults reports all-off (raw read behavior). - Snapshot is immutable after capture (catches future refactors that might keep a defaults reference). - register(defaults: true) produces the fresh-install all-fire baseline that SettingsViewModel.loadLocalSettings sets up. 
- Explicit false overrides registered default-true. 9 tests, all passing locally on iOS Simulator. Total suite went from 71 to 80 tests; no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(auto-announce): attribute peer-child connected events to peer-spawned gate Reticulum-swift fires `onInterfacePeerSpawned` when an AutoInterface / BLEInterface / MPCInterface accepts a peer, then a moment later fires `onInterfaceConnected` for the peer's child transport's `.connected` transition. The previous gating treated the second event as a generic TCP-reconnect, so a user who turned the peer-spawned toggle off but left tcp-reconnect on would still get an announce on every peer-add — defeating the purpose of having a separate peer-spawned gate. Changes: - `AutoAnnouncePolicy.shouldFireOnInterfaceConnected(isPeerChild:)` new accessor that gates by `onPeerSpawned` for peer-children and `onTcpReconnect` for everything else (both still subject to `masterEnabled`). - `AppServices` tracks ids passed through `onInterfacePeerSpawned` in a `peerChildInterfaceIds` set, then queries it in the `onInterfaceConnected` handler to pick the right gate. - Diagnostic log line distinguishes the two attribution paths so a future investigation can tell whether an announce came from the tcp-reconnect or peer-child-reconnect branch. Tests cover the four corners of the cross-trigger matrix plus the master-off override: - peer-child + peer-spawned-off + tcp-reconnect-on → does NOT fire - peer-child + peer-spawned-on + tcp-reconnect-off → fires - non-peer-child + tcp-reconnect-on / off → fires / not - master off → never fires - all-on / all-off across peer-child boundaries Greptile review feedback on PR #70 (4/5 confidence comment about peer-child overlap). 
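The gating rules described above can be sketched as a small value type. This is a minimal illustration, not the project's source: the four UserDefaults keys and the accessor names come from the commit text, while the stored-property name `onInterval`, the memberwise initializer, and the `init(defaults:)` helper are assumptions.

```swift
import Foundation

/// Illustrative policy snapshot; the field layout is assumed, not copied from the repo.
struct AutoAnnouncePolicy: Equatable {
    let masterEnabled: Bool
    let onInterval: Bool
    let onTcpReconnect: Bool
    let onPeerSpawned: Bool

    // Every trigger needs the master switch plus its own granular toggle.
    var shouldFireOnInterval: Bool { masterEnabled && onInterval }
    var shouldFireOnTcpReconnect: Bool { masterEnabled && onTcpReconnect }
    var shouldFireOnPeerSpawned: Bool { masterEnabled && onPeerSpawned }

    /// A `.connected` event from a peer-spawned child is gated by the
    /// peer-spawned toggle; every other connect uses the TCP-reconnect toggle.
    func shouldFireOnInterfaceConnected(isPeerChild: Bool) -> Bool {
        masterEnabled && (isPeerChild ? onPeerSpawned : onTcpReconnect)
    }
}

extension AutoAnnouncePolicy {
    /// Reads the four keys once, so later writes to `defaults` do not change this value.
    init(defaults: UserDefaults) {
        self.init(
            masterEnabled: defaults.bool(forKey: "auto_announce_enabled"),
            onInterval: defaults.bool(forKey: "auto_announce_on_interval"),
            onTcpReconnect: defaults.bool(forKey: "auto_announce_on_tcp_reconnect"),
            onPeerSpawned: defaults.bool(forKey: "auto_announce_on_peer_spawned")
        )
    }
}
```

Keeping the decision in a pure struct is what lets the cross-trigger matrix be tested without bringing up transport, identity, or router.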
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(auto-announce): make peer-child attribution race-free The peer-spawned and connected callbacks fire from independent reticulum-swift Tasks. The previous implementation used MainActor- isolated record / lookup, which meant both operations had to await an actor hop. Swift's task scheduler doesn't guarantee record-before-lookup ordering between unrelated Tasks, so a fast peer-add → child-connect sequence could in theory mis-attribute the connected event to tcp-reconnect instead of peer-spawned (the user-facing bug fixed in the prior commit). Replace the MainActor-isolated Set with a synchronous, lock-protected PeerChildInterfaceRegistry (OSAllocatedUnfairLock-backed). The peer- spawned closure now records on its first line, *before* any await suspension, so the record is committed before any subsequent onInterfaceConnected for the same id can possibly run its attribution lookup. The connected closure's lookup is also synchronous, so attribution is correct regardless of how the schedulers interleave the rest of the closure bodies. Tests: - PeerChildInterfaceRegistryTests: empty / record-then-contains / idempotent / reset / immediate-visibility on same thread. - testConcurrentRecordAndContainsObservesAllPriorRecords: 1000-way concurrent record+contains stress, asserts no crash and full visibility after group completes. Total suite: 90 tests, all passing. Greptile review feedback on PR #70 (4/5 confidence comment about Task ordering between MainActor hops). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(greptile): iteration 1 — applied 2, rejected 0 Snapshot dictionary keys before mutating during iteration in PacketTunnelProvider: - applyConfigsLocked() stale-entry teardown: collect stale ids via filter() before the loop instead of iterating currentTCPs.keys while teardownTCPConnectionLocked + removeValue mutate it. 
- wake() reaper: iterate Array(self.tcpConnections.keys) instead of the live Keys view while teardownTCPConnectionLocked mutates the same dictionary. Both paths run on configQueue (the only mutator), but Swift's Dictionary.Keys is documented as a live view and mutation during iteration is undefined behavior — can silently skip entries or crash. Both fixes are inert for the single-TCP case but matter as soon as 2+ TCPs are active and a config-change or wake event fires. Co-Authored-By: Claude opus-4-7-1m <noreply@anthropic.com> * chore(greptile): iteration 1 — applied 1, rejected 0 Roll back tcpInterfaces[entityId] and defer tcpEndpoints[entityId] until after transport.addInterface succeeds. Without this, a transient addInterface throw left both dictionary entries populated for a dead, un-attached interface; the next connectTCPInterface call with the same endpoint hit the idempotency guard at the top of the function and silently no-op'd, breaking self-healing reconnects until the user manually edited host/port. Greptile thread 2 (the matching skip in InterfaceManagementViewModel. applyChanges) is satisfied by this same fix — once tcpEndpoints reflects only successfully-applied endpoints, the VM's `tcpEndpoints[id] == desired` guard correctly distinguishes "running cleanly" from "stale dead entry waiting to retry". Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com> * chore(greptile): iteration 2 — applied 1, rejected 0 Extend the connectTCPInterface write-after-success + rollback pattern to the three remaining tcp-server init sites: both initialize() overloads and reinitializeConnection(). Without this, an addInterface throw during init left tcpInterfaces["tcp-server"] and tcpEndpoints["tcp-server"] populated with a dead interface; reconnectTCPOnly delegates to connectTCPInterface(entityId: "tcp-server", ...) which then silently no-op'd on a same-address retry through the new idempotency guard. 
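The write-after-success plus rollback pattern described in these two iterations can be sketched roughly as follows. Everything here is a simplified stand-in: `TCPRegistrySketch`, `Endpoint`, `AttachError`, and the `addInterface` closure are hypothetical, and a plain string takes the place of the real interface object. Only the dictionary names and the `"tcp-server"` id come from the commit text.

```swift
import Foundation

struct Endpoint: Equatable {
    let host: String
    let port: Int
}

enum AttachError: Error { case rejected }

/// Hypothetical miniature of the bookkeeping around connectTCPInterface.
final class TCPRegistrySketch {
    private(set) var tcpInterfaces: [String: String] = [:]
    private(set) var tcpEndpoints: [String: Endpoint] = [:]

    func connectTCPInterface(entityId: String,
                             endpoint: Endpoint,
                             addInterface: (String) throws -> Void) throws {
        // Idempotency guard: skip only if this exact endpoint was applied successfully.
        if tcpEndpoints[entityId] == endpoint { return }

        let interface = "\(endpoint.host):\(endpoint.port)"
        tcpInterfaces[entityId] = interface
        do {
            try addInterface(interface)
        } catch {
            // Roll back the partial write so a later retry is not treated as a no-op.
            tcpInterfaces.removeValue(forKey: entityId)
            throw error
        }
        // Deferred until attach succeeded, so the guard above never sees a dead entry.
        tcpEndpoints[entityId] = endpoint
    }
}
```

With this ordering, `tcpEndpoints` only ever reflects endpoints that were attached, which is the property the view model's equality check relies on.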
For the two initialize overloads, the catch block preserves the "non-fatal" semantics (init proceeds without TCP, no rethrow) but now also clears the partial dictionary writes so a later reconnectTCPOnly retry isn't stuck. For reinitializeConnection — which had no catch and propagates errors to its caller — the new do/catch rolls back and rethrows, mirroring connectTCPInterface. Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com> * feat(Map): follow app dark mode for OpenFreeMap style Picks the OpenFreeMap style URL (liberty / dark) based on ThemeManager.isDarkMode and reapplies it from updateUIView when the active scheme changes. Coordinator caches the last applied URL to skip the no-op reassignment that would otherwise fire on every peer-location tick. Offline regions remain pinned to the liberty style at download time; switching to dark while fully offline yields unstyled tiles. To be addressed in a follow-up that caches both style packs. Closes #59 Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> * Update Sources/ColumbaApp/Views/Map/MapLibreMapView.swift Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * chore(greptile): iteration 1 — applied 4, rejected 0 Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> * feat(InterfaceManagement): add TCP client community-server wizard (#64) * feat(InterfaceManagement): add TCP client community-server wizard Mirrors Android Columba's 2-step TCP client wizard at the post-onboarding add-interface surface: server selection (bootstrap/community/custom) → review & configure. Routes Settings → Network Interfaces → + → TCP Client through the wizard instead of the blank manual entry sheet, and reroutes edit-existing for TCP entries to the same flow with pre-filled values. Scoped to the fields TCPClientConfig already supports (host, port, networkName, passphrase). Bootstrap-only flag and SOCKS proxy are deferred. 
Closes #51 Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> * fix(TCPClientWizard): mirror android server list, drop bootstrap split Addresses PR review comments: #64 (comment) #64 (comment) Replace the iOS community-server directory with the canonical Android list at app/src/main/java/network/columba/app/data/model/TcpCommunityServer.kt. Removes decommissioned / non-existent entries (RNS Amsterdam, RNS BetweenTheBorders, RNS Frankfurt, i2p Reticulum, Reticulum Ireland, TheHub, Kosciuszko, Reticulum Ireland v2, RNS Roaming) and adds the servers that are actually present on the network. i2p is dropped entirely because iOS has no i2p transport. Also collapse the "Bootstrap Servers" / "Community Servers" split in TCPClientWizard into a single "Community Servers" section, since Reticulum-Swift does not yet implement bootstrap-interface mode and splitting them would mislead users into expecting bootstrap behavior. The isBootstrap flag on the data model is preserved so the Android table stays mirrorable. Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> * chore(greptile): iteration 1 — applied 4, rejected 0 Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> * fix(TcpCommunityServer): remove unwanted servers from wizard list The following entries should not be surfaced in the on-device wizard: - interloper node + interloper node (Tor) - Jon's Node - Quortal TCP Node - R-Net TCP - RNS bnZ-NODE01, RNS COMSEC-RD, RNS HAM RADIO - RNS Testnet StoppedCold - RNS_Transport_US-East - Tidudanka.com Surviving list: 3 bootstrap-class (Beleth RNS Hub, Quad4 TCP Node 1, FireZen) + 7 community (g00n.cloud Hub, noDNS1, noDNS2, NomadNode SEAsia TCP, 0rbit-Net, Quad4 TCP Node 2, SparkN0de). NOTE: the file's docstring claims this list mirrors Android's `TcpCommunityServer.kt`. 
Pruning here breaks that mirror; a follow-up PR should make the equivalent removal on the Android side, OR the "keep in sync" claim should be relaxed to "originally derived from." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com> Co-authored-by: Claude claude-opus-4-7 <noreply@anthropic.com> Co-authored-by: torlando-agent[bot] <torlando-agent@noreply.github.com> * feat: add Maestro UI flows for columba-suite ui-screenshotter (#69) * feat: add Maestro UI flows for columba-suite ui-screenshotter agent Adds flows/ with 4 deterministic Maestro flows (contacts-list, chats-list, settings, map) plus a README. The columba-suite ui-screenshotter agent captures each flow at BASE_REF and HEAD in both light and dark Simulator appearances on every UI-touching PR, linking the resulting PNG pair from PLAN.md so reviewers see the visual change before merging. This PR exists primarily to land flows/ on main so subsequent PRs have flow coverage at BASE_REF. The screenshotter will fire on this PR itself, but cleanly skip with screenshot_status: skipped_no_flows because the PR's BASE_REF (this branch's parent) doesn't yet have flows/. Voice-call flows are deferred — they need a debug-only lxma://debug/... URL handler that doesn't exist yet. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(greptile): iteration 1 — applied 1, rejected 2 Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com> --------- Co-authored-by: torlando-agent[bot] <217870594+torlando-agent[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com> * chore(test): add debug-only iOS test surface for phone smoke-test pipeline Mirror of the Android `app/src/debug/.../TestController.kt` + TestReceiver.kt surface, adapted to iOS via a sibling URL scheme (`lxma-test://`) routed through the existing `.onOpenURL` handler in ColumbaApp.swift. The 17 actions, log shape (`event=key=value`), and whitespace-escape rules match Android byte-for-byte so the python orchestrator's regexes work cross-platform. - Sources/ColumbaApp/Test/TestController.swift — singleton coordinating the test-action surface; binds to live AppServices/router/interface repository, observes inbound LXMF + delivery-state via a relay delegate, emits structured os_log lines under subsystem `network.columba.app.test` / category `harness` so idevicesyslog filters cleanly. - Sources/ColumbaApp/Test/TestURLHandler.swift — `lxma-test://<action>?<query>` dispatcher; mirrors Android's TestReceiver `when (action)` switch, routes to TestController. Wired into ColumbaApp.swift's `.onOpenURL` with a `#if DEBUG` guard. - Both files are wrapped in `#if DEBUG` so they compile out of release `.ipa`s. Defense in depth: every entry trips an `assertionFailure` with a release-misconfig message. Verified empirically — release build's binary contains zero references to TestController / TestURLHandler / harness log strings. - `lxma-test` URL scheme registered in Info.plist alongside `lxma`. 
The scheme stays present in release builds (no per-config plist on this project) but is harmless because no code in release handles it; the release `.onOpenURL` `#if DEBUG` block compiles to a guard-pass and the URL falls through. The Python orchestrator at ~/.claude-runner/columba-harness/smoke_test_ios.py drives this surface end-to-end (devicectl URL dispatch + idevicesyslog tail) and is the iOS sibling of smoke_test.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test-harness): unbreak release-guard + add file-based event log Two bugs that prevented end-to-end smoke runs against a physical iPhone: 1. assertionFailure_releaseGuard() was calling assertionFailure(...) UNCONDITIONALLY in both TestController.swift and TestURLHandler.swift. That's exactly inverted from the intent — `assertionFailure` ALWAYS crashes in DEBUG builds. So every URL dispatch and every public handler entry crashed the app on the guard before any logic ran. Mirrors the Android side's `check(BuildConfig.DEBUG)` semantics: crash only when DEBUG is FALSE. New impl wraps the body in `#if !DEBUG ... #endif` so it's a no-op in normal debug builds and a hard crash if a release ever gets misconfigured to compile this file in. 2. TestLog.emit() now ALSO writes each line to `Documents/test_log.txt`, prefixed `seq=<n> ts=<iso8601>`. Reason: the Python orchestrator originally tailed device syslog via `idevicesyslog`, but iOS 17+ moved live-syslog behind the new CoreDevice / RemoteXPC tunnel that libimobiledevice can't speak. `pymobiledevice3` would work but needs a developer-tunnel daemon. The orchestrator now polls Documents/test_log.txt via `xcrun devicectl device copy from --domain-type appDataContainer`, which works out of the box and is more robust (no race window, survives disconnects). os_log writes are kept for human readers. 
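A rough sketch of the corrected release guard from item 1, under stated assumptions. The helper names below are hypothetical, and the sketch uses `fatalError` rather than `assertionFailure`: `assertionFailure` is not evaluated in optimized (`-O`) builds, so it would not give a hard crash in a typical release configuration. The decision is split into a pure function so it can be checked without crashing the process.

```swift
// `DEBUG` is the compilation condition Xcode defines for Debug configurations.
#if DEBUG
let isDebugBuild = true
#else
let isDebugBuild = false
#endif

/// Pure decision helper: returns a message only when the test surface
/// is reached from a build that is not a DEBUG build.
func releaseGuardViolation(isDebugBuild: Bool, entryPoint: String) -> String? {
    isDebugBuild ? nil : "test surface reached in a non-DEBUG build: \(entryPoint)"
}

/// Hypothetical guard for the top of each test-surface entry point:
/// a no-op in DEBUG builds, a hard crash anywhere else.
func releaseGuard(entryPoint: String) {
    if let message = releaseGuardViolation(isDebugBuild: isDebugBuild, entryPoint: entryPoint) {
        fatalError(message)
    }
}
```

The original bug was the inverse of this: the crash call ran unconditionally, so every DEBUG dispatch died on the guard before any handler logic ran.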
Verified end-to-end: smoke_test_ios.py runs the propagated_bidirectional scenario all the way through interface setup, propagation-node config, HAS_PATH=1, SEND_PROP, msg_sent. (Stalls at OUTBOUND-never-advances-to- PROPAGATED — separate LXMFSwift outbound state-machine issue, NOT a harness bug. Diagnostic for that lands in a follow-up.) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(harness): add lxma-test://dump_log for OSLogStore extraction iOS 17+ moved live syslog behind the new CoreDevice / RemoteXPC tunnel that libimobiledevice can't speak, so the smoke harness couldn't observe library-internal events on the device. Added a debug-only `dump_log` URL action that uses OSLogStore to extract recent unified-log entries from the app process and forwards them into Documents/test_log.txt as `lib_log subsys=… cat=… level=… msg=…` lines that the orchestrator can parse with its existing devicectl copy-from poll mechanism. Filter defaults to `(com.columba.core, net.reticulum.lxmf)` × (Propagation, Sync, LXMRouter, Stamper, Identity, PropagationNodeManager) to surface just the propagation-path observability we need to diagnose stuck `state=OUTBOUND` failures. `?since=<sec>` sets the window (default 120s); `?cat=<comma>` overrides categories; `?cat=*` disables category filtering. Critical first finding when wired up: processOutbound IS running and calling sendPropagated; the failure is `LXMRouter` emitting "Delivery failed: No path available to destination, retrying in 15s/120s" because `pathTable.lookup(destinationHash: nodeHash)` returns nil for the propagation node hash even though `pathTable.hasPath(for:)` returns true on the same hash from the harness. Likely actor- isolation race or stale-snapshot bug in the path-table view; needs deeper investigation in LXMF-swift / reticulum-swift. 
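For illustration, the `dump_log` query handling described above might look like the sketch below. The scheme, action name, default window, default categories, and the `since` / `cat` parameters come from the commit text. The types `CategoryFilter` and `DumpLogRequest`, the function name, and the choice to ignore empty or non-positive values are assumptions.

```swift
import Foundation

enum CategoryFilter: Equatable {
    case any                  // `?cat=*` disables category filtering
    case only(Set<String>)
}

struct DumpLogRequest: Equatable {
    var sinceSeconds: Double = 120
    var categories: CategoryFilter = .only([
        "Propagation", "Sync", "LXMRouter", "Stamper", "Identity", "PropagationNodeManager",
    ])
}

/// Hypothetical parser for `lxma-test://dump_log?since=<sec>&cat=<comma>`.
func parseDumpLog(_ url: URL) -> DumpLogRequest? {
    guard let components = URLComponents(url: url, resolvingAgainstBaseURL: false),
          components.scheme == "lxma-test",
          components.host == "dump_log" else { return nil }

    var request = DumpLogRequest()
    for item in components.queryItems ?? [] {
        // Assumption: empty values fall back to the defaults.
        guard let value = item.value, !value.isEmpty else { continue }
        switch item.name {
        case "since":
            if let seconds = Double(value), seconds > 0 { request.sinceSeconds = seconds }
        case "cat":
            if value == "*" {
                request.categories = .any
            } else {
                let names = value.split(separator: ",")
                    .map { $0.trimmingCharacters(in: .whitespaces) }
                    .filter { !$0.isEmpty }
                request.categories = .only(Set(names))
            }
        default:
            break
        }
    }
    return request
}
```

A call such as `parseDumpLog(URL(string: "lxma-test://dump_log?since=30&cat=*")!)` would yield a 30-second window with category filtering disabled.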
Sticks to existing test-surface contract — `lib_log_done count=<n>` / `lib_log_err reason=<msg>` reply tokens; debug-only via the existing `#if DEBUG` source-set isolation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(harness): wire iOS PROPAGATED smoke end-to-end Three bug-fix-and-instrument changes to make the PROPAGATED self-send round-trip pass on iOS. Mirrors the Android smoke pipeline shipped in PR #882. 1. TestRelayDelegate retention. LXMRouter holds the delegate weakly (LXMRouter.swift `weak var delegate`); attachDelegate handed in a stack-local relay that immediately deallocated, leaving the router with a nil delegate and no didUpdateMessage callbacks for outbound state changes. Pin the relay to TestController.attachedDelegate. 2. set_prop_node now goes through PropagationNodeManager.selectNode (via TestPathBridge.selectPropNode) instead of router.setOutboundPropagationNode. The manager is the only path that wires the announce-derived stamp cost into the router; the bare router setter left cost=0 and sendPropagated shipped a random stamp that lxmd rejected with ERROR_INVALID_STAMP. selectNode also now (a) reads stamp cost from pathTable.appData when knownNodes is empty and (b) waits up to ~5s for either source to populate, covering the smoke-test race where set_prop_node fires immediately after add_tcp_client (before the announce arrives). 3. PropagationNodeManager.processPathEntry re-applies the stamp cost to the router whenever an announce updates the currently-selected node, so a delayed announce can correct an earlier cost=0 setting. Plus instrumentation: dump_log now emits each OSLog entry's actual recorded timestamp (`entry_ts=`) alongside the dump-time `seq=N ts=` prefix, and includes `network.columba.Columba` in the allowed-subsystem set so app-side managers (PropagationNodeManager) show up. 
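The delegate-retention bug in item 1 is easy to reproduce in isolation. The sketch below uses hypothetical stand-in types (`RouterSketch`, `RelaySketch`, `ControllerSketch`); only the weak `delegate` property, `didUpdateMessage`, and the `attachedDelegate` pin are taken from the description above.

```swift
protocol MessageDelegate: AnyObject {
    func didUpdateMessage(id: String)
}

/// Stand-in for a router that, like LXMRouter, holds its delegate weakly.
final class RouterSketch {
    weak var delegate: MessageDelegate?
    func emit(id: String) { delegate?.didUpdateMessage(id: id) }
}

final class RelaySketch: MessageDelegate {
    private(set) var updates: [String] = []
    func didUpdateMessage(id: String) { updates.append(id) }
}

final class ControllerSketch {
    private(set) var attachedDelegate: RelaySketch?

    /// Buggy shape: the only strong reference is the local, so the relay
    /// is released when this function returns and the weak slot reads nil.
    func attachWithoutPinning(to router: RouterSketch) {
        let relay = RelaySketch()
        router.delegate = relay
    }

    /// Fixed shape: the controller owns the relay for as long as it lives.
    func attachPinned(to router: RouterSketch) {
        let relay = RelaySketch()
        attachedDelegate = relay
        router.delegate = relay
    }
}
```

After `attachWithoutPinning`, `router.delegate` is already nil and `emit` silently does nothing, which matches the missing outbound state callbacks described in the commit.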
Direct + opportunistic self-send scenarios are still WIP — they require LXMRouter-level loopback for self-addressed packets (single device can't actually transit a packet to itself through the network) which is a future stage. PROPAGATED works today via the lxmd round-trip. * chore: bump LXMF-swift to a3e5b00 (DIRECT identify-drop fix) * chore(deps): pin reticulum-swift to fix/link-data-no-header2-conversion reticulum-swift @ d19919a — drops incorrect HEADER_2 conversion of link DATA packets that broke multi-hop DIRECT delivery (state=SENT but the echo bot never received the message). Mirrors python RNS/Transport.py :1063, 1122-1130 — link DATA always sends HEADER_1 to the link's attached_interface, never through path-table lookup. LXMF-swift @ fe3ce84 (perf/stamper-parallel-primed-digest) — pins reticulum-swift to the same fix branch. Smoke results after fix (today's run #5): propagated_bidirectional: PASS (6.7s) direct_echo: PASS (3.5s) ← was FAIL pre-fix opp_echo: PASS (3.4s) * test(harness): add diagnostic ticker + screenshot capture to TestController Spawned by TestController.bind() on first init; runs every 2s for the app's lifetime, snapping the key window into Documents/screenshots/<seq>.png and emitting: diag_tick seq=N state=<active|inactive|background> snapshot=<path|<skip>> lifecycle event=<did_become_active|will_resign_active|...> Diagnoses the iOS smoke harness wedge: "lxma-test:// URLs stop reaching the URL handler after 2-3 sequential runs." The ticker is driven by an internal Task, NOT URL dispatch, so it keeps emitting even when URLs are wedged. If ticks ALSO stop, the OS suspended/killed the app. If ticks keep coming with state != .active, the app went background. If ticks keep firing AND state stays .active but URLs still don't reach the handler, the wedge is below SwiftUI (CoreDevice tunnel / launch services). Last is the smoking gun pattern. 
Field finding from this commit's first run (2026-05-10): iter 1: 3/3 PASS iter 2: 3/3 PASS iter 3: 0/3 FAIL — "TCP client interface ADD never confirmed" iter 4: total wedge — TestController never answered get_dest After the wedge, even `devicectl device copy from` hangs for 30+s, which proves the wedge is at the **CoreDevice tunnel layer**, not the app's URL handler. The iPhone-side dev tunnel (RemoteServiceDiscovery) goes degraded after rapid `process launch --payload-url` bursts. Recovery: pkill devicectl + relaunch app via process launch (which still works because process control rides a different RSD service). Screenshots written to Documents/screenshots/, capped at 30 most-recent. Pull via `xcrun devicectl device copy from --domain-type appDataContainer --domain-identifier network.columba.Columba --source Documents/screenshots --destination /tmp/...`. #if DEBUG-only — does not ship in release, same as the rest of the test surface. * fix(prop): single checkmark + 'sent to relay' text + dump_db diag LXMF-swift bump → b2e14cd: caps PROPAGATED outbound state at .sent (per python LXMessage.py:568-578); large prop messages no longer falsely advance to .delivered via the Resource path. iOS UI: - MessageBubble.deliveryStatusIcon: defensively coerce delivered/read → sent for any message with deliveryMethod == 'propagated' (handles stale rows from before the fix). - MessageDetailView.statusCard: method-aware text for prop messages. 'Sent' → 'Sent to relay' with subtitle explaining propagation nodes don't ack recipient receipt. Diagnostic surface: - New lxma-test://dump_db URL action. Walks the full conversations + messages tables, emits one line per row to test_log.txt. Diagnoses Tyler's 2026-05-10 observation that prop messages appear in a separate conversation from direct/opp — DB inspection is the source of truth (UI faithfully renders whatever conversations table has). 
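The defensive coercion in `MessageBubble.deliveryStatusIcon` can be expressed as a small pure function. This is a sketch under assumptions: the `DeliveryState` enum, the function names, and the non-propagated label are hypothetical. The `"propagated"` method string and the "Sent to relay" wording come from the commit text.

```swift
enum DeliveryState: Equatable {
    case outbound, sent, delivered, read, failed
}

/// Propagation nodes do not acknowledge recipient receipt, so a propagated
/// message is never shown as more than `.sent`, even if a stale row says otherwise.
func displayedState(stored: DeliveryState, deliveryMethod: String) -> DeliveryState {
    guard deliveryMethod == "propagated" else { return stored }
    switch stored {
    case .delivered, .read:
        return .sent
    default:
        return stored
    }
}

/// Method-aware label for the `.sent` state only (other labels are out of scope here).
func sentLabel(deliveryMethod: String) -> String {
    deliveryMethod == "propagated" ? "Sent to relay" : "Sent"
}
```

Doing the coercion at render time handles rows written before the LXMF-swift fix without needing a database migration.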
Refs: - LXMF/LXMessage.py:568-578 (__mark_propagated → state=SENT) - LXMF-swift b2e14cd (resource-handler split, port-aligned) * chore(deps): bump LXMF-swift to 0.4.0 + reticulum-swift to 0.3.0 LXMF-swift 0.4.0 (PR #7 — perf/stamper-parallel-primed-digest, merged): - Parallel stamp generation (LXStamper TaskGroup, 8 workers, primed SHA256 digest) — cost=16 from multi-minute to ~1-2s on iPhone. - PROPAGATED state machine fixes: drops wrong link.identify(); wires RESOURCE_PRF to .sent (not .delivered); ERROR_INVALID_STAMP handler via pendingPropagationSends FIFO + pendingPropagationRejections set; handlePropagationAccepted + handleOutboundResourceFailed with awaited DB writes that preserve deliveryAttempts budget. - DIRECT path: self-send identity resolution before path table; drops premature link.identify(); broadcast-relay-only self-echo gate; DIRECT resource crash-recovery parity with PROPAGATED. - Stamp-rejected resource short-circuit prevents retry-loop spam. reticulum-swift 0.3.0 (PR #16): - HEADER_2 link DATA conversion fix. - sendLinkData signature: destinationHash param removed (breaking). Package.swift, pbxproj, and Xcode-shared Package.resolved all updated. Build verified: xcodebuild for iOS Simulator, CODE_SIGNING_ALLOWED=NO, BUILD SUCCEEDED. Smoke pipeline (PROPAGATED/DIRECT/OPP bidirectional with Mac echo bot) to follow on PR ready→draft transition. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(deps): bump LXMF-swift to 0.4.0 + reticulum-swift to 0.3.0 (#73) LXMF-swift 0.4.0 (PR #7 — perf/stamper-parallel-primed-digest, merged): - Parallel stamp generation (LXStamper TaskGroup, 8 workers, primed SHA256 digest) — cost=16 from multi-minute to ~1-2s on iPhone. 
- PROPAGATED state machine fixes: drops wrong link.identify(); wires RESOURCE_PRF to .sent (not .delivered); ERROR_INVALID_STAMP handler via pendingPropagationSends FIFO + pendingPropagationRejections set; handlePropagationAccepted + handleOutboundResourceFailed with awaited DB writes that preserve deliveryAttempts budget. - DIRECT path: self-send identity resolution before path table; drops premature link.identify(); broadcast-relay-only self-echo gate; DIRECT resource crash-recovery parity with PROPAGATED. - Stamp-rejected resource short-circuit prevents retry-loop spam. reticulum-swift 0.3.0 (PR #16): - HEADER_2 link DATA conversion fix. - sendLinkData signature: destinationHash param removed (breaking). Package.swift, pbxproj, and Xcode-shared Package.resolved all updated. Build verified: xcodebuild for iOS Simulator, CODE_SIGNING_ALLOWED=NO, BUILD SUCCEEDED. Smoke pipeline (PROPAGATED/DIRECT/OPP bidirectional with Mac echo bot) to follow on PR ready→draft transition. Co-authored-by: torlando-tech <torlando-tech@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tunnel): guard applyTunnelModeToInterfaces(active:false) against initial .invalid VPN state iOS emits an `.invalid` / `.disconnected` VPN status notification on every cold start — fired by `TunnelManager.onStatusChange` regardless of whether the user has enabled Background Transport, because the session machinery probes whatever is currently loaded. The previous code unconditionally scheduled `applyTunnelModeToInterfaces(active: false)` via the 5s debounce, which iterated every TCPInterface and called `endTunnelMode()`. `endTunnelMode()` in reticulum-swift 0.3.0 is NOT idempotent (TCPInterface.swift:257-269): it unconditionally tears down the working NWConnection (via `transport?.disconnect()` -> nil) and re-runs `setupTransport()`. 
Calling it on an interface that was never in tunnel mode (outboundHook == nil) is destructive: it kills the live socket Step 7 brought up moments earlier.

Reproduced 2026-05-11 on smoke run iter1 against `feat/multi-tcp-tunnel @ 0f7cf3e`: all 4 scenarios FAILED at the earliest `send_*` step. has_path returned 1 for both PN and bot (path table populated via inbound announces), but outbound sends never advanced past `state=OUTBOUND`. Console showed `[TUNNEL] disabled tunnel mode` ~5s after cold start with no prior `[TUNNEL] enabled` line, confirming the debounce was tearing down TCP without ever having activated it.

Fix tracks an `isTunnelModeActive` bool. The active=false branch guards on it and returns early if tunnel mode was never activated. Mirrors the "undo what you did" contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: torlando-tech <torlando-tech@users.noreply.github.com>
Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com>
Co-authored-by: Claude claude-opus-4-7 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: torlando-agent[bot] <torlando-agent@noreply.github.com>
Co-authored-by: torlando-agent[bot] <217870594+torlando-agent[bot]@users.noreply.github.com>
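A minimal sketch of the caller-side guard this fix describes. The `isTunnelModeActive` flag, `applyTunnelModeToInterfaces(active:)`, and `endTunnelMode()` are named in the commit; the `TunnelCapable` protocol, `beginTunnelMode()`, the explicit `interfaces` parameter, and `CountingInterface` are assumptions made to keep the example self-contained.

```swift
protocol TunnelCapable: AnyObject {
    func beginTunnelMode()
    func endTunnelMode()
}

/// Hypothetical holder of the "undo what you did" flag.
final class TunnelModeSwitch {
    private var isTunnelModeActive = false

    func applyTunnelModeToInterfaces(active: Bool, interfaces: [TunnelCapable]) {
        if active {
            interfaces.forEach { $0.beginTunnelMode() }
            isTunnelModeActive = true
        } else {
            // The cold-start `.invalid` status lands here before tunnel mode was
            // ever enabled; returning early leaves live sockets untouched.
            guard isTunnelModeActive else { return }
            interfaces.forEach { $0.endTunnelMode() }
            isTunnelModeActive = false
        }
    }
}

/// Test double that counts calls instead of touching a real connection.
final class CountingInterface: TunnelCapable {
    private(set) var begins = 0
    private(set) var ends = 0
    func beginTunnelMode() { begins += 1 }
    func endTunnelMode() { ends += 1 }
}
```

The guard only protects against a disable that was never preceded by an enable; it does not make `endTunnelMode()` itself idempotent.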
…ip-flag

# Conflicts:
#   Columba.xcodeproj/project.pbxproj
#   Sources/ColumbaApp/Models/TcpCommunityServer.swift
#   Sources/ColumbaApp/Services/AppServices.swift
Smoke status: 4/4 PASS (smoke_clean), backgrounded delivery gate met

After merging current
Identical green shape to PR #62's final iter 3; the main-into-branch merge was functionally inert.

This branch now passes the Phase 3 (backgrounded delivery) smoke gate. PR is review-ready.

🤖 Generated with Claude Code
Smoke status update: 5/5 PASS, Phase 4 doze gate also clean

Re-ran the smoke harness with a new
Could be either (a) NE kept the TCP socket alive through the 5min suspension, or (b) iOS killed the connection and the app re-established it + ran

🤖 Generated with Claude Code
…orkaround

reticulum-swift 0.3.1 (PR #17) makes `TCPInterface.endTunnelMode()` and `AutoInterface.endTunnelMode()` idempotent via an `outboundHook != nil` guard. That moves the contract upstream, so the `isTunnelModeActive` bool guard added in `c0d2213` is no longer necessary: the `endTunnelMode()` calls in `applyTunnelModeToInterfaces(active: false)` are now safely no-ops when fired on never-tunneled interfaces (e.g. the initial `.invalid` VPN-status notification on every cold start).

Removed:
- `isTunnelModeActive` field declaration + doc
- `isTunnelModeActive = true` write in the active=true branch
- `isTunnelModeActive = false` write in the active=false branch
- The `guard isTunnelModeActive else { return }` short-circuit

Build verified: xcodebuild for iOS Simulator BUILD SUCCEEDED. The port-deviations.md note for reticulum-swift's tunnel API spelled out that this Columba-iOS workaround should be deleted on the next deps bump; this is that deps bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
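For reference, the callee-side idempotency that the upstream `outboundHook != nil` guard provides can be sketched like this. `HookedInterfaceSketch` is a hypothetical stand-in, not reticulum-swift's `TCPInterface`: the hook's closure type, `beginTunnelMode(hook:)`, and the `transportRestarts` counter (standing in for the disconnect plus `setupTransport()` work) are all assumptions.

```swift
import Foundation

/// Hypothetical interface whose teardown is guarded on the hook being set.
final class HookedInterfaceSketch {
    private var outboundHook: ((Data) -> Void)?
    private(set) var transportRestarts = 0

    func beginTunnelMode(hook: @escaping (Data) -> Void) {
        outboundHook = hook
    }

    func endTunnelMode() {
        // Never entered tunnel mode: nothing to undo, so leave the live transport alone.
        guard outboundHook != nil else { return }
        outboundHook = nil
        transportRestarts += 1  // placeholder for disconnect + transport setup
    }
}
```

Calling `endTunnelMode()` twice, or before any `beginTunnelMode`, restarts the transport at most once per enable.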
…d alone is insufficient
Smoke iter 1 against `1ee72eb` (reticulum-swift 0.3.1 bump + workaround
removal) failed all 5 scenarios with the same OUTBOUND-forever shape
the workaround was suppressing. Reverted just the Columba-side
workaround removal here as an A/B test — same HEAD otherwise (0.3.1
deps preserved). If smoke goes 5/5 again on this commit, it proves
0.3.1's upstream `outboundHook != nil` guard is necessary but not
sufficient; the Columba workaround was suppressing something the
upstream check doesn't catch.
Diag.log from the failing iter shows `[TUNNEL] disabled tunnel mode`
fires at +5s cold-start (the .invalid debounce expiring) but Step 7
reports "starting 0 enabled interfaces" before that, meaning
`tcpInterfaces` is empty when the disable iterates. So whatever the
workaround was suppressing isn't `endTunnelMode()` being called on
live interfaces — it's something else in the same code path or a
related side effect. Investigation continues; the workaround stays
in until the actual mechanism is identified.
This restores the `isTunnelModeActive` field, the `= true` write in
the active branch, and the `guard isTunnelModeActive else { ... }`
short-circuit in the inactive branch. reticulum-swift 0.3.1 is kept
(`Package.swift` / pbxproj minimumVersion / Package.resolved
unchanged) — the upstream guard is still a correctness improvement
even if it isn't load-bearing for this specific Columba bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update: bumped reticulum-swift 0.3.1, kept
| HEAD | reticulum-swift | Columba workaround | smoke |
|---|---|---|---|
| ed89272 | 0.3.0 | present | 5/5 PASS |
| 1ee72eb | 0.3.1 | removed | 0/5 FAIL |
| 2ff0d10 | 0.3.1 | restored | 5/5 PASS |
Decision
Keep the Columba-side workaround. The 0.3.1 dep bump is retained (it's a real correctness improvement at the API surface even if it isn't load-bearing for this specific bug). Investigation into what the workaround suppresses beyond the upstream guard is filed as a follow-up — suspects include the late-tunnel-check in connectTCPInterface, a side-effect of disable_all_interfaces, or a NEVPNStatusDidChange race during interface bringup.
PR head is now 2ff0d10 and smoke-clean. Ready for review/merge.
🤖 Generated with Claude Code
…ification scenario Adds `lxma-test://get_notifications` URL action that queries `UNUserNotificationCenter.deliveredNotifications` and emits one `notif id=<id> thread=<id> delivery_ts=<iso> source_hash=<hex> body=<preview>` line per delivered notification, bracketed by `notif_begin count=N query_ts=<iso>` and `notif_end count=N`. Used by the Phase A `suspended_notification` smoke scenario (in `smoke_test_ios.py`) to assert whether a system-level notification was posted while the app was suspended: compare each notification's `delivery_ts` against the orchestrator's `T_foreground` wall-clock to distinguish "delivered during suspension" (notification fired from the extension, the goal) vs. "delivered post-foreground" (app caught up by draining the queue, what the current "dumb pipe" NE architecture produces). The scenario is expected to FAIL on the current branch — that failure IS the gate signal that Phase B (push destination-hash filter + UNUserNotificationCenter call into ColumbaNetworkExtension) hasn't shipped yet. Phase A's purpose is exposing the gap that the existing smoke obscured by foregrounding before checking the DB. Build verified: xcodebuild iOS Simulator BUILD SUCCEEDED. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
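The `delivery_ts` versus `T_foreground` comparison described above can be sketched as follows. This is a minimal illustration, not the harness code (the real check lives in `smoke_test_ios.py`); the type and function names are hypothetical, and it assumes `delivery_ts` is an ISO 8601 timestamp that `ISO8601DateFormatter` accepts with its default options.

```swift
import Foundation

/// Hypothetical classification of one delivered notification.
enum DeliveryWindow: Equatable {
    case duringSuspension   // posted before the app returned to the foreground
    case postForeground     // posted at or after the foreground transition
}

/// Pulls the `delivery_ts=<iso>` field out of one `notif ...` line.
/// Returns nil when the field is missing or is not valid ISO 8601.
func deliveryTimestamp(fromNotifLine line: String) -> Date? {
    let formatter = ISO8601DateFormatter()
    let key = "delivery_ts="
    for field in line.split(separator: " ") where field.hasPrefix(key) {
        return formatter.date(from: String(field.dropFirst(key.count)))
    }
    return nil
}

/// Compares the delivery time against the orchestrator's foreground wall-clock.
func classify(deliveryTs: Date, foregroundedAt: Date) -> DeliveryWindow {
    deliveryTs < foregroundedAt ? .duringSuspension : .postForeground
}
```

Because `body=` is the last field on the line, splitting on spaces still finds `delivery_ts` before any free-form preview text.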
…s + request_notif_permission)
Phase A smoke iter 3 showed `suspended_notif_count=0` AND `post_foreground_notif_count=0`, which is ambiguous between "Phase B not done" (expected catch-up-on-drain) and "iOS notification permission not granted on test iPhone." Adding two new test-surface actions to disambiguate up-front:
- `lxma-test://get_notif_status`: emits current iOS authorization state (`notDetermined` / `denied` / `authorized` / `provisional` / `ephemeral`) plus alert/badge/sound flags AND Columba's own `notifications_enabled` UserDefaults pref. Lets the scenario detect the permission-missing branch and fail with a clear `iOS notification permission not granted (auth=…)` message instead of "no notifications, cause unknown."
- `lxma-test://request_notif_permission`: calls `UNUserNotificationCenter.requestAuthorization` (iOS shows the system "Allow notifications?" prompt on first run) AND sets `notifications_enabled` + `notify_received_message` to true in UserDefaults so `NotificationService.postMessageNotification` won't short-circuit on the pref guard.
First run after a fresh install: orchestrator drives this URL, iOS shows the system prompt, Tyler taps Allow once on the phone. From then on the grant is persisted and the scenario runs unattended.
Build verified: xcodebuild iOS Simulator BUILD SUCCEEDED.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase A: Suspended-notification gap confirmed
Added a new
Architectural gap confirmed: zero notifications fire during the 90s suspension window in both runs. Matches Apple's documented behavior
Separate concern surfaced (filed as task #99): the post-foreground sync_prop drain isn't reliable after a Safari-foreground/Columba-foreground transition; TCP interface state may be degrading across the suspend cycle. Not blocking Phase B planning since the gap signal is independent.
Next: Phase B: push minimal dest-hash filter +
… extension
Phase B of the suspended-app notification work. With Darwin notifications unable to wake a suspended host app (Apple DTS forum 769398), `NotificationService` never fired on inbound LXMF traffic until the user manually foregrounded the app. This commit moves the minimum amount of Reticulum awareness into the `NEPacketTunnelProvider` to fix that gap:
- `AppServices.publishLocalDestinations()` writes the `transport.registeredDestinationHashes()` set to App Group prefs and posts a Darwin reload notification. Called at the end of both `initialize` overloads and after `initializeBaseStack`, i.e. every place where a destination is freshly registered. `switchIdentity` delegates to the second `initialize` overload so identity switches are covered too.
- `PacketTunnelProvider` decodes the published hex hashes into a `Set<Data>` on `configQueue`, observes the reload Darwin notification, and consults the set in `handleTCPData` for every deframed packet. Matching packets get an `UNUserNotificationCenter` request posted under the host app's bundle identity so iOS shows a banner / lock-screen alert even while the host app is suspended.
The filter inspects only unencrypted header fields (header type byte + destination_hash at offset 2 for HEADER_1 or offset 18 for HEADER_2, verified against `Reticulum/RNS/Packet.py:Packet.unpack`); crypto and full LXMF decode stay in the host app. Fires on DATA+CONTEXT_NONE (OPPORTUNISTIC LXMF arrivals) and LINKREQUEST (DIRECT delivery initiation, the only DIRECT-flow packet addressed to our delivery hash); ANNOUNCE and PROOF are skipped.
Notifications inherit the host app's authorization grant (extensions sit in the container app's notification domain, per Apple DTS engineer Quinn), so no extension-side `requestAuthorization` is needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
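For readers unfamiliar with the Reticulum header, the filter described in this commit could look roughly like the sketch below. It is not the code in `PacketTunnelProvider`; `peekHeader` and `shouldNotify` are hypothetical names. The offsets (2 and 18) come from the commit message. The 16-byte hash length, the header-type bit (`0x40`), the packet-type codes in the low two bits, and the context byte sitting directly after the destination hash are assumptions based on my reading of `RNS/Packet.py`.

```swift
import Foundation

// Assumed Reticulum constants (not taken from this PR's source).
let destinationHashLength = 16
let contextNone: UInt8 = 0x00
enum PacketType: UInt8 { case data = 0, announce = 1, linkRequest = 2, proof = 3 }

struct HeaderPeek: Equatable {
    let packetType: PacketType
    let destinationHash: Data
    let context: UInt8
}

/// Reads only the unencrypted header fields of a deframed packet.
func peekHeader(_ raw: Data) -> HeaderPeek? {
    let bytes = [UInt8](raw)                      // copy avoids Data slice-index pitfalls
    guard bytes.count >= 2 else { return nil }
    let flags = bytes[0]
    let isHeader2 = (flags & 0b0100_0000) != 0    // HEADER_2 carries a 16-byte transport id first
    let offset = isHeader2 ? 18 : 2
    let end = offset + destinationHashLength
    guard bytes.count > end else { return nil }   // the context byte must be present too
    guard let type = PacketType(rawValue: flags & 0b0000_0011) else { return nil }
    return HeaderPeek(packetType: type,
                      destinationHash: Data(bytes[offset..<end]),
                      context: bytes[end])
}

/// True for DATA+CONTEXT_NONE and LINKREQUEST addressed to a local destination.
func shouldNotify(_ raw: Data, localDestinations: Set<Data>) -> Bool {
    guard let peek = peekHeader(raw),
          localDestinations.contains(peek.destinationHash) else { return false }
    switch peek.packetType {
    case .linkRequest: return true
    case .data: return peek.context == contextNone
    case .announce, .proof: return false
    }
}
```

Nothing here touches the encrypted payload, which matches the stated split: header matching in the extension, crypto and LXMF decode in the host app.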
The `suspended_notification` smoke scenario needs the Network Extension running across the host-app suspend window — otherwise the inbound TCP socket dies as soon as iOS suspends the host process and Phase B's destination filter never sees a frame. Adds two `lxma-test://` actions that the harness can call to flip Background Transport on programmatically (matching the Settings toggle's behaviour byte-for-byte: it persists `tunnelEnabledKey` so a cold restart auto-resumes the tunnel, then kicks `TunnelManager.start()` and waits up to 30s for `.connected`). - `enable_tunnel` — emits `tunnel_enable state=<state>` - `get_tunnel_status` — emits `tunnel_status state=<state>` `TestTunnelBridge` keeps the test surface ignorant of `TunnelManager`'s real type (it only exists under `ENABLE_NETWORK_EXTENSION`), so the file still compiles in build configurations where the extension is turned off. The bridge closure lives in `TestURLHandler.bind`, guarded by the same compile flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Settings toggle persists `tunnelEnabledKey` so a cold relaunch auto-restarts the tunnel — that's the right shape for users. For the test surface it's wrong: every subsequent smoke run cold-starts with auto-tunnel-on, and the in-flight transition races the harness's path-discovery bringup and breaks even baseline scenarios (msg stays in OUTBOUND, never reaches PROPAGATED). Drops the persistence write inside the test bridge. `TunnelManager.start()` still runs and the tunnel is alive for the rest of the session, so the suspend test still gets what it needs. Tests that need the tunnel call this every run; the persisted flag stays off across runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion prompt When the user hasn't responded to the system notification prompt yet, `UNUserNotificationCenter.requestAuthorization` doesn't return until they tap Allow / Don't Allow — and awaiting that during the cold-start init loop held the rest of init hostage. Concretely: no `TestURLHandler.bind`, no MainTabView, no `isInitialized = true`. The app's loading screen stays up indefinitely behind the system sheet. Fire-and-forget the permission request so the rest of init can proceed in parallel. Users still see the prompt the first time they launch a fresh install — they just don't need to dismiss it before the app becomes usable. The matching `userNotificationCenter.delegate` assignment is part of `requestPermission()`, so it's still installed (asynchronously) and foreground notification suppression for the active conversation continues to work the next message after grant. Also unblocks the smoke harness on fresh-install devices — it previously got stuck at `dest_err reason=not_ready` because TestController.bind never ran while the OS prompt was up and there's no way for devicectl to tap "Allow" remotely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `lxma-test://skip_onboarding[?host=&port=&name=]` so a fresh install (or any device where `has_completed_onboarding` is false) can be brought to the smoke-testable state without manual tap-through of the OnboardingView. Mirrors `OnboardingViewModel.skipOnboarding` exactly: creates an anonymous identity via `IdentityManager`, switches to it, registers a TCP-client interface, and flips `has_completed_onboarding` + `settings_initialized` + `notifications_enabled`. Self-contained in `TestURLHandler` (not `TestController`) because `TestController.bind` requires `AppServices` to be initialized, and that hasn't happened yet on a fresh install — the test surface needs to bootstrap state *before* AppServices has anything to bind to. `IdentityManager` and `InterfaceRepository` are both safe to instantiate standalone, so this works during the OnboardingView's lifetime. Idempotent: if an active identity already exists, no-ops on identity creation and just reaffirms the onboarding flags + TCP config. The host app's `@State showOnboarding` is decided at init time so the caller must force-terminate + relaunch the app after this returns ok before the new state takes effect. Default host/port match the columba-harness defaults (10.0.0.145:4242, name=test_mac). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iewModel.loadLocalSettings The user-facing default-value registration for `auto_announce_enabled`, `auto_announce_on_tcp_reconnect`, `notifications_enabled`, etc., was inside `SettingsViewModel.loadLocalSettings()` — which only runs when the user opens the Settings UI. Fresh installs that never visit Settings silently had every one of those keys defaulting to `false` at the raw `UserDefaults.bool(forKey:)` level, because `register(defaults:)` had never run. Concrete symptom: `AppServices.configureTransportCallbacks`'s `onInterfaceConnected` hook calls `AutoAnnouncePolicy.current()`, sees `masterEnabled = false`, and logs `[AUTO_ANNOUNCE] onInterfaceConnected(...) — master toggle off, skipping`. The phone never auto-announces on TCP reconnect, so rnsd loses the phone's path the moment the TCP socket cycles. From the bot side, this manifests as `Got packet in transport, but no known path to final destination <phone-hash>. Dropping packet.` Every bot→phone DIRECT/OPP delivery silently drops because rnsd has nothing to route to. Lifts the registration to a static `SettingsViewModel.registerLocalDefaults(into:)` method, called from `ColumbaApp.init()` before `AppServices` reads any of the keys. `register(defaults:)` only sets fallbacks for keys without explicit values, so it remains harmless to call from `loadLocalSettings()` too (which still does, so the view is self-sufficient in isolation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t (task #96)
iOS routinely reports the VPN-status sequence
`.connecting → .connected → .reasserting → .connected` during routing
setup, which fires our `onStatusChange` handler twice for `.connected`.
The `active: false` branch already guarded against this via
`isTunnelModeActive`; the `active: true` branch did not. Each redundant
`.connected` callback then called `beginTunnelMode` on every
`TCPInterface` again, which re-installs the outbound hook and resets
the transport pointer — racing any in-flight LXMessage send. The
visible symptom in the diag is the matching pair:
[TUNNEL] enabled tunnel mode on N TCP interface(s); ...
[TUNNEL] enabled tunnel mode on N TCP interface(s); ...
logged twice within the same second, followed by the LXMF state
machine stalling (e.g. a queued DIRECT send sits in OUTBOUND for
30+s before the bot eventually receives the LINKREQUEST).
Symmetric guard with the disable branch: bail with a noisy diag log
on the redundant `.connected` event.
Verified on-device: after this commit the diag shows exactly one
`[TUNNEL] enabled tunnel mode` followed by `[TUNNEL] skipping enable
— already active` for each tunnel-up transition, instead of two
unguarded enables. Mid-session `enable_tunnel` test-action call
behaves predictably afterwards.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
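The symmetric guard from this commit can be illustrated with a small state holder. This is a sketch under assumptions: `TunnelModeGate` is a hypothetical type, the real logic lives inside `applyTunnelModeToInterfaces(active:)`, and the sketch assumes all calls arrive serialized on one queue or actor.

```swift
/// Hypothetical stand-in for the `isTunnelModeActive` guard.
final class TunnelModeGate {
    private(set) var isTunnelModeActive = false

    /// Runs `enable` or `disable` only on a real state transition.
    /// Returns false when the call was redundant and therefore skipped.
    @discardableResult
    func apply(active: Bool, enable: () -> Void, disable: () -> Void) -> Bool {
        if active {
            guard !isTunnelModeActive else { return false }  // second `.connected`: skip
            isTunnelModeActive = true
            enable()
        } else {
            guard isTunnelModeActive else { return false }   // never enabled: nothing to undo
            isTunnelModeActive = false
            disable()
        }
        return true
    }
}
```

With the `.connecting → .connected → .reasserting → .connected` sequence, the second `.connected` returns `false` without re-installing any hook.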
Replaces the old tunnel-mode flip (where a single TCPInterface gave
up its app-owned NWConnection mid-session to route through the NE)
with a separate `TunnelTCPInterface` registered alongside the
foreground TCPInterface. Both connect to the same rnsd; rnsd sees
two clients with independent paths to <phone-hash>.
The old architecture had a fatal seam at the foreground-to-tunnel
handoff: the app-owned socket closed, rnsd removed the path entry
attached to it, and the extension's new socket had no announce yet
because `TCPInterface.beginTunnelMode` keeps state=.connected (no
notifyStateChange, so auto-announce-on-tcp-reconnect doesn't fire).
Bot→phone packets then bounced as `Got packet in transport, but no
known path to final destination <phone-hash>` for the entire suspend
window — Phase B's filter had nothing to fire on.
New architecture:
* `Sources/ColumbaApp/Services/TunnelTCPInterface.swift` — new
`NetworkInterface` implementation. Outbound: HDLC-frames data and
calls `TunnelManager.sendFrame(..., entityId=TUNNEL_TCP_INTERFACE_ID)`.
Inbound: receives via `ExtensionFrameReader`'s
`onTCPFrameReceived` callback when the tag matches.
* `AppServices.registerTunnelInterface()` — fires on tunnel
.connected. Creates and registers the TunnelTCPInterface,
mirrors the foreground TCP's host/port, publishes the endpoint
to a new App Group key `tunnelTCPEndpointsKey`, then sends
`sendAllAnnounces` (broadcast) followed by a 100ms-delayed
tunnel-only re-announce. The tunnel-only follow-up pins rnsd's
last-write-wins path table to the tunnel socket so the
foreground socket dying on suspend doesn't strand the path.
* `AppServices.deregisterTunnelInterface()` — fires on
.disconnected / .invalid (5s debounce). Removes the interface
from the transport and clears the App Group endpoint list.
* `PacketTunnelProvider.loadInterfaceConfigs` — reads
`tunnelTCPEndpointsKey` first. When present + non-empty, it's
the only source of TCP entries; otherwise falls back to the
legacy `interfacesKey` TCP parsing (preserves the multi-TCP
tunnel commit's behaviour for older builds). Adds a Darwin
observer for the matching `tunnelTCPEndpointsChangedNotificationName`
so the extension reapplies without a tunnel restart.
* `AppServices.connectTCPInterface` — no longer calls
`beginTunnelMode` on newly-added foreground interfaces. They
stay foreground-only. `applyTunnelModeToInterfaces(active:)` is
orphaned (no callers); left in place for now alongside the
`isTunnelModeActive` guard until a follow-up gut.
* `AppServices.ExtensionFrameReader.onTCPFrameReceived` — only
frames tagged `TUNNEL_TCP_INTERFACE_ID` route into the
transport. Frames from any other entity ID get dropped, since
the foreground TCPInterfaces receive their own inbound via
their app-owned NWConnection — accepting the extension's
duplicate would double-process every packet.
Verified end-to-end: the suspended_notification smoke scenario
posts a DIRECT message from a Mac-side pinger to the phone every
10s. When the host app is backgrounded, the tunnel TCP socket
stays alive, rnsd routes the ping via the tunnel path, the
extension receives the LINKREQUEST + DATA packets, and
`maybeScheduleNotification(for:)` matches each against
`localDestinationHashes` and posts a UN notification. Result:
`suspended_notif_count: 2` during a 30s suspend window. Phase B
is now validated end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
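For context on the "HDLC-frames data" step in `TunnelTCPInterface`, here is a minimal sketch of HDLC-style byte stuffing. The function names are hypothetical, and the constants (flag `0x7E`, escape `0x7D`, mask `0x20`) are an assumption based on the framing Reticulum uses for its TCP interfaces, not something read from this PR's source.

```swift
import Foundation

// Assumed framing constants.
let hdlcFlag: UInt8 = 0x7E
let hdlcEsc: UInt8 = 0x7D
let hdlcEscMask: UInt8 = 0x20

/// Wraps a payload in flag bytes, escaping any flag or escape byte inside it.
func hdlcFrame(_ payload: Data) -> Data {
    var out = Data([hdlcFlag])
    for byte in payload {
        if byte == hdlcFlag || byte == hdlcEsc {
            out.append(hdlcEsc)
            out.append(byte ^ hdlcEscMask)
        } else {
            out.append(byte)
        }
    }
    out.append(hdlcFlag)
    return out
}

/// Reverses `hdlcFrame`. Returns nil for a frame without both flag bytes
/// or with a dangling escape byte.
func hdlcUnframe(_ frame: Data) -> Data? {
    let bytes = [UInt8](frame)
    guard bytes.count >= 2, bytes.first == hdlcFlag, bytes.last == hdlcFlag else { return nil }
    var out = Data()
    var escaped = false
    for byte in bytes[1..<(bytes.count - 1)] {
        if escaped {
            out.append(byte ^ hdlcEscMask)
            escaped = false
        } else if byte == hdlcEsc {
            escaped = true
        } else {
            out.append(byte)
        }
    }
    return escaped ? nil : out
}
```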
`testEmptyDefaultsReportsAllOff` assumed the registration domain was per-`UserDefaults`-instance — that creating a per-suite scratch defaults would isolate it from `register(defaults:)` calls made on `.standard`. That assumption broke at `dc1024b` when the `SettingsViewModel.registerLocalDefaults` call moved to `ColumbaApp.init()` so the on-reconnect announce fires for fresh installs that never touch Settings. The XCTest host loads the @main App before running tests, so `ColumbaApp.init` executes and registers the four `auto_announce_*` toggles to `true`. NSUserDefaults' registration domain is shared across every UserDefaults instance in the process — including `UserDefaults(suiteName:)` scratch defaults — so the per-test suite inherits the fallbacks. Renamed the test to `testEmptyPerSuiteInheritsProcessWideRegistrationAsAllOn` to reflect the actual contract being pinned: the app-init registration *must* leak to all UserDefaults instances, because that's exactly what makes the on-reconnect announce fire on a fresh install. A future refactor that drops the app-init registration call now fails this test loudly instead of silently regressing every fresh install to no-auto-announce. The two adjacent tests (`testRegisterDefaultsTrueProducesAllFireForFreshInstall`, `testExplicitFalseOverridesRegisteredDefaultTrue`) still validate the per-instance `register(defaults:)` mechanics + explicit-write override semantics on the per-suite. Together the three tests cover the full registration-domain contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harden NE-side port parsing in `loadInterfaceConfigs`. Both the dual-interface (`tunnelTCPEndpointsKey`) path and the legacy fallback (`interfacesKey` `tcpClient` entries) used the trapping `UInt16(_:)` initializer to coerce JSON `Int` ports. If corrupted App Group data or a future writer hands an out-of-range value to either path, the NE process traps and the VPN terminates. Switch both call sites to `UInt16(exactly:)` with an early-`continue` / failed-binding — same behavior for legitimate 0…65535 ports, strictly safer for invalid input. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
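A minimal sketch of the non-trapping port coercion, assuming each JSON entry is a dictionary with `host` and `port` keys. The key names, `TunnelEndpoint`, and `parseEndpoints` are hypothetical; only the switch from `UInt16(_:)` to `UInt16(exactly:)` comes from the commit.

```swift
import Foundation

struct TunnelEndpoint: Equatable {
    let host: String
    let port: UInt16
}

/// Skips malformed entries instead of trapping the extension process.
func parseEndpoints(_ entries: [[String: Any]]) -> [TunnelEndpoint] {
    var result: [TunnelEndpoint] = []
    for entry in entries {
        guard let host = entry["host"] as? String,
              let rawPort = entry["port"] as? Int,
              // UInt16(rawPort) would trap on -1 or 70000; `exactly:` returns nil.
              let port = UInt16(exactly: rawPort) else { continue }
        result.append(TunnelEndpoint(host: host, port: port))
    }
    return result
}
```

Ports in 0…65535 parse exactly as before; out-of-range values are dropped and the tunnel keeps running.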
Coalesce LINKREQUEST notification retries into a single banner. LXMF's `LXMRouter` retries DIRECT delivery up to `MAX_DELIVERY_ATTEMPTS = 5` times spaced `DELIVERY_RETRY_WAIT = 10s` apart (`LXMF/LXMRouter.py:2654`). Each retry constructs a fresh `RNS.Link(...)`, which on initiator construction sends a new `LINKREQUEST` packet (`Reticulum/RNS/Link.py:308-324`). `PacketTunnelProvider.maybeScheduleNotification` matches LINKREQUEST as the DIRECT-flow signal that a new message is on its way, so without coalescing a single undelivered DIRECT delivery produces 1–5 separate "New message" banners on the lock screen. Switch LINKREQUEST notifications to a static `ext-linkreq-<destHashHex>` identifier so iOS replaces the prior pending banner on each retry. `DATA`-path (OPPORTUNISTIC) notifications keep their timestamp suffix because each represents an independently delivered message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
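The identifier scheme can be sketched as below. `ArrivalKind` and `extensionNotificationID` are hypothetical names; the two identifier formats are the ones named in this PR, and the coalescing relies on iOS replacing a request that reuses an existing identifier.

```swift
import Foundation

enum ArrivalKind { case opportunisticData, linkRequest }

/// Builds the notification identifier for an extension-side placeholder.
func extensionNotificationID(kind: ArrivalKind,
                             destHashHex: String,
                             now: Date = Date()) -> String {
    switch kind {
    case .linkRequest:
        // Static per destination: each LXMF retry replaces the previous banner.
        return "ext-linkreq-\(destHashHex)"
    case .opportunisticData:
        // Millisecond suffix: each message keeps its own banner.
        let ms = Int64(now.timeIntervalSince1970 * 1000)
        return "ext-\(destHashHex)-\(ms)"
    }
}
```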
Dedupe NE placeholder notifications when host app fires its rich notification. When the host app is background-running (not yet suspended), both notification paths are live for the same arriving packet: the extension's `ExtensionNotifications.postMessageArrived` posts a generic "New message" banner keyed on the recipient's destination hash, and the host app's `NotificationService.postMessageNotification` posts a rich per-conversation banner keyed on the LXMF message hash. Without dedupe the user sees two banners for one message. Add `removeExtensionPlaceholders(forDestinationHashHex:)` and call it just before adding the rich `UNNotificationRequest`. The helper fetches pending + delivered notifications and removes any whose identifier matches the two formats used by `PacketTunnelProvider.swift`: * `ext-<destHashHex>-<timestamp-ms>` (DATA / OPPORTUNISTIC) * `ext-linkreq-<destHashHex>` (LINKREQUEST / DIRECT) `UNUserNotificationCenter` only supports exact-match removal, so we filter pending/delivered lists in-process by prefix and pass exact ids to `removePendingNotificationRequests(withIdentifiers:)` / `removeDeliveredNotifications(withIdentifiers:)`. Also resolves out-of-scope thread already filed as #74 (multi-relay tunnel mirror selection). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
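The in-process filtering step can be sketched as a pure function over identifier strings. `placeholderIDs` is a hypothetical helper, not the body of `removeExtensionPlaceholders(forDestinationHashHex:)`; it assumes destination hashes are fixed-length hex, so one hash is never a prefix of another.

```swift
/// Selects the extension placeholder identifiers that belong to one destination.
func placeholderIDs(in identifiers: [String],
                    forDestinationHashHex hex: String) -> [String] {
    let dataPrefix = "ext-\(hex)-"          // DATA / OPPORTUNISTIC, timestamp suffix follows
    let linkRequestID = "ext-linkreq-\(hex)" // LINKREQUEST / DIRECT, static identifier
    return identifiers.filter { $0.hasPrefix(dataPrefix) || $0 == linkRequestID }
}
```

The resulting exact identifiers are what would be handed to `removePendingNotificationRequests(withIdentifiers:)` and `removeDeliveredNotifications(withIdentifiers:)`.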
TestURLHandler.bind installed TestRelayDelegate as the LXMRouter delegate with originalDelegate: nil, displacing the production IncomingMessageHandler that _initializeServicesOnce had set. The router still persisted inbound messages to the DB, but ensureConversation and the messageReceivedNotification UI refresh never ran — so on debug builds (which always run bind) received messages fired notifications but never showed in the chats list. bind now takes the live IncomingMessageHandler and threads it through as the relay's wrapped delegate. Verified on-device: a fresh inbound-message stream now produces a conversation row with correct display name + unread count. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes ship together to keep the Network Extension actually running once the user enables it, fixing the "tunnel goes .disconnected and never recovers" pattern observed on-device.
1. On-demand always-connect rules in the VPN profile (TunnelManager.install + start). iOS now keeps the tunnel up across wake/sleep, network changes, and restarts it after the system tears it down under memory pressure. Existing profiles are migrated on next start(); disable() clears the rules so stopVPNTunnel() doesn't silently bounce back on.
2. Status-observer restart loop in AppServices (scheduleTunnelRestartIfNeeded). When the tunnel transitions to .disconnected after having been .connected (and the user's tunnelEnabledKey is still true), schedule a restart with doubling backoff (1s start, 300s cap). This is the belt to on-demand's suspenders: iOS doesn't always re-fire on-demand promptly. Gated on tunnelHasBeenConnectedOnce so the initial boot .disconnected firing doesn't race the auto-start path.
3. Don't auto-clear tunnelEnabledKey on transient launch failures. The previous auto-start cleared the pref on a 30s no-connect timeout, permanently disabling background transport on any transient blip; the empirical "tunnel dead for 10h" state was reproducibly caused by this. The restart loop now handles transient failures; only the user's explicit toggle-off clears the pref.
Verified on-device: cold launch from saved pref reaches .connected with both interfaces (foreground + NE-owned) present in rnsd's client list, backgrounding leaves the NE-owned connection intact (only the foreground socket dies), and foregrounding restores the dual state without any tunnel drop.
Does NOT yet solve "notifications fire during backgrounding": rnsd's path drift to the foreground socket still causes inbound packets to be dropped while the app is suspended. Path management is the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
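The doubling backoff in item 2 (1s start, 300s cap) can be sketched as a small value type. `RestartBackoff` is a hypothetical name, and resetting the delay after a successful `.connected` is an assumption; the commit message does not say when the backoff resets.

```swift
import Foundation

/// Doubling restart delay: 1s, 2s, 4s, ... capped at 300s.
struct RestartBackoff {
    static let initialDelay: TimeInterval = 1
    static let maxDelay: TimeInterval = 300
    private(set) var pendingDelay: TimeInterval = RestartBackoff.initialDelay

    /// Returns the delay to wait before the next restart attempt,
    /// then doubles the stored delay up to the cap.
    mutating func nextRestartDelay() -> TimeInterval {
        let delay = pendingDelay
        pendingDelay = min(pendingDelay * 2, RestartBackoff.maxDelay)
        return delay
    }

    /// Assumed behaviour: call once the tunnel is `.connected` again.
    mutating func reset() {
        pendingDelay = RestartBackoff.initialDelay
    }
}
```

The cap is reached on the tenth attempt, after which every retry waits 300 seconds until a reset.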
When the iOS app moves to .background, fire a tunnel-only re-announce inside a UIApplication.beginBackgroundTask window so the last announce rnsd receives is via the NE-owned socket. rnsd's path table is single-path / last-write-wins (AppServices.swift:942), so this pins the path to the still-alive NE socket before iOS tears the foreground TCPInterface socket down — without this, the path stays on the foreground socket, goes dead the moment we suspend, and rnsd drops every inbound packet to our delivery destination. Verified on-device with a controlled 50s background window: the NE went from zero matches (path drift to dead foreground socket) to four `[EXT/NOTIF] match` + `UN add ok` entries in the same interval — two DIRECT LINKREQUEST matches for the phone's delivery dest plus two OPPORTUNISTIC matches for a second local destination. Phase B notifications now fire reliably while the app is suspended for the duration of the RNS path TTL. Sustained suspension (path TTL > foregrounded re-announce interval) still needs NE-side periodic re-announce — a separate change that requires the delivery identity in the App Group keychain. Adds public AppServices.announceViaTunnel() wrapping the existing private sendAnnounceViaTunnel — needed because the call site is the .background scenePhase handler in ColumbaApp, which lives in the App target and can't reach private AppServices methods. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Originally Phase 1.6–1.9 of the staged background-connectivity plan (the work
that opened this PR). Now also carries Phase B of the suspended-notification
work and a full architectural rewrite of how the foreground TCP and the NE
tunnel relate to each other.
Phase B: push destination-hash filter + notification scheduling into the NE
When iOS suspends the host app, Darwin notifications can't wake it (Apple DTS forum 769398), so `NotificationService.postMessageNotification` never fires until the user manually opens the app. The extension now peeks at the unencrypted `destination_hash` header field of every deframed packet (`HEADER_1` offset 2, `HEADER_2` offset 18, verified against `Reticulum/RNS/Packet.py:Packet.unpack`), matches against the host app's set of registered LXMF/LXST destinations (published via App Group), and posts a `UNUserNotificationCenter` notification under the host app's bundle identity when there's a match. Crypto and full LXMF decode stay in the host app.
- `Sources/Shared/SharedFrameQueue.swift`: adds `localDestinationsKey` and `localDestinationsChangedNotificationName`. Hex-encoded `[String]` of destination hashes.
- `AppServices.publishLocalDestinations(hexHashes:)`: fetches `transport.registeredDestinationHashes()`, writes to App Group, posts the Darwin reload notification. Called at the end of both `initialize` overloads and after `initializeBaseStack`.
- `PacketTunnelProvider.maybeScheduleNotification(for:)`: the filter + notification post. Fires on `packet_type=DATA && context=NONE` (OPPORTUNISTIC LXMF arrivals) and `packet_type=LINKREQUEST` (DIRECT delivery initiation, the only DIRECT-flow packet addressed to our delivery hash).
- `PacketTunnelProvider.reloadLocalDestinations` + Darwin observer for the reload signal. Identity-switch + first-launch destination registration update the filter without a tunnel restart.
Dual-interface tunnel architecture (replaces the old tunnel-mode flip)
Old: one `TCPInterface` per relay; when Background Transport turned on, that interface gave up its app-process `NWConnection` mid-session and routed outbound through the NE instead. Fatal seam at the handoff: the app-owned socket closed, rnsd removed the path-table entry attached to it, and the new extension-owned socket had no announce yet because `TCPInterface.beginTunnelMode` keeps `state=.connected` (no `notifyStateChange`, so `auto_announce_on_tcp_reconnect` never fires). Bot → phone packets bounced off rnsd as `Got packet in transport, but no known path to final destination <phone-hash>` for the entire suspend window.
New: two independent `NetworkInterface`s registered with the transport when Background Transport is on. The foreground `TCPInterface` keeps owning its `NWConnection` in-process; it's never tunneled. A separate `TunnelTCPInterface` carries the tunnel path: outbound via `TunnelManager.sendFrame`, inbound from `ExtensionFrameReader`'s drain of `SharedFrameQueue`. Both interfaces target the same rnsd at the same host:port from the user's single TCP config; rnsd sees them as two separate clients with two independent path-table entries.
- `Sources/ColumbaApp/Services/TunnelTCPInterface.swift` (new): implements `NetworkInterface`. Outbound HDLC-frames data and calls `TunnelManager.sendFrame(...)` tagged with `TUNNEL_TCP_INTERFACE_ID`.
- `AppServices.registerTunnelInterface()` / `deregisterTunnelInterface()`: fire on tunnel status `.connected` / `.disconnected`. Register mirrors the foreground TCP's host/port and publishes to a new `tunnelTCPEndpointsKey` so the extension opens its own `NWConnection`. After registration sends `sendAllAnnounces` (broadcast) followed by a 100ms-delayed tunnel-only re-announce so rnsd's last-write-wins path table pins to the tunnel socket. When the foreground socket dies on suspend, the path remains valid via the tunnel.
- `PacketTunnelProvider.loadInterfaceConfigs`: reads `tunnelTCPEndpointsKey` first. When present + non-empty, it's the only source of TCP entries; otherwise falls back to the legacy `interfacesKey` TCP parsing so older builds still work.
- `ExtensionFrameReader.onTCPFrameReceived` routing: frames tagged `TUNNEL_TCP_INTERFACE_ID` route into the transport; other entity IDs drop (foreground TCPInterfaces receive via their own `NWConnection`).
Real production bugs surfaced + fixed along the way
- `fix(init): don't block app init on UNUserNotificationCenter authorization prompt`: the `await NotificationService.shared.requestPermission()` call in `RootView._initializeServicesOnce` hung the rest of init until the user responded to the OS permission sheet. Fire-and-forget now.
- `fix(settings): register user-defaults at app launch, not in SettingsViewModel.loadLocalSettings`: the `auto_announce_*` defaults registration only ran when the user opened Settings. Fresh installs silently had `masterEnabled=false`, so the on-reconnect announce never fired and rnsd lost the phone's path on every TCP socket cycle. Lifted to `ColumbaApp.init()`.
- `fix(tunnel): make applyTunnelModeToInterfaces(active: true) idempotent (task #96)`: iOS reports `.connecting → .connected → .reasserting → .connected` during VPN setup; the `.connected` callback fired twice, calling `beginTunnelMode` on each interface twice and racing in-flight LXMessage sends. Symmetric guard with the disable branch.
Test surface for the smoke harness
- `lxma-test://enable_tunnel` + `get_tunnel_status`: bring the NE tunnel up + read its state from a URL. No persistence (each test run starts from a clean tunnel-off state).
- `lxma-test://skip_onboarding`: programmatically completes onboarding with an anonymous identity + TCP-client config. Self-contained (doesn't route through TestController, so it works during the Onboarding view's lifetime).
Validation
`suspended_notification` smoke scenario passes end-to-end: `Suspended-notification clean: 2 notification(s) delivered during the 30s suspension window.` Mac-side pinger sends DIRECT messages to the phone every 10s; the tunnel TCP socket stays alive through the suspend window, rnsd routes the bot's LINKREQUEST + DATA via the tunnel path, the extension's filter matches against `localDestinationHashes`, and Phase B fires UN notifications.
Follow-ups filed
- `TunnelTCPInterface` only mirrors first foreground TCP entity (multi-relay)
- `TunnelTCPInterface` doesn't re-mirror on foreground TCP config edits
Test plan
- `AutoAnnouncePolicyTests` updated for the process-wide registration domain, see bdca767)
- `direct_echo` smoke scenario passes (foreground DIRECT round-trip)
- `suspended_notification` smoke scenario passes: 2 UN notifications fired during a 30s host-app suspend window from Mac-side DIRECT pinger
- bdca767)
- grant Local Network access + VPN profile, lock the phone, have a peer send a DIRECT message; confirm the banner appears on the lock screen
- path is dormant (no spurious notifications, app receives via foreground TCP)