feat(ne): enable Background Transport + Phase B suspended-notifications + dual-interface tunnel #57

Open
torlando-tech wants to merge 40 commits into main from feat/enable-tunnel-flip-flag

@torlando-tech torlando-tech commented Apr 29, 2026

Summary

Originally Phase 1.6–1.9 of the staged background-connectivity plan (the work
that opened this PR). Now also carries Phase B of the suspended-notification
work and a full architectural rewrite of how the foreground TCP and the NE
tunnel relate to each other.

Phase B — push destination-hash filter + notification scheduling into the NE

When iOS suspends the host app, Darwin notifications can't wake it (Apple DTS
forum 769398), so NotificationService.postMessageNotification never fires
until the user manually opens the app. The extension now peeks at the
unencrypted destination_hash header field of every deframed packet (HEADER_1
offset 2, HEADER_2 offset 18, verified against Reticulum/RNS/Packet.py: Packet.unpack), matches against the host app's set of registered LXMF/LXST
destinations (published via App Group), and posts a UNUserNotificationCenter
notification under the host app's bundle identity when there's a match.
Crypto and full LXMF decode stay in the host app.

  • Sources/Shared/SharedFrameQueue.swift — adds localDestinationsKey
    and localDestinationsChangedNotificationName. Hex-encoded [String] of
    destination hashes.
  • AppServices.publishLocalDestinations(hexHashes:) — fetches
    transport.registeredDestinationHashes(), writes to App Group, posts the
    Darwin reload notification. Called at the end of both initialize
    overloads and after initializeBaseStack.
  • PacketTunnelProvider.maybeScheduleNotification(for:) — the filter +
    notification post. Fires on packet_type=DATA && context=NONE
    (OPPORTUNISTIC LXMF arrivals) and packet_type=LINKREQUEST (DIRECT
    delivery initiation, the only DIRECT-flow packet addressed to our
    delivery hash).
  • PacketTunnelProvider.reloadLocalDestinations + Darwin observer for
    the reload signal. Identity-switch + first-launch destination
    registration update the filter without a tunnel restart.

Dual-interface tunnel architecture (replaces the old tunnel-mode flip)

Old: one TCPInterface per relay; when Background Transport turned on,
that interface gave up its app-process NWConnection mid-session and
routed outbound through the NE instead. Fatal seam at the handoff — the
app-owned socket closed, rnsd removed the path-table entry attached to
it, and the new extension-owned socket had no announce yet because
TCPInterface.beginTunnelMode keeps state=.connected (no
notifyStateChange, so auto_announce_on_tcp_reconnect never fires).
Bot → phone packets bounced off rnsd with "Got packet in transport, but no known path to final destination <phone-hash>" for the entire suspend
window.

New: two independent NetworkInterfaces registered with the transport
when Background Transport is on. The foreground TCPInterface keeps
owning its NWConnection in-process — it's never tunneled. A separate
TunnelTCPInterface carries the tunnel path: outbound via
TunnelManager.sendFrame, inbound from ExtensionFrameReader's drain
of SharedFrameQueue. Both interfaces target the same rnsd at the same
host:port from the user's single TCP config; rnsd sees them as two
separate clients with two independent path-table entries.

  • Sources/ColumbaApp/Services/TunnelTCPInterface.swift (new)
    implements NetworkInterface. Outbound HDLC-frames data and calls
    TunnelManager.sendFrame(...) tagged with TUNNEL_TCP_INTERFACE_ID.
  • AppServices.registerTunnelInterface() / deregisterTunnelInterface()
    fire on tunnel status .connected / .disconnected. Register mirrors
    the foreground TCP's host/port and publishes to a new
    tunnelTCPEndpointsKey so the extension opens its own NWConnection.
    After registration, it sends sendAllAnnounces (broadcast) followed by a
    100ms-delayed tunnel-only re-announce so rnsd's last-write-wins path
    table pins to the tunnel socket. When the foreground socket dies on
    suspend, the path remains valid via the tunnel.
  • PacketTunnelProvider.loadInterfaceConfigs — reads
    tunnelTCPEndpointsKey first. When present + non-empty, it's the only
    source of TCP entries; otherwise falls back to the legacy
    interfacesKey TCP parsing so older builds still work.
  • ExtensionFrameReader.onTCPFrameReceived routing — frames tagged
    TUNNEL_TCP_INTERFACE_ID route into the transport; other entity IDs
    drop (foreground TCPInterfaces receive via their own NWConnection).
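
For readers unfamiliar with the framing step that TunnelTCPInterface performs on outbound data, here is a minimal sketch of HDLC-style byte stuffing and the matching deframer. The function names are hypothetical, and the constants (flag 0x7E, escape 0x7D, mask 0x20) are assumed to match what Reticulum's TCP interfaces use; the PR text only says that outbound data is HDLC-framed.

```swift
// Assumed constants: HDLC-style byte values.
let hdlcFlag: UInt8 = 0x7E
let hdlcEscape: UInt8 = 0x7D
let hdlcEscapeMask: UInt8 = 0x20

/// Wraps one packet as FLAG ... FLAG, escaping FLAG and ESC bytes in the payload.
func hdlcFrame(_ payload: [UInt8]) -> [UInt8] {
    var out: [UInt8] = [hdlcFlag]
    for byte in payload {
        if byte == hdlcFlag || byte == hdlcEscape {
            out.append(hdlcEscape)
            out.append(byte ^ hdlcEscapeMask)
        } else {
            out.append(byte)
        }
    }
    out.append(hdlcFlag)
    return out
}

/// Extracts every complete frame from a byte stream.
/// Bytes before the first FLAG and an unterminated trailing frame are ignored.
func hdlcDeframe(_ stream: [UInt8]) -> [[UInt8]] {
    var frames: [[UInt8]] = []
    var current: [UInt8] = []
    var inFrame = false
    var escaping = false
    for byte in stream {
        if byte == hdlcFlag {
            // A FLAG closes the current frame (if any) and opens the next one.
            if inFrame && !current.isEmpty { frames.append(current) }
            current = []
            inFrame = true
            escaping = false
        } else if inFrame {
            if escaping {
                current.append(byte ^ hdlcEscapeMask)
                escaping = false
            } else if byte == hdlcEscape {
                escaping = true
            } else {
                current.append(byte)
            }
        }
    }
    return frames
}
```

A streaming deframer in the extension would also need to carry a partial frame across reads; this sketch processes one complete buffer for brevity.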

Real production bugs surfaced + fixed along the way

  • fix(init): don't block app init on UNUserNotificationCenter authorization prompt.
    The await NotificationService.shared.requestPermission() call in
    RootView._initializeServicesOnce hung the rest of init until the user
    responded to the OS permission sheet. It is fire-and-forget now.
  • fix(settings): register user-defaults at app launch, not in SettingsViewModel.loadLocalSettings.
    The auto_announce_* defaults registration only ran when the user opened
    Settings. Fresh installs silently had masterEnabled=false, so the
    on-reconnect announce never fired and rnsd lost the phone's path on every
    TCP socket cycle. Lifted to ColumbaApp.init().
  • fix(tunnel): make applyTunnelModeToInterfaces(active: true) idempotent (task #96).
    iOS reports .connecting → .connected → .reasserting → .connected
    during VPN setup; the .connected callback fired twice, calling
    beginTunnelMode on each interface twice and racing in-flight
    LXMessage sends. The guard is symmetric with the disable branch.

Test surface for the smoke harness

  • lxma-test://enable_tunnel + get_tunnel_status — bring the NE
    tunnel up + read its state from a URL. No persistence (each test run
    starts from a clean tunnel-off state).
  • lxma-test://skip_onboarding — programmatically completes
    onboarding with an anonymous identity + TCP-client config. Self-contained
    (doesn't route through TestController, so it works during the Onboarding
    view's lifetime).

Validation

The suspended_notification smoke scenario passes end-to-end and logs
"Suspended-notification clean: 2 notification(s) delivered during the 30s suspension window."
The Mac-side pinger sends DIRECT messages to the phone every
10s; the tunnel TCP socket stays alive through the suspend window, rnsd
routes the bot's LINKREQUEST + DATA via the tunnel path, the extension's
filter matches against localDestinationHashes, and Phase B fires UN
notifications.

Follow-ups filed

Test plan

  • Unit tests pass locally (AutoAnnouncePolicyTests updated for the
    process-wide registration domain — see bdca767)
  • Build green on iPhone (debug)
  • direct_echo smoke scenario passes (foreground DIRECT round-trip)
  • suspended_notification smoke scenario passes — 2 UN notifications
    fired during a 30s host-app suspend window from Mac-side DIRECT
    pinger
  • CI test suite (re-running after the test fix in bdca767)
  • Manual: open Columba on a fresh install, complete onboarding,
    grant Local Network access + VPN profile, lock the phone, have a
    peer send a DIRECT message — confirm the banner appears on the
    lock screen
  • Manual: turn Background Transport off — confirm Phase B's notification
    path is dormant (no spurious notifications, app receives via foreground
    TCP)

🤖 Generated with Claude Code

Phase 1.6-1.9 of the staged plan plus the onboarding step:

- TunnelManager.disable() now sets isEnabled=false and saveToPreferences()
  after stopVPNTunnel(); calling stopVPNTunnel() alone leaves the profile
  partially-active in iOS routing, which was the root cause of the
  "toggle off but TCP stays broken" report.
- AppServices auto-restarts the tunnel from the App Group's
  tunnel_enabled preference at initialize() time so users don't have
  to re-toggle on every launch.
- SettingsView toggle uses do/catch with DiagLog and an inline error
  label so install / start failures (entitlement issues, declined
  VPN-profile prompts) are visible instead of silently bouncing the
  toggle off. Toggle persists tunnel_enabled on success.
- New onboarding step (page 4 of 6) "Stay Connected in the Background"
  with a pre-checked toggle. completeOnboarding() writes the value to
  the App Group so AppServices can auto-start on first launch and
  trigger the VPN-profile prompt at the right moment.
- ENABLE_NETWORK_EXTENSION compilation flag is now set on ColumbaApp's
  Debug + Release configs alongside CODE_SIGN_ENTITLEMENTS pointing at
  ColumbaApp.entitlements. The app target depends on the extension
  target and embeds it via a PBXCopyFilesBuildPhase (Foundation
  Extensions, dstSubfolderSpec=13).

Verified with xcodebuild — Debug iphonesimulator build succeeds and
copies ColumbaNetworkExtension.appex into ColumbaApp.app/PlugIns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR introduces three tightly coupled features: the dual-interface tunnel architecture (separate TunnelTCPInterface alongside the foreground TCPInterface, replacing the fragile mid-session tunnel-mode flip), Phase B suspended notifications (extension peeks the unencrypted destination_hash header and posts a UNUserNotificationCenter banner under the host app's identity), and a set of targeted bug fixes (user-defaults registration at launch, idempotent applyTunnelModeToInterfaces, fire-and-forget notification permission request).

  • Dual-interface tunnel: registerTunnelInterface() fires on .connected, publishes the endpoint to App Group, and sends a broadcast + tunnel-only re-announce so rnsd's last-write-wins path table pins to the extension's socket before the app suspends. deregisterTunnelInterface() is debounced 5 s to survive debug reloads.
  • Phase B notifications: maybeScheduleNotification(for:) inspects HEADER_1/HEADER_2 destination-hash offsets, fires for DATA+NONE (OPPORTUNISTIC) and LINKREQUEST (DIRECT), coalesces LINKREQUEST retries via a static ext-linkreq-<hash> identifier, and NotificationService.removeExtensionPlaceholders() deduplicates the generic banner when the host app's richer notification fires for the same message.
  • Bug fixes: SettingsViewModel.registerLocalDefaults() moved to ColumbaApp.init(), notification-permission request made fire-and-forget, and the .connected double-fire guard added to applyTunnelModeToInterfaces.

Confidence Score: 3/5

Merging carries risk — several open issues from prior review rounds directly affect core paths that this PR extends.

The new dual-interface architecture and Phase B notification logic are well-structured and the key edge cases are explicitly handled. However, shutdown() leaves tunnelTCPInterface registered against the dead transport (identity switches silently drop all tunnel-path inbound frames), requiredInterfaceType = .other blocks the extension TCP connection on cellular-only devices, and the object: nil VPN-status observer fires on any system VPN change.

Sources/ColumbaApp/Services/AppServices.swift (shutdown gap), Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift (requiredInterfaceType), Sources/ColumbaApp/Services/TunnelManager.swift (third-party VPN observer scope)

Important Files Changed

| Filename | Overview |
|---|---|
| Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift | Major new code: TCP frame handling, HDLC deframing, destination-hash filter, UNNotification posting, dual-interface TCP config loading, and diagnostic listener. Several pre-existing issues remain open (requiredInterfaceType, UInt16 trapping, autoBridge data race). |
| Sources/ColumbaApp/Services/AppServices.swift | Large additions: TunnelTCPInterface registration/deregistration, publishLocalDestinations, sendAnnounceViaTunnel, applyTunnelModeToInterfaces (now dead code). shutdown() still does not deregister tunnelTCPInterface. Hardcoded display name in announce calls. |
| Sources/ColumbaApp/Services/TunnelTCPInterface.swift | New file implementing NetworkInterface via a send-hook for outbound and receivePacket() for inbound. Clean design; connect/disconnect are idempotent. No issues found. |
| Sources/ColumbaApp/Services/NotificationService.swift | Added removeExtensionPlaceholders() to deduplicate the extension's generic banner against the host app's rich per-conversation banner. Correctly handles both pending and delivered lists. |
| Sources/Shared/SharedFrameQueue.swift | New shared constants for local destinations and tunnel TCP endpoints. Pre-existing issues (unguarded write on lock failure, break-on-malformed-frame) flagged in previous reviews remain open. |
| Sources/ColumbaApp/App/ColumbaApp.swift | Moved SettingsViewModel.registerLocalDefaults() to app init; added background-phase tunnel re-announce; new test URL scheme dispatch. All changes look correct. |
| Sources/ColumbaApp/Services/TunnelManager.swift | isEnabled updated before saveToPreferences in start() re-enable path (fixes prior finding). Spurious third-party VPN notification issue (object: nil observer) remains from previous review. |
| Sources/ColumbaNetworkExtension/ExtensionAutoBridge.swift | No significant changes in this PR; pre-existing data-race issues on autoInterface and diagnostic ACK spam flagged in prior reviews remain open. |

Sequence Diagram

sequenceDiagram
    participant App as ColumbaApp (Foreground)
    participant TunnelMgr as TunnelManager
    participant EXT as PacketTunnelProvider (NE)
    participant RNSD as rnsd (relay)

    App->>TunnelMgr: start()
    TunnelMgr-->>App: onStatusChange(.connected)
    App->>App: registerTunnelInterface()
    App->>RNSD: sendAllAnnounces [foreground TCP + tunnel]
    Note over App,RNSD: 100ms delay
    App->>RNSD: sendAnnounceViaTunnel [tunnel only]
    Note over RNSD: path-table pins to tunnel socket

    Note over App: App suspends
    App--xRNSD: foreground TCP socket dies
    RNSD->>EXT: inbound DATA or LINKREQUEST
    EXT->>EXT: maybeScheduleNotification()
    EXT->>EXT: UNUserNotificationCenter.add(ext-linkreq-hash)
    EXT->>App: SharedFrameQueue.append + Darwin packetReady

    Note over App: App foregrounds
    App->>App: ExtensionFrameReader drains queue
    App->>App: transport.handleReceivedData(TUNNEL_TCP_INTERFACE_ID)
    App->>App: removeExtensionPlaceholders + postMessageNotification

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
Sources/ColumbaApp/Services/AppServices.swift:194-201
**Dead code retained with misleading justification**

The comment says `isTunnelModeActive` is kept so `applyTunnelModeToInterfaces`'s idempotency guards "still compile cleanly while the function itself is no-longer called." But since the function is never invoked, its guards have no effect at all. The dead function and its state variable currently confuse the picture of which paths are live: future contributors reading the `onStatusChange` handler will see the `isTunnelModeActive` field alongside `tunnelTCPInterface` and may not realise the tunnel-mode-flip architecture is fully replaced. The entire `applyTunnelModeToInterfaces` function and `isTunnelModeActive` variable can be deleted now that the dual-interface refactor is complete.

Reviews (34): Last reviewed commit: "fix(tunnel): re-announce on background t..."

Comment thread Sources/ColumbaApp/Services/TunnelManager.swift
Comment thread Sources/ColumbaApp/ViewModels/OnboardingViewModel.swift
P1: TunnelManager.start() — set self.isEnabled = true up-front so the
re-enable path after disable() doesn't leave the observable stale.

P1: SettingsView toggle — add tunnelPending @State that overrides the
binding's get during VPN start/disable transitions, with a 30s settle
loop that waits for tunnel.isRunning to match the user's intent before
clearing the override. Without this, .connecting / .disconnecting
re-renders snap the toggle back across the user-facing transition.

P2: TunnelManager.disable() — move isEnabled = false before any throwing
call so a thrown saveToPreferences leaves observers seeing the user's
intent rather than the stale pre-call value.

P2: OnboardingViewModel — gate the tunnel_enabled write with
ENABLE_NETWORK_EXTENSION so non-extension builds don't write a stale
true that nothing reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaApp/Views/Settings/SettingsView.swift Outdated
P1: Cancel the in-flight Background Transport Task before spawning a
new one so a rapid ON→OFF tap can't race the previous start()'s
install() flow. Without this, an older Task_ON would silently
finish install() and call startVPNTunnel() after the user's last
intent was OFF — leaving the toggle visually on a state opposite
the actual VPN. Adds a checkCancellation() in TunnelManager.start()
right before startVPNTunnel() so a cancelled caller can't fire iOS's
VPN bring-up after the await.

Cancellation is treated as supersession (silent return) rather than
an error — the new Task already owns the next state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaApp/Views/Settings/SettingsView.swift
torlando-tech and others added 4 commits April 29, 2026 11:55
P1: Surface async tunnel-connection failures. After startVPNTunnel()
returns successfully but iOS later fails to bring the VPN up
(airplane mode, routing failure, extension crash), the toggle's
30s settle loop times out without setting tunnelErrorMessage —
exactly the silent-bounce the PR description claims to replace.
After the loop, if newValue==true but tunnel.isRunning==false,
fetch the disconnect reason via NEVPNConnection.fetchLastDisconnectError
and show it inline.

P2: Gate the Background Transport onboarding step on
ENABLE_NETWORK_EXTENSION. pageCount is 6 with the flag and 5
without; the page-3 case in OnboardingView is wrapped in an
#if/#else, with extracted `permissionsPageView()` /
`completePageView()` helpers so both branches stay readable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2: Status row now mirrors tunnelPending ?? tunnel.isRunning so the
indicator dot and label match the toggle's visual state during
.connecting / .disconnecting. Adds "Starting…" / "Stopping…"
during the transitional window, replacing the previous "Stopped"
label that contradicted the ON-position toggle.

P2: Persist tunnel_enabled to the App Group only after the actual
VPN status matches the user's intent. Writing it before the status
is confirmed would auto-restart the same failing tunnel on every
relaunch when start() succeeds at launch but iOS later rejects the
connection (airplane mode, routing failure, extension crash).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2: Annotate the Background Transport toggle's Task as @MainActor so
the @State mutations (tunnelPending, tunnelErrorMessage) are
guaranteed to run on the main actor instead of relying on the
inherited-but-undefined SwiftUI Task isolation.

P2: AppServices auto-start now clears the tunnel_enabled pref on
failure so persistent issues (revoked profile, missing entitlement,
OS-level VPN restriction) don't silently retry every launch. The
user re-enables from Settings, where the toggle's error label can
show the actual failure reason instead of dying silently in DiagLog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P2: When `disable()` throws after the synchronous `stopVPNTunnel()` ran
(e.g. an unusual OS-level `saveToPreferences()` failure), persist the
user's OFF intent to the App Group anyway so a relaunch doesn't
auto-restart the tunnel against their wishes. Start errors still leave
the pref alone — committing to a failing start would loop the same
failure on every launch.
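
The persistence rule for the Settings toggle, as described in this commit and the earlier "persist only after the status matches" change, can be summarised as a small pure function. This is a sketch only; `prefWrite` and its parameters are hypothetical names, and the real code writes to the App Group rather than returning a value.

```swift
/// Decides what to write to the stored tunnel_enabled preference after a toggle.
/// Returns nil when the stored value should be left untouched.
func prefWrite(intentOn: Bool, succeeded: Bool) -> Bool? {
    if succeeded {
        return intentOn   // VPN status matched the user's intent: persist it
    }
    if intentOn {
        return nil        // failed start: leave the pref alone so relaunch doesn't loop the failure
    }
    return false          // failed disable: still persist OFF so relaunch doesn't auto-restart
}
```

The asymmetry is the point: a failed start must not be remembered as ON, while a failed disable must still be remembered as OFF.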

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaApp/Services/AppServices.swift
torlando-tech and others added 5 commits April 29, 2026 12:26
P1: AppServices auto-start now polls for `tunnel.isRunning` after
calling `tunnel.start()` (mirroring the Settings toggle's settle
window) so async failures — airplane mode, routing failure,
extension crash — clear the pref instead of looping silently on
every cold-launch. Wraps the auto-start in a detached Task so the
30-second wait doesn't block the rest of `initialize()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets an existing user walk through the onboarding pages again
without losing chats / identities — useful when verifying a
newly-added onboarding step (e.g. Background Transport). The
OnboardingView is presented with `isRestart = true` and the
view-model's `completeOnboarding()` skips identity / interface /
display-name creation in that mode, only committing the values
that the new steps drive.

Gated behind `#if DEBUG` so it doesn't ship to production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported regression: enabling Background Transport killed AutoInterface
peer discovery — no announces went out, no peers spawned, even in
foreground.

Root cause: the extension's `NWConnectionGroup` is hard-coded to
`ff02::1` on a single port, but reticulum-swift's AutoInterface derives
its multicast group per groupId (`ff12:0:...` from `multicastAddress(for:)`)
and runs per-peer unicast on a separate data port (42671). Putting Auto
into tunnel mode tore down the local NWConnectionGroup and replaced it
with a non-functional one in the extension — hence no peer discovery
and no traffic.

Fix: skip Auto in `applyTunnelModeToInterfaces`. AutoInterface is
intrinsically local-Wi-Fi only — iOS suspending multicast in the
background is an OS-level limit, not something the tunnel can paper
over. TCP keeps delivering messages while backgrounded, which is the
whole point of Phase 1. A future change can reimplement the Auto
protocol (groupId-derived multicast + per-peer unicast) inside the
extension if we want background Auto too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous attempt used a hard-coded `ff02::1` `NWConnectionGroup` and a
single port — but reticulum-swift's AutoInterface derives its multicast
address from the group id (`ff12:0:…` via `multicastAddress(for:)`) and
runs per-peer unicast on the data port (42671). Tunneling Auto through
that broken listener killed peer discovery and silently dropped data.

This change links `ReticulumSwift` into the extension target and runs an
actual `AutoInterface` instance inside the Network Extension via a new
`ExtensionAutoBridge`:

- `ExtensionAutoBridge` instantiates `AutoInterface` with the configured
  group id, sets a delegate that funnels every received packet (parent
  AutoInterface + every spawned `AutoInterfacePeer` sub-interface) into
  `SharedFrameQueue` with the Auto tag, and exposes a `send(_:)` that
  hands outbound bytes off to `AutoInterface.send(_:)` for the regular
  per-peer fan-out.
- `PacketTunnelProvider` now drives the bridge from `applyConfigsLocked`
  (start / stop on group-id diff) and routes app outbound (the `auto`
  tag in `handleAppMessage`) through `autoBridge.send(_:)`.
- `applyTunnelModeToInterfaces` puts the app's AutoInterface back into
  tunnel mode when the VPN is up — this reverts the temporary
  "Auto stays local" stop-gap.

Net effect: once the tunnel is connected, Auto peer discovery and data
delivery happen entirely inside the extension, so they keep working
when the app is backgrounded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an `ExtensionDiagLog` writing to `ext_diag.log` in the App Group
container so both the extension and the app can append diagnostic
lines (Network Extensions don't have a clean equivalent of `DiagLog`'s
file-backed log). `AppServices.initialize()` snapshots the file into
the app's `Documents/ext_diag.log` on every launch so it's pullable
via `xcrun devicectl device copy from`.

Hooks logging into ExtensionAutoBridge (start / stop / peer add /
peer remove / RX bytes / TX bytes / TX failures / TX dropped because
autoInterface is nil) and a couple of breadcrumbs in
`PacketTunnelProvider` (`startTunnel` / Auto config (re)applying).
Lets us see whether the extension's AutoInterface is actually firing
on real devices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaNetworkExtension/ExtensionAutoBridge.swift
torlando-tech and others added 3 commits April 29, 2026 16:02
Two attempts to put AutoInterface into the extension hit the same
NEPacketTunnelProvider sandbox limitation:

1. reticulum-swift's `AutoInterface` (POSIX sockets bound to
   link-local IPv6 + per-peer `sendto`) — bind on the data port
   succeeds but iOS routes inbound unicast UDP to the system
   networking stack, not the extension's socket. Multicast
   loopback works, real LAN packets never arrive.

2. From-scratch implementation on Apple's Network framework
   (`NWMulticastGroup` for HELLO discovery + `NWListener` for
   inbound unicast data + per-peer `NWConnection` for outbound) —
   same outcome. `NWListener.newConnectionHandler` never fires
   even with no `requiredInterfaceType`. Confirms the limitation
   isn't the API choice; it's the extension sandbox.

Reverts both bridge implementations and the `onWillStart` /
"release UDP sockets before extension launch" plumbing. Phase 1
ships TCP-only background, which is the win that actually solves
issue #54 (messages-while-locked over TCP). AutoInterface keeps
working locally for foreground use — same behaviour the user had
before this PR.

Background AutoInterface needs a different architecture (e.g.
configuring the tunnel's `includedRoutes` to capture the multicast
group + dataPort and reading them via `packetFlow`) and is left
for a future PR.

Keeps the `ExtensionDiagLog` plumbing in place for future
debugging and the diff logic in `applyConfigsLocked` so the
re-enable path is short.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tools/auto-test/` runs the full loop without manual UI taps:

- `send_test_traffic.py` mirrors reticulum-swift's
  `AutoInterfaceConstants` (`ff12:0:…` group derivation, SHA-256
  discovery token, 29716/42671 ports) so a Mac on the same Wi-Fi
  can stand in for a Sideband peer — sends multicast HELLOs +
  one unicast announce-shaped UDP packet.
- `run_test.sh` builds, installs, relaunches, sends test traffic,
  pulls `ext_diag.log` + `diag.log` via `xcrun devicectl device
  copy from`, and greps for expected entries. Exit code 0 when the
  expected entries are present.

The current revision asserts the basic "tunnel reached enabled
state" path because auto-in-extension is reverted in this PR.
Verifier comments mark where to re-enable the
`NWListener accepted inbound` assertion when we revisit
background AutoInterface with a different architecture.

Known gap: iOS keeps the running extension instance across app
re-deploys, so the harness still needs the user to delete and
re-add the VPN profile in iOS Settings once per build to load
the new extension binary. Plan is to bake a `/debug/reload-
extension` `handleAppMessage` command into the extension that
calls `cancelTunnelWithError` so the harness can force-reload
without any UI taps — see TODO in `run_test.sh`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After on-device testing surfaced three regressions:

1. Re-run Onboarding orphaned existing chats. CompletePage's onAppear
   raced the outer view's isRestart propagation, so prepareIdentity
   created a fresh identity and switched to it before isRestart
   reached the view model. Move isRestart into the view model's init
   so prepareIdentity / completeOnboarding both see the correct flag
   from the first call.

2. switchIdentity created a duplicate "tcp-server" TCPInterface. The
   legacy initialize(identity:identityHash:tcpServerAddress:) path
   parsed the supplied address and created a TCPInterface with id
   "tcp-server" alongside the InterfaceRepository-owned UUID entity
   that Step 7 connects on identity-switch. Both ended up in tunnel
   mode, splitting outbound. Drop the parameter and the synthesized
   interface — Step 7's repository iteration is now the only source
   of TCP interfaces.

3. AutoInterface in the extension is non-functional. Empirical
   testing on device confirmed iOS sandboxes UDP outbound from
   NEPacketTunnelProvider for both Network framework primitives
   (NWConnection / NWConnectionGroup silently drops) and POSIX
   sendto (ENETUNREACH). Inbound works (NWListener accepts unicast,
   POSIX IPV6_JOIN_GROUP receives multicast) but the sandbox blocks
   the reply path, so Auto cannot peer from the extension at all.
   Revert applyTunnelModeToInterfaces to TCP-only and update
   onboarding + Settings copy to make Auto's foreground-only
   behaviour explicit. BLE / RNode keep their own background-mode
   plumbing (Phase 2) and aren't covered by the same caveat.

Plus race fix in connectTCPInterface: when the tunnel auto-starts
during cold launch and reaches .connected before Step 7 has populated
tcpInterfaces, the late-added interface stays on its local
NWConnection. Apply tunnel mode at the late-add site too.

- ExtensionDiagLog: add 1 MiB tail-keep rotation so the always-on
  extension's log can't exhaust the App Group container (which would
  silently break SharedFrameQueue.append). Drops the oldest ~half
  on cap-exceed, aligned to a newline so we don't truncate mid-line.
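
A tail-keep rotation of this kind can be sketched as a pure function over the log bytes. The name `tailKeep` and the choice to keep exactly `cap / 2` bytes before aligning are assumptions for illustration; the commit only states the 1 MiB cap, the "oldest ~half" drop, and the newline alignment.

```swift
import Foundation

/// Returns the bytes to keep once `log` exceeds `cap`: roughly the newest half,
/// starting just after a newline. Returns `log` unchanged while it is within the cap.
func tailKeep(_ log: Data, cap: Int = 1 << 20) -> Data {
    guard log.count > cap else { return log }
    // Keep at most the last cap/2 bytes.
    let cut = log.endIndex - cap / 2
    // Move forward to the first newline at or after the cut so the kept tail starts on a full line.
    if let newline = log[cut...].firstIndex(of: 0x0A) {
        return Data(log[(newline + 1)...])
    }
    // No newline in the tail: drop everything rather than keep a partial line.
    return Data()
}
```

Because the cut always advances to the next newline, the kept tail can be slightly shorter than half the cap, which matches the "~half" wording.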

- run_test.sh: derive DERIVED from xcodebuild's BUILD_DIR rather
  than hardcoding the DerivedData hash (which Xcode regenerates on
  rename / fresh checkout). DEVELOPMENT_TEAM is now an env override
  with the same default; DEVICE_UDID was already overridable.

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
Comment thread Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift Outdated
Comment thread Sources/ColumbaNetworkExtension/ExtensionAutoBridge.swift
torlando-agent Bot and others added 5 commits May 8, 2026 02:36
Gate two extension-only diagnostic probes behind `#if DEBUG`:

- PacketTunnelProvider.startTunnel(): startDiagListener() and
  sendDiagOutboundProbe() were called unconditionally. The outbound
  probe targets a hard-coded developer link-local IPv6
  (fe80::c2d:e309:eb09:6343) on every user's device on every tunnel
  start, leaking the dev's address; the listener also bound port 9999
  in production. Both belong only in builds the test harness drives.

- ExtensionAutoBridge.receiveLoop(): the synthetic "ext-rx-ack-…"
  echo back to every Auto peer is a one-shot probe to test whether
  iOS allows replies on accepted UDP flows; in production it floods
  every peer with non-protocol ASCII payloads and muddies on-wire
  debugging. Same #if DEBUG gate.

Verified with `xcodebuild -configuration Debug` and
`xcodebuild -configuration Release` (both succeed; no new warnings
on the changed files in Release).

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
Drop personally-identifying defaults from `tools/auto-test/run_test.sh`:

- DEVICE_UDID and DEVELOPMENT_TEAM no longer have hardcoded defaults
  baked into the script. Both are unique identifiers (a specific
  physical device's UDID; an Apple Developer Team ID) and shouldn't
  live in source control even with the override path. Greptile flagged
  the security risk of the UDID staying in HEAD.

- The script now errors out early with a clear message if either is
  unset (DEVELOPMENT_TEAM is only required when not using --skip-build).

- Top-of-file prereqs block updated to document the env-var contract.

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
Gate `DiagLog.snapshotExtensionLog()` behind `#if DEBUG`. The call
mirrors the extension's diag log into `Documents/ext_diag.log` on
every cold launch — useful for `xcrun devicectl device copy from`
during development, but in production it surfaces connection
diagnostics into the user's File-Sharing-visible Documents folder
on every app start. Greptile flagged this as the last issue keeping its
confidence score at 4/5 instead of 5/5.

Verified Debug + Release builds.

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
* fix: hot-swap TCP interfaces without disturbing the others

Toggling/editing any TCP interface in Interfaces settings was tearing
down every other healthy TCP connection alongside the one the user
actually changed. Each reconnect triggered the relay to redeliver its
full announce table, swamping the app for ~90s per change (90k+
announces in one minute, observed on rmap.world).

Two layers of fix:

1. `AppServices.connectTCPInterface(entityId:host:port:)` is now
   idempotent. It tracks the last-applied host:port per entity and
   returns immediately when called with the same endpoint as the
   currently-running interface. Calling it with a different endpoint
   still disconnects-and-recreates as before.

2. `InterfaceManagementViewModel.applyChanges` loops over every
   enabled TCP entity (not just the one that changed). It now skips
   entities whose endpoint hasn't moved, avoiding both the connect
   call AND the brief `.connecting` UI flicker.

Stop and shutdown paths clear the endpoint dictionary alongside
`tcpInterfaces` so a future re-add doesn't short-circuit against a
stale entry.

Auto/BLE/RNode/Multipeer sections of `applyChanges` already gate on
existence checks and don't trigger this. Config changes for those
types still don't take effect without a manual disable/re-enable —
separate issue, smaller blast radius, not addressed here.
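
A minimal sketch of the idempotency guard described above, assuming a plain dictionary of last-applied endpoints. `TCPEndpoint` and `TCPConnectionTracker` are hypothetical names for illustration, not the types in `AppServices`:

```swift
// Hypothetical endpoint value; Equatable so "same host:port" is a plain == check.
struct TCPEndpoint: Equatable {
    let host: String
    let port: UInt16
}

// Hypothetical stand-in for the per-entity bookkeeping in AppServices.
final class TCPConnectionTracker {
    private(set) var appliedEndpoints: [String: TCPEndpoint] = [:]
    private(set) var connectCount = 0

    /// Returns true when a (re)connect was performed, false when the call was skipped.
    @discardableResult
    func connect(entityId: String, host: String, port: UInt16) -> Bool {
        let desired = TCPEndpoint(host: host, port: port)
        // Same endpoint as the one already applied: do nothing.
        if appliedEndpoints[entityId] == desired {
            return false
        }
        // A real implementation would disconnect and recreate the interface here.
        connectCount += 1
        appliedEndpoints[entityId] = desired
        return true
    }

    /// Stop paths clear the entry so a later re-add is not short-circuited.
    func stop(entityId: String) {
        appliedEndpoints.removeValue(forKey: entityId)
    }
}
```

Keying the guard on the entity id rather than on the endpoint alone means two entities pointing at the same relay are still tracked independently.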

* feat: multi-TCP tunnel — extension manages a connection per entity

Previously the Network Extension kept a single `tcpConnection` and a
single `currentTCP` endpoint, so enabling two TCP relays in the app
silently dropped one — the extension's config loader overwrote
`result.tcp` on every iteration and only the last enabled tcpClient
in the JSON array got a socket. The other relay was unreachable
through the tunnel and inbound from the wrong relay was routed back
to whichever `TCPInterface` happened to be first in the app's
dictionary.

This commit lifts the entire tunnel TCP layer to per-entity:

- `SharedFrameQueue` frame format gains a 1-byte entityId-length
  field and a length-prefixed UTF-8 entity id between the interface
  tag and the frame payload. Old format frames in flight at the
  upgrade are lost on first read; the queue is append-and-clear
  so the lifetime is short.
- `TunnelManager.sendFrame` adds an `entityId` parameter and writes
  it into the IPC envelope sent via `sendProviderMessage`.
  `connectTCPInterface` and `applyTunnelModeToInterfaces` now
  capture the entity id in the per-interface tunnel-mode hook so
  outbound frames from each `TCPInterface` carry their own id.
- `ExtensionFrameReader.onTCPFrameReceived` is now `(entityId, data)`
  and the AppServices handler routes inbound frames to the matching
  `TCPInterface` by id, with safe fallbacks for empty/legacy ids.
- `PacketTunnelProvider` replaces `tcpConnection` /
  `tcpReceiveBuffer` / `currentTCP` with per-entity dicts. Each
  `NWConnection` has its own HDLC receive buffer (sharing one
  buffer between two streams would corrupt frame boundaries),
  its own state-update handler that only tears down its own entry,
  and its own `receiveTCPData` recursion so inbound frames are
  tagged with the right id when appended to the queue.
- `applyConfigsLocked` diffs per-entity: an entry whose endpoint is
  unchanged keeps its connection, a removed entry tears down only
  its own socket, an edited entry restarts only that socket. Adding
  a second relay no longer disturbs the first.
- `loadInterfaceConfigs` returns `tcps: [String: (host, port)]`
  keyed by `InterfaceEntity.id` instead of a single optional.
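
The per-entity diff in the list above could look roughly like the following. This is a sketch under assumptions: `Endpoint`, `TCPConfigDiff` and `diffTCPConfigs` are illustrative names, and the real `applyConfigsLocked` also owns the sockets themselves.

```swift
// Hypothetical endpoint value keyed by entity id in both maps.
struct Endpoint: Equatable {
    let host: String
    let port: UInt16
}

// Which entity ids need a new socket, a restart, or a teardown.
struct TCPConfigDiff: Equatable {
    var toStart: [String] = []
    var toRestart: [String] = []
    var toStop: [String] = []
}

func diffTCPConfigs(current: [String: Endpoint],
                    desired: [String: Endpoint]) -> TCPConfigDiff {
    var diff = TCPConfigDiff()
    for (id, endpoint) in desired {
        if let existing = current[id] {
            // Unchanged endpoints are left alone; edited ones restart.
            if existing != endpoint { diff.toRestart.append(id) }
        } else {
            diff.toStart.append(id)
        }
    }
    // Entries that disappeared from the config tear down only their own socket.
    diff.toStop = current.keys.filter { desired[$0] == nil }
    // Sort for deterministic output; dictionary iteration order is unspecified.
    diff.toStart.sort()
    diff.toRestart.sort()
    diff.toStop.sort()
    return diff
}
```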

`handleAppMessage` parses the new wire format (entityId-length +
entityId in front of frame data) and looks up the connection by id,
falling back to the sole connection when the id is empty so a
hypothetical legacy single-TCP build still routes correctly.
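
For illustration, the envelope layout described above (interface tag, 1-byte entityId length, UTF-8 entity id, frame payload) can be sketched as below. The 1-byte width of the interface tag and the function and error names are assumptions; the commit does not state them.

```swift
import Foundation

enum FrameCodecError: Error {
    case entityIdTooLong
    case truncated
    case invalidUTF8
}

// Layout assumed here: [tag: 1 byte][idLength: 1 byte][entityId: UTF-8][payload].
func encodeFrame(tag: UInt8, entityId: String, payload: [UInt8]) throws -> [UInt8] {
    let idBytes = Array(entityId.utf8)
    // The length field is a single byte, so ids longer than 255 bytes cannot be encoded.
    guard idBytes.count <= Int(UInt8.max) else { throw FrameCodecError.entityIdTooLong }
    return [tag, UInt8(idBytes.count)] + idBytes + payload
}

func decodeFrame(_ bytes: [UInt8]) throws -> (tag: UInt8, entityId: String, payload: [UInt8]) {
    guard bytes.count >= 2 else { throw FrameCodecError.truncated }
    let tag = bytes[0]
    let idLength = Int(bytes[1])
    let payloadStart = 2 + idLength
    guard bytes.count >= payloadStart else { throw FrameCodecError.truncated }
    guard let entityId = String(bytes: bytes[2..<payloadStart], encoding: .utf8) else {
        throw FrameCodecError.invalidUTF8
    }
    // An empty entityId is valid; the caller decides how to route legacy frames.
    return (tag, entityId, Array(bytes[payloadStart...]))
}
```

A zero-length id decodes to an empty string, which is the case the fallback to the sole connection is meant to cover.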

* chore: extension diag logs for TCP config/state changes

Lifecycle events only — config (re)apply, config removal, state
transitions, failure. Per-frame and per-drain logging is omitted
to keep the file small. Per-entity tagging in the messages makes
multi-TCP behaviour observable without needing syslog access.

Used to diagnose the silent-inbound regression that turned out to
be the SharedFrameQueue wire-format roll-out interacting with a
not-yet-relaunched extension; left in place for future debugging.

* feat(InterfaceManagement): add TCP client community-server wizard

Mirrors Android Columba's 2-step TCP client wizard at the post-onboarding
add-interface surface: server selection (bootstrap/community/custom) →
review & configure. Routes Settings → Network Interfaces → + → TCP Client
through the wizard instead of the blank manual entry sheet, and reroutes
edit-existing for TCP entries to the same flow with pre-filled values.

Scoped to the fields TCPClientConfig already supports (host, port,
networkName, passphrase). Bootstrap-only flag and SOCKS proxy are deferred.

Closes #51

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* fix(MicronParser): persist formatting state across lines (#63)

* fix(MicronParser): persist formatting state across lines

The line-by-line parse loop hardcoded `currentStyle: .plain` on every
parseInline call, so a `Fxxx`Bxxx preamble line consumed its colors
into an empty span and the following ASCII art rendered with no fg/bg.
Match python NomadNet's MicronParser by promoting currentStyle to a
parser-loop local that threads through every parseInline call, with
parseInline returning the terminal style so the caller can carry it
forward. `<` at line-start additionally resets currentStyle to .plain,
matching python's `<` semantics.

Repro: the index.mu at github.com/fr33n0w/thechatroom uses the
preamble shape `F0ff`B52f then ASCII art then `f`b — before this fix
the colors were silently dropped.
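
A reduced sketch of the style threading, assuming a toy inline parser that only understands the four color tags (`` `Fxxx ``, `` `f ``, `` `Bxxx ``, `` `b ``). `Style`, `Span`, `parseInline` and `parseLines` here are simplified stand-ins, not the real MicronParser, and the line-start `<` reset is left out:

```swift
struct Style: Equatable {
    var foreground: String? = nil
    var background: String? = nil
    static let plain = Style()
}

struct Span: Equatable {
    let text: String
    let style: Style
}

// Parses one line starting from `style` and returns the style in effect at its end.
func parseInline(_ line: String, startingWith style: Style) -> (spans: [Span], terminal: Style) {
    var spans: [Span] = []
    var current = style
    var buffer = ""
    let chars = Array(line)
    var i = 0

    // Emit buffered text with the style that was active while it was collected.
    func flush() {
        if !buffer.isEmpty {
            spans.append(Span(text: buffer, style: current))
            buffer = ""
        }
    }

    while i < chars.count {
        if chars[i] == "`", i + 1 < chars.count {
            let tag = chars[i + 1]
            if (tag == "F" || tag == "B"), i + 4 < chars.count {
                flush()
                let color = String(chars[(i + 2)...(i + 4)])
                if tag == "F" { current.foreground = color } else { current.background = color }
                i += 5
                continue
            }
            if tag == "f" || tag == "b" {
                flush()
                if tag == "f" { current.foreground = nil } else { current.background = nil }
                i += 2
                continue
            }
        }
        buffer.append(chars[i])
        i += 1
    }
    flush()
    return (spans, current)
}

// The loop-local style is carried from line to line instead of restarting at .plain.
func parseLines(_ lines: [String]) -> [[Span]] {
    var currentStyle = Style.plain
    var result: [[Span]] = []
    for line in lines {
        let parsed = parseInline(line, startingWith: currentStyle)
        currentStyle = parsed.terminal
        result.append(parsed.spans)
    }
    return result
}
```

With this shape, a preamble line that contains only color tags yields no spans but still changes the style applied to the lines that follow it.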

Closes #31

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* fix(NodeDetailsView): allow tapping action buttons on stale-path contacts

Browse Site / Start Chat / Set as My Relay were `.disabled(!isOnline)`
on a contact's NodeDetailsView, where `isOnline` is just `Date() <
entry.expires` from the path table. After cleanupLinks runs `expirePath`
on a failed-link destination, the contact's path becomes "expired" until
a new announce arrives — but Reticulum's path discovery is exactly
designed for that case (issue a path-request, any peer with a recent
announce will respond). Greying the button blocks the user from the very
operation that would heal the path.

Drops the `.disabled` and `.opacity` modifiers from `actionButton(...)`
and the relay-toggle button. The underlying flow
(`NomadNetBrowserService.resolveValidPath`) already does
`pathTable.remove` + `transport.requestPath` + 10s poll, so taps now
flow through to the working recovery path.

Also reword the expired-hint copy from "Ask them to send an announce
from their app, or wait for one to arrive automatically" to "Tap an
action to issue a path request — any node on the network with a recent
announce will respond." — the original copy is wrong about how
Reticulum path discovery works and discourages users from doing the
right thing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(MicronDocumentView): render the chat-room ASCII art correctly

Three bugs surfaced once the parser carried `Bxxx background colors
forward across lines (faf17e4):

1. Centering was computed against the document, not the screen. A wide row
   (e.g. fr33n0w/thechatroom's 550-char trailing-whitespace line)
   pushed the VStack out to ~4600pt; centered shorter rows landed
   at the middle of *that* width — way past the viewport. Fixed by
   capturing the actual screen viewport via GeometryReader in
   MonospaceScrollContainer (mirrors Android's
   `Modifier.widthIn(min = viewportLineWidth)` from
   NomadNetBrowserScreen.kt:474) and wrapping each scroll-mode row
   in `.frame(minWidth: viewportWidth, alignment: alignment.swiftUI)`.

2. Row-to-row column alignment drifted by half a cell because
   Core Text's `textAlignment = .center` strips trailing whitespace
   when computing the centered offset. Lines with a trailing space
   centered as if one cell narrower than lines without — visible as
   the letter "T" of "the chat room" wandering in the ASCII art.
   UILabel now always renders left-aligned (paragraphStyle and
   textAlignment) and visual centering is the SwiftUI .frame's job.

3. SF Mono renders Block-Elements (▗▄▖▝▀▘▙▟ etc.) at slightly
   different pixel widths than ASCII spaces, so 85-char rows of
   mixed content didn't end up the same width. Bundled JetBrains
   Mono (Apache 2.0/OFL, Regular + Bold, ~270KB each) for the
   monospace renderer — every glyph in the file has advance=600
   confirmed via fontTools, matching what Android already uses
   (MicronComposables.kt's `JetBrainsMonoFamily`). Falls back to
   the system font if the bundled one fails to load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com>
Co-authored-by: Claude claude-opus-4-7 <noreply@anthropic.com>

* fix(TCPClientWizard): mirror android server list, drop bootstrap split

Addresses PR review comments:
#64 (comment)
#64 (comment)

Replace the iOS community-server directory with the canonical Android
list at app/src/main/java/network/columba/app/data/model/TcpCommunityServer.kt.
Removes decommissioned / non-existent entries (RNS Amsterdam, RNS
BetweenTheBorders, RNS Frankfurt, i2p Reticulum, Reticulum Ireland,
TheHub, Kosciuszko, Reticulum Ireland v2, RNS Roaming) and adds the
servers that are actually present on the network. i2p is dropped
entirely because iOS has no i2p transport.

Also collapse the "Bootstrap Servers" / "Community Servers" split in
TCPClientWizard into a single "Community Servers" section, since
Reticulum-Swift does not yet implement bootstrap-interface mode and
splitting them would mislead users into expecting bootstrap behavior.
The isBootstrap flag on the data model is preserved so the Android
table stays mirrorable.

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* feat(auto-announce): granular trigger toggles + new wiring

Splits the auto-announce path into three independently-toggleable
triggers, all gated behind the existing `auto_announce_enabled` master:

  - `auto_announce_on_interval`       — periodic timer (existing)
  - `auto_announce_on_tcp_reconnect`  — fires on TCP / RNode reconnect
  - `auto_announce_on_peer_spawned`   — fires when AutoInterface / BLE /
                                        MPC accepts a new peer

All three default true to preserve the previous "all triggers active
when master is on" behaviour.

Wiring:
  - `AppServices.configureTransportCallbacks` now uses
    reticulum-swift's split callbacks (`setOnInterfaceConnected` /
    `setOnInterfacePeerSpawned`), each with its own user-setting gate.
    The polled state-observer's connect-trigger is gated to match.
  - `AutoAnnounceManager.start` (and the in-loop re-check) honour the
    `auto_announce_on_interval` toggle in addition to master.
  - `autoAnnounce()` itself bails on master-off as defense in depth.
  - SettingsView's Auto Announce card grows three sub-toggles, and the
    interval picker hides when the on-interval trigger is off.

Pairs with reticulum-swift's onInterfaceAdded → onInterfacePeerSpawned /
onInterfaceConnected split (see that repo). Ship-ready behaviour change
on its own; no diagnostic logging in this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump reticulum-swift pin to 0.2.4

Picks up the onInterfaceAdded → onInterfacePeerSpawned/onInterfaceConnected
split (reticulum-swift PR #14) that this PR's wiring requires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(AppServices): only resetTimer when announce was actually sent

The polled state-observer's connect path was calling
`autoAnnounceManager.resetTimer()` unconditionally — even when the
TCP-reconnect gate had blocked the announce. Because `resetTimer()`
restarts the periodic loop with a fresh `Next auto-announce in 3h
(±1h)` schedule, every TCP reconnect on a flap-y network (mobile
data ↔ WiFi, RNode in poor RF) would push the next interval-announce
a full interval into the future without ever emitting one. The
periodic schedule could be perpetually starved even though the user
left "On interval" enabled and only disabled the reconnect trigger.

Move the `resetTimer()` call inside the gate so it only fires when an
announce actually went out.
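
The shape of the fix, sketched with a hypothetical `AnnounceScheduler` that just counts calls (the real `AutoAnnounceManager` API is not reproduced here):

```swift
// Hypothetical stand-in that records what was called.
final class AnnounceScheduler {
    private(set) var announcesSent = 0
    private(set) var timerResets = 0

    func announce() { announcesSent += 1 }
    func resetTimer() { timerResets += 1 }
}

func handleTCPConnected(reconnectGateOpen: Bool, scheduler: AnnounceScheduler) {
    // Gate closed: no announce goes out, so the periodic schedule is left untouched.
    guard reconnectGateOpen else { return }
    scheduler.announce()
    // Reset only after an announce was actually emitted.
    scheduler.resetTimer()
}
```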

Greptile review feedback on PR #70.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(auto-announce): extract AutoAnnouncePolicy + cover trigger gates

The auto-announce trigger gates were inlined as `defaults.bool(forKey: ...)`
calls at seven sites across AppServices and AutoAnnounceManager, which
made them impractical to unit-test without bringing up the full
AppServices stack (transport, identity, router, …).

Extract the gating decision into a pure value type, AutoAnnouncePolicy,
that snapshots the four UserDefaults keys and exposes:
  - shouldFireOnInterval
  - shouldFireOnTcpReconnect
  - shouldFireOnPeerSpawned

…all derived from the master enable plus the corresponding granular
toggle. Routes the seven existing call sites through the policy so the
inline string-key reads no longer appear in service code (which makes a
typo-rename harder and gives every gate the same code path).
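
A sketch of what such a snapshot type might look like. The four key strings come from the earlier commit in this PR; the stored-property names and the `init(defaults:)` label are assumptions for illustration:

```swift
import Foundation

struct AutoAnnouncePolicy: Equatable {
    let masterEnabled: Bool
    let onInterval: Bool
    let onTcpReconnect: Bool
    let onPeerSpawned: Bool

    init(masterEnabled: Bool, onInterval: Bool, onTcpReconnect: Bool, onPeerSpawned: Bool) {
        self.masterEnabled = masterEnabled
        self.onInterval = onInterval
        self.onTcpReconnect = onTcpReconnect
        self.onPeerSpawned = onPeerSpawned
    }

    // Reads the four keys once; later changes to `defaults` do not affect this value.
    init(defaults: UserDefaults) {
        self.init(
            masterEnabled: defaults.bool(forKey: "auto_announce_enabled"),
            onInterval: defaults.bool(forKey: "auto_announce_on_interval"),
            onTcpReconnect: defaults.bool(forKey: "auto_announce_on_tcp_reconnect"),
            onPeerSpawned: defaults.bool(forKey: "auto_announce_on_peer_spawned")
        )
    }

    // Every trigger requires the master switch plus its own granular toggle.
    var shouldFireOnInterval: Bool { masterEnabled && onInterval }
    var shouldFireOnTcpReconnect: Bool { masterEnabled && onTcpReconnect }
    var shouldFireOnPeerSpawned: Bool { masterEnabled && onPeerSpawned }
}
```

Because the struct holds only four booleans, tests can build it directly without touching `UserDefaults` at all.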

Tests in AutoAnnouncePolicyTests cover:
  - Direct init stores all four flags.
  - Master off suppresses all three triggers regardless of granulars.
  - Each granular toggle gates its own trigger independently.
  - All-on / all-off boundary cases.
  - Empty defaults report all-off (raw read behavior).
  - Snapshot is immutable after capture (catches future refactors that
    might keep a defaults reference).
  - register(defaults: true) produces the fresh-install all-fire baseline
    that SettingsViewModel.loadLocalSettings sets up.
  - Explicit false overrides registered default-true.

9 tests, all passing locally on iOS Simulator. Total suite went from
71 to 80 tests; no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auto-announce): attribute peer-child connected events to peer-spawned gate

Reticulum-swift fires `onInterfacePeerSpawned` when an AutoInterface /
BLEInterface / MPCInterface accepts a peer, then a moment later fires
`onInterfaceConnected` for the peer's child transport's `.connected`
transition. The previous gating treated the second event as a generic
TCP-reconnect, so a user who turned the peer-spawned toggle off but
left tcp-reconnect on would still get an announce on every peer-add —
defeating the purpose of having a separate peer-spawned gate.

Changes:

  - `AutoAnnouncePolicy.shouldFireOnInterfaceConnected(isPeerChild:)`
    new accessor that gates by `onPeerSpawned` for peer-children and
    `onTcpReconnect` for everything else (both still subject to
    `masterEnabled`).
  - `AppServices` tracks ids passed through `onInterfacePeerSpawned` in
    a `peerChildInterfaceIds` set, then queries it in the
    `onInterfaceConnected` handler to pick the right gate.
  - Diagnostic log line distinguishes the two attribution paths so a
    future investigation can tell whether an announce came from the
    tcp-reconnect or peer-child-reconnect branch.

Tests cover the four corners of the cross-trigger matrix plus the
master-off override:

  - peer-child + peer-spawned-off + tcp-reconnect-on   → does NOT fire
  - peer-child + peer-spawned-on  + tcp-reconnect-off  → fires
  - non-peer-child + tcp-reconnect-on / off            → fires / not
  - master off                                         → never fires
  - all-on / all-off across peer-child boundaries
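
The attribution rule and the matrix above reduce to a few lines. This is a self-contained sketch; `ConnectGatePolicy` is a hypothetical cut-down type carrying only the three flags the accessor needs:

```swift
struct ConnectGatePolicy {
    let masterEnabled: Bool
    let onTcpReconnect: Bool
    let onPeerSpawned: Bool

    // Peer-child connects are gated by the peer-spawned toggle,
    // everything else by the tcp-reconnect toggle; master always wins.
    func shouldFireOnInterfaceConnected(isPeerChild: Bool) -> Bool {
        guard masterEnabled else { return false }
        return isPeerChild ? onPeerSpawned : onTcpReconnect
    }
}
```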

Greptile review feedback on PR #70 (4/5 confidence comment about peer-child overlap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auto-announce): make peer-child attribution race-free

The peer-spawned and connected callbacks fire from independent
reticulum-swift Tasks. The previous implementation used MainActor-
isolated record / lookup, which meant both operations had to await an
actor hop. Swift's task scheduler doesn't guarantee record-before-lookup
ordering between unrelated Tasks, so a fast peer-add → child-connect
sequence could in theory mis-attribute the connected event to
tcp-reconnect instead of peer-spawned (the user-facing bug
fixed in the prior commit).

Replace the MainActor-isolated Set with a synchronous, lock-protected
PeerChildInterfaceRegistry (OSAllocatedUnfairLock-backed). The peer-
spawned closure now records on its first line, *before* any await
suspension, so the record is committed before any subsequent
onInterfaceConnected for the same id can possibly run its attribution
lookup. The connected closure's lookup is also synchronous, so
attribution is correct regardless of how the schedulers interleave the
rest of the closure bodies.
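
A sketch of a synchronous, lock-protected registry of this kind. The commit uses `OSAllocatedUnfairLock`; this example swaps in `NSLock` from Foundation so it stays self-contained, and the method names are assumptions:

```swift
import Foundation

// Synchronous registry: no actor hop, so record() completes before the caller suspends.
final class PeerChildInterfaceRegistry: @unchecked Sendable {
    private let lock = NSLock()
    private var ids = Set<String>()

    /// Idempotent: recording the same id twice leaves a single entry.
    func record(_ id: String) {
        lock.lock()
        defer { lock.unlock() }
        ids.insert(id)
    }

    func contains(_ id: String) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        return ids.contains(id)
    }

    func reset() {
        lock.lock()
        defer { lock.unlock() }
        ids.removeAll()
    }
}
```

Calling `record` as the first statement of the peer-spawned closure, before any `await`, is what makes the entry visible to a later synchronous `contains` in the connected closure.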

Tests:
  - PeerChildInterfaceRegistryTests: empty / record-then-contains /
    idempotent / reset / immediate-visibility on same thread.
  - testConcurrentRecordAndContainsObservesAllPriorRecords: 1000-way
    concurrent record+contains stress, asserts no crash and full
    visibility after group completes.

Total suite: 90 tests, all passing.

Greptile review feedback on PR #70 (4/5 confidence comment about Task
ordering between MainActor hops).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(greptile): iteration 1 — applied 2, rejected 0

Snapshot dictionary keys before mutating during iteration in
PacketTunnelProvider:

- applyConfigsLocked() stale-entry teardown: collect stale ids via
  filter() before the loop instead of iterating currentTCPs.keys
  while teardownTCPConnectionLocked + removeValue mutate it.
- wake() reaper: iterate Array(self.tcpConnections.keys) instead of
  the live Keys view while teardownTCPConnectionLocked mutates the
  same dictionary.

Both paths run on configQueue (the only mutator), but Swift's
Dictionary.Keys is documented as a live view and mutation during
iteration is undefined behavior — can silently skip entries or
crash. Both fixes are inert for the single-TCP case but matter as
soon as 2+ TCPs are active and a config-change or wake event fires.
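
The snapshot-then-mutate shape both fixes adopt, shown on a toy dictionary. `reapStale` and the string-valued state are illustrative only, not the `PacketTunnelProvider` code:

```swift
// Removes every entry marked "stale" and returns the ids that were torn down.
func reapStale(_ connections: inout [String: String]) -> [String] {
    // Collect the ids first, so the loop below iterates a separate array.
    let staleIds = connections
        .filter { $0.value == "stale" }
        .map { $0.key }
        .sorted()
    for id in staleIds {
        // Stand-in for teardownTCPConnectionLocked + removeValue.
        connections.removeValue(forKey: id)
    }
    return staleIds
}
```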

Co-Authored-By: Claude opus-4-7-1m <noreply@anthropic.com>

* chore(greptile): iteration 1 — applied 1, rejected 0

Roll back tcpInterfaces[entityId] and defer tcpEndpoints[entityId] until
after transport.addInterface succeeds. Without this, a transient
addInterface throw left both dictionary entries populated for a dead,
un-attached interface; the next connectTCPInterface call with the same
endpoint hit the idempotency guard at the top of the function and
silently no-op'd, breaking self-healing reconnects until the user
manually edited host/port.

Greptile thread 2 (the matching skip in InterfaceManagementViewModel.
applyChanges) is satisfied by this same fix — once tcpEndpoints reflects
only successfully-applied endpoints, the VM's
`tcpEndpoints[id] == desired` guard correctly distinguishes "running
cleanly" from "stale dead entry waiting to retry".
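
A sketch of the write-after-success and rollback pattern, assuming a hypothetical `InterfaceTable` whose `attach` closure stands in for `transport.addInterface`:

```swift
struct RelayEndpoint: Equatable {
    let host: String
    let port: UInt16
}

enum AttachError: Error { case transient }

final class InterfaceTable {
    private(set) var interfaces: [String: String] = [:]
    private(set) var endpoints: [String: RelayEndpoint] = [:]

    /// Returns false when the idempotency guard skipped the call.
    func connect(entityId: String,
                 endpoint: RelayEndpoint,
                 attach: (String) throws -> Void) throws -> Bool {
        if endpoints[entityId] == endpoint { return false }
        interfaces[entityId] = "iface-\(entityId)"
        do {
            try attach(entityId)
        } catch {
            // Roll back the partial write so a retry is not mistaken for a live interface.
            interfaces.removeValue(forKey: entityId)
            throw error
        }
        // The endpoint is recorded only after the attach succeeded.
        endpoints[entityId] = endpoint
        return true
    }
}
```

A failed attach therefore leaves both dictionaries empty for that id, and the next call with the same endpoint runs the full connect path again.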

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>

* chore(greptile): iteration 2 — applied 1, rejected 0

Extend the connectTCPInterface write-after-success + rollback pattern to
the three remaining tcp-server init sites: both initialize() overloads
and reinitializeConnection(). Without this, an addInterface throw during
init left tcpInterfaces["tcp-server"] and tcpEndpoints["tcp-server"]
populated with a dead interface; reconnectTCPOnly delegates to
connectTCPInterface(entityId: "tcp-server", ...) which then silently
no-op'd on a same-address retry through the new idempotency guard.

For the two initialize overloads, the catch block preserves the
"non-fatal" semantics (init proceeds without TCP, no rethrow) but now
also clears the partial dictionary writes so a later reconnectTCPOnly
retry isn't stuck. For reinitializeConnection — which had no catch and
propagates errors to its caller — the new do/catch rolls back and
rethrows, mirroring connectTCPInterface.

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>

* feat(Map): follow app dark mode for OpenFreeMap style

Picks the OpenFreeMap style URL (liberty / dark) based on
ThemeManager.isDarkMode and reapplies it from updateUIView when
the active scheme changes. Coordinator caches the last applied
URL to skip the no-op reassignment that would otherwise fire on
every peer-location tick.
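
A sketch of the cached-URL check, without any MapLibre types. The two style URLs are assumptions based on OpenFreeMap's public style names, and `StyleCoordinator` is a hypothetical reduction of the real coordinator:

```swift
import Foundation

enum MapStyle {
    // Assumed OpenFreeMap style endpoints; verify against the project's constants.
    static let liberty = URL(string: "https://tiles.openfreemap.org/styles/liberty")!
    static let dark = URL(string: "https://tiles.openfreemap.org/styles/dark")!

    static func url(isDarkMode: Bool) -> URL {
        isDarkMode ? dark : liberty
    }
}

final class StyleCoordinator {
    private var lastAppliedURL: URL?
    private(set) var applyCount = 0

    // Called from every update pass; only hits `apply` when the style actually changes.
    func applyIfNeeded(isDarkMode: Bool, apply: (URL) -> Void) {
        let desired = MapStyle.url(isDarkMode: isDarkMode)
        guard desired != lastAppliedURL else { return }
        lastAppliedURL = desired
        applyCount += 1
        apply(desired)
    }
}
```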

Offline regions remain pinned to the liberty style at download
time; switching to dark while fully offline yields unstyled
tiles. To be addressed in a follow-up that caches both style
packs.

Closes #59

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* Update Sources/ColumbaApp/Views/Map/MapLibreMapView.swift

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* chore(greptile): iteration 1 — applied 4, rejected 0

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* feat(InterfaceManagement): add TCP client community-server wizard (#64)

* fix(TcpCommunityServer): remove unwanted servers from wizard list

The following entries should not be surfaced in the on-device wizard:

- interloper node + interloper node (Tor)
- Jon's Node
- Quortal TCP Node
- R-Net TCP
- RNS bnZ-NODE01, RNS COMSEC-RD, RNS HAM RADIO
- RNS Testnet StoppedCold
- RNS_Transport_US-East
- Tidudanka.com

Surviving list: 3 bootstrap-class (Beleth RNS Hub, Quad4 TCP Node 1,
FireZen) + 7 community (g00n.cloud Hub, noDNS1, noDNS2, NomadNode
SEAsia TCP, 0rbit-Net, Quad4 TCP Node 2, SparkN0de).

NOTE: the file's docstring claims this list mirrors Android's
`TcpCommunityServer.kt`. Pruning here breaks that mirror; a follow-up
PR should make the equivalent removal on the Android side, OR the
"keep in sync" claim should be relaxed to "originally derived from."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com>
Co-authored-by: Claude claude-opus-4-7 <noreply@anthropic.com>
Co-authored-by: torlando-agent[bot] <torlando-agent@noreply.github.com>

* feat: add Maestro UI flows for columba-suite ui-screenshotter (#69)

* feat: add Maestro UI flows for columba-suite ui-screenshotter agent

Adds flows/ with 4 deterministic Maestro flows (contacts-list, chats-list,
settings, map) plus a README. The columba-suite ui-screenshotter agent
captures each flow at BASE_REF and HEAD in both light and dark Simulator
appearances on every UI-touching PR, linking the resulting PNG pair from
PLAN.md so reviewers see the visual change before merging.

This PR exists primarily to land flows/ on main so subsequent PRs have
flow coverage at BASE_REF. The screenshotter will fire on this PR itself,
but cleanly skip with screenshot_status: skipped_no_flows because the
PR's BASE_REF (this branch's parent) doesn't yet have flows/.

Voice-call flows are deferred — they need a debug-only lxma://debug/...
URL handler that doesn't exist yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(greptile): iteration 1 — applied 1, rejected 2

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

---------

Co-authored-by: torlando-agent[bot] <217870594+torlando-agent[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com>

* chore(test): add debug-only iOS test surface for phone smoke-test pipeline

Mirror of the Android `app/src/debug/.../TestController.kt` +
TestReceiver.kt surface, adapted to iOS via a sibling URL scheme
(`lxma-test://`) routed through the existing `.onOpenURL` handler in
ColumbaApp.swift. The 17 actions, log shape (`event=key=value`), and
whitespace-escape rules match Android byte-for-byte so the python
orchestrator's regexes work cross-platform.

- Sources/ColumbaApp/Test/TestController.swift — singleton coordinating
  the test-action surface; binds to live AppServices/router/interface
  repository, observes inbound LXMF + delivery-state via a relay
  delegate, emits structured os_log lines under subsystem
  `network.columba.app.test` / category `harness` so idevicesyslog
  filters cleanly.

- Sources/ColumbaApp/Test/TestURLHandler.swift — `lxma-test://<action>?<query>`
  dispatcher; mirrors Android's TestReceiver `when (action)` switch,
  routes to TestController. Wired into ColumbaApp.swift's `.onOpenURL`
  with a `#if DEBUG` guard.

- Both files are wrapped in `#if DEBUG` so they compile out of release
  `.ipa`s. Defense in depth: every entry trips an `assertionFailure`
  with a release-misconfig message. Verified empirically — release
  build's binary contains zero references to TestController /
  TestURLHandler / harness log strings.

- `lxma-test` URL scheme registered in Info.plist alongside `lxma`. The
  scheme stays present in release builds (no per-config plist on this
  project) but is harmless because no code in release handles it; the
  release `.onOpenURL` `#if DEBUG` block compiles to a guard-pass and
  the URL falls through.

The Python orchestrator at ~/.claude-runner/columba-harness/smoke_test_ios.py
drives this surface end-to-end (devicectl URL dispatch + idevicesyslog
tail) and is the iOS sibling of smoke_test.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test-harness): unbreak release-guard + add file-based event log

Two bugs that prevented end-to-end smoke runs against a physical iPhone:

1. assertionFailure_releaseGuard() was calling assertionFailure(...)
   UNCONDITIONALLY in both TestController.swift and TestURLHandler.swift.
   That's exactly inverted from the intent — `assertionFailure` ALWAYS
   crashes in DEBUG builds. So every URL dispatch and every public
   handler entry crashed the app on the guard before any logic ran.

   Mirrors the Android side's `check(BuildConfig.DEBUG)` semantics:
   crash only when DEBUG is FALSE. New impl wraps the body in
   `#if !DEBUG ... #endif` so it's a no-op in normal debug builds and
   a hard crash if a release ever gets misconfigured to compile this
   file in.

2. TestLog.emit() now ALSO writes each line to
   `Documents/test_log.txt`, prefixed `seq=<n> ts=<iso8601>`. Reason:
   the Python orchestrator originally tailed device syslog via
   `idevicesyslog`, but iOS 17+ moved live-syslog behind the new
   CoreDevice / RemoteXPC tunnel that libimobiledevice can't speak.
   `pymobiledevice3` would work but needs a developer-tunnel daemon.
   The orchestrator now polls Documents/test_log.txt via
   `xcrun devicectl device copy from --domain-type appDataContainer`,
   which works out of the box and is more robust (no race window,
   survives disconnects). os_log writes are kept for human readers.

Verified end-to-end: smoke_test_ios.py runs the propagated_bidirectional
scenario all the way through interface setup, propagation-node config,
HAS_PATH=1, SEND_PROP, msg_sent. (Stalls at OUTBOUND-never-advances-to-
PROPAGATED — separate LXMFSwift outbound state-machine issue, NOT a
harness bug. Diagnostic for that lands in a follow-up.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(harness): add lxma-test://dump_log for OSLogStore extraction

iOS 17+ moved live syslog behind the new CoreDevice / RemoteXPC
tunnel that libimobiledevice can't speak, so the smoke harness
couldn't observe library-internal events on the device. Added a
debug-only `dump_log` URL action that uses OSLogStore to extract
recent unified-log entries from the app process and forwards them
into Documents/test_log.txt as `lib_log subsys=… cat=… level=… msg=…`
lines that the orchestrator can parse with its existing devicectl
copy-from poll mechanism.

Filter defaults to `(com.columba.core, net.reticulum.lxmf)` ×
(Propagation, Sync, LXMRouter, Stamper, Identity, PropagationNodeManager)
to surface just the propagation-path observability we need to
diagnose stuck `state=OUTBOUND` failures. `?since=<sec>` sets the
window (default 120s); `?cat=<comma>` overrides categories; `?cat=*`
disables category filtering.
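
As an illustration of the query parameters above, a hedged sketch of how
an orchestrator might build the URL. `dump_log_url` is a hypothetical
helper, and combining `since` and `cat` in one URL is an assumption; only
the two parameters individually are described in this commit.

```python
from urllib.parse import urlencode


def dump_log_url(since=None, categories=None):
    """Build an lxma-test://dump_log URL; omitted params use the app defaults."""
    params = {}
    if since is not None:
        params["since"] = int(since)            # window in seconds
    if categories is not None:
        params["cat"] = ",".join(categories)    # ["*"] disables filtering
    # Keep "," and "*" literal so the query matches the documented forms.
    query = urlencode(params, safe=",*")
    return "lxma-test://dump_log" + ("?" + query if query else "")
```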

Critical first finding when wired up: processOutbound IS running and
calling sendPropagated; the failure is `LXMRouter` emitting
"Delivery failed: No path available to destination, retrying in 15s/120s"
because `pathTable.lookup(destinationHash: nodeHash)` returns nil for
the propagation node hash even though `pathTable.hasPath(for:)`
returns true on the same hash from the harness. Likely actor-
isolation race or stale-snapshot bug in the path-table view; needs
deeper investigation in LXMF-swift / reticulum-swift.

Sticks to existing test-surface contract — `lib_log_done count=<n>` /
`lib_log_err reason=<msg>` reply tokens; debug-only via the existing
`#if DEBUG` source-set isolation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(harness): wire iOS PROPAGATED smoke end-to-end

Three bug-fix-and-instrument changes to make the PROPAGATED self-send
round-trip pass on iOS. Mirrors the Android smoke pipeline shipped in
PR #882.

1. TestRelayDelegate retention. LXMRouter holds the delegate weakly
   (LXMRouter.swift `weak var delegate`); attachDelegate handed in a
   stack-local relay that immediately deallocated, leaving the router
   with a nil delegate and no didUpdateMessage callbacks for outbound
   state changes. Pin the relay to TestController.attachedDelegate.

2. set_prop_node now goes through PropagationNodeManager.selectNode
   (via TestPathBridge.selectPropNode) instead of router.setOutboundPropagationNode.
   The manager is the only path that wires the announce-derived stamp
   cost into the router; the bare router setter left cost=0 and
   sendPropagated shipped a random stamp that lxmd rejected with
   ERROR_INVALID_STAMP. selectNode also now (a) reads stamp cost from
   pathTable.appData when knownNodes is empty and (b) waits up to ~5s
   for either source to populate, covering the smoke-test race where
   set_prop_node fires immediately after add_tcp_client (before the
   announce arrives).

3. PropagationNodeManager.processPathEntry re-applies the stamp cost
   to the router whenever an announce updates the currently-selected
   node, so a delayed announce can correct an earlier cost=0 setting.
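
The retention bug in item 1 can be modeled outside Swift. The sketch
below uses Python's `weakref` as a stand-in for `weak var delegate`; the
class names are hypothetical, and the immediate deallocation relies on
CPython's reference counting.

```python
import weakref


class Router:
    """Stand-in for a router that holds its delegate weakly."""

    def __init__(self):
        self._delegate_ref = None

    def attach_delegate(self, delegate):
        self._delegate_ref = weakref.ref(delegate)

    @property
    def delegate(self):
        return self._delegate_ref() if self._delegate_ref else None


class Relay:
    """Stand-in for the test relay delegate."""


class Controller:
    """Owner that pins the relay so the weak reference stays alive."""

    def __init__(self, router):
        self.attached_delegate = Relay()  # strong reference held here
        router.attach_delegate(self.attached_delegate)


buggy = Router()
buggy.attach_delegate(Relay())  # temporary relay: nothing else retains it

fixed = Router()
controller = Controller(fixed)  # relay pinned on the controller
```

After this runs, `buggy.delegate` is `None` while `fixed.delegate` is the
pinned relay, which is the difference the fix relies on.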

Plus instrumentation: dump_log now emits each OSLog entry's actual
recorded timestamp (`entry_ts=`) alongside the dump-time `seq=N ts=`
prefix, and includes `network.columba.Columba` in the allowed-subsystem
set so app-side managers (PropagationNodeManager) show up.

Direct + opportunistic self-send scenarios are still WIP — they
require LXMRouter-level loopback for self-addressed packets (single
device can't actually transit a packet to itself through the network)
which is a future stage. PROPAGATED works today via the lxmd round-trip.

* chore: bump LXMF-swift to a3e5b00 (DIRECT identify-drop fix)

* chore(deps): pin reticulum-swift to fix/link-data-no-header2-conversion

reticulum-swift @ d19919a — drops incorrect HEADER_2 conversion of link
DATA packets that broke multi-hop DIRECT delivery (state=SENT but the
echo bot never received the message). Mirrors python RNS/Transport.py
:1063, 1122-1130 — link DATA always sends HEADER_1 to the link's
attached_interface, never through path-table lookup.

LXMF-swift @ fe3ce84 (perf/stamper-parallel-primed-digest) — pins
reticulum-swift to the same fix branch.

Smoke results after fix (today's run #5):
  propagated_bidirectional: PASS (6.7s)
  direct_echo:              PASS (3.5s)  ← was FAIL pre-fix
  opp_echo:                 PASS (3.4s)

* test(harness): add diagnostic ticker + screenshot capture to TestController

Spawned by TestController.bind() on first init; runs every 2s for the
app's lifetime, snapping the key window into Documents/screenshots/<seq>.png
and emitting:

  diag_tick seq=N state=<active|inactive|background> snapshot=<path|<skip>>
  lifecycle event=<did_become_active|will_resign_active|...>

Diagnoses the iOS smoke harness wedge: "lxma-test:// URLs stop reaching
the URL handler after 2-3 sequential runs." The ticker is driven by an
internal Task, NOT URL dispatch, so it keeps emitting even when URLs are
wedged. If ticks ALSO stop, the OS suspended/killed the app. If ticks
keep coming with state != .active, the app went background. If ticks
keep firing AND state stays .active but URLs still don't reach the
handler, the wedge is below SwiftUI (CoreDevice tunnel / launch
services). The last is the smoking-gun pattern.
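
The triage above can be written down as a small decision function. This
is a sketch for the orchestrator side, assuming it collects the
`diag_tick` lines seen since the last URL dispatch; `classify_wedge` and
its return labels are hypothetical.

```python
import re

# Matches the documented tick shape: diag_tick seq=N state=<...> snapshot=<...>
TICK_RE = re.compile(r"diag_tick seq=(\d+) state=(active|inactive|background)")


def classify_wedge(tick_lines, urls_answered):
    """Classify a suspected wedge from recent ticks and URL responsiveness."""
    if urls_answered:
        return "no_wedge"
    states = [m.group(2) for m in map(TICK_RE.search, tick_lines) if m]
    if not states:
        return "app_suspended_or_killed"   # ticks stopped too
    if states[-1] != "active":
        return "app_backgrounded"          # ticks continue, app not active
    return "wedge_below_app"               # ticks + active, URLs still lost
```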

Field finding from this commit's first run (2026-05-10):
  iter 1: 3/3 PASS
  iter 2: 3/3 PASS
  iter 3: 0/3 FAIL — "TCP client interface ADD never confirmed"
  iter 4: total wedge — TestController never answered get_dest

After the wedge, even `devicectl device copy from` hangs for 30+s,
which proves the wedge is at the **CoreDevice tunnel layer**, not the
app's URL handler. The iPhone-side dev tunnel (RemoteServiceDiscovery)
goes degraded after rapid `process launch --payload-url` bursts.
Recovery: pkill devicectl + relaunch app via process launch (which
still works because process control rides a different RSD service).

Screenshots written to Documents/screenshots/, capped at 30 most-recent.
Pull via `xcrun devicectl device copy from --domain-type
appDataContainer --domain-identifier network.columba.Columba --source
Documents/screenshots --destination /tmp/...`.

#if DEBUG-only — does not ship in release, same as the rest of the
test surface.

* fix(prop): single checkmark + 'sent to relay' text + dump_db diag

LXMF-swift bump → b2e14cd: caps PROPAGATED outbound state at .sent
(per python LXMessage.py:568-578); large prop messages no longer
falsely advance to .delivered via the Resource path.

iOS UI:
- MessageBubble.deliveryStatusIcon: defensively coerce
  delivered/read → sent for any message with deliveryMethod ==
  'propagated' (handles stale rows from before the fix).
- MessageDetailView.statusCard: method-aware text for prop messages.
  'Sent' → 'Sent to relay' with subtitle explaining propagation
  nodes don't ack recipient receipt.
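
A language-neutral sketch of the coercion rule above, written in Python
for brevity. The function names and the lowercase state strings are
assumptions for illustration; the real logic lives in the Swift views
named in the list.

```python
def display_status(state, delivery_method):
    """Cap the shown status at 'sent' for propagated messages."""
    if delivery_method == "propagated" and state in ("delivered", "read"):
        return "sent"  # propagation nodes do not ack recipient receipt
    return state


def status_text(state, delivery_method):
    """Return the status label, using the method-aware text for propagated."""
    shown = display_status(state, delivery_method)
    if shown == "sent" and delivery_method == "propagated":
        return "Sent to relay"
    return shown.capitalize()
```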

Diagnostic surface:
- New lxma-test://dump_db URL action. Walks the full
  conversations + messages tables, emits one line per row to
  test_log.txt. Diagnoses Tyler's 2026-05-10 observation that
  prop messages appear in a separate conversation from
  direct/opp — DB inspection is the source of truth (UI
  faithfully renders whatever conversations table has).

Refs:
- LXMF/LXMessage.py:568-578 (__mark_propagated → state=SENT)
- LXMF-swift b2e14cd (resource-handler split, port-aligned)

* chore(deps): bump LXMF-swift to 0.4.0 + reticulum-swift to 0.3.0

LXMF-swift 0.4.0 (PR #7 — perf/stamper-parallel-primed-digest, merged):
  - Parallel stamp generation (LXStamper TaskGroup, 8 workers, primed
    SHA256 digest) — cost=16 from multi-minute to ~1-2s on iPhone.
  - PROPAGATED state machine fixes: drops wrong link.identify(); wires
    RESOURCE_PRF to .sent (not .delivered); ERROR_INVALID_STAMP handler
    via pendingPropagationSends FIFO + pendingPropagationRejections
    set; handlePropagationAccepted + handleOutboundResourceFailed with
    awaited DB writes that preserve deliveryAttempts budget.
  - DIRECT path: self-send identity resolution before path table;
    drops premature link.identify(); broadcast-relay-only self-echo
    gate; DIRECT resource crash-recovery parity with PROPAGATED.
  - Stamp-rejected resource short-circuit prevents retry-loop spam.

reticulum-swift 0.3.0 (PR #16):
  - HEADER_2 link DATA conversion fix.
  - sendLinkData signature: destinationHash param removed (breaking).

Package.swift, pbxproj, and Xcode-shared Package.resolved all updated.
Build verified: xcodebuild for iOS Simulator, CODE_SIGNING_ALLOWED=NO,
BUILD SUCCEEDED. Smoke pipeline (PROPAGATED/DIRECT/OPP bidirectional
with Mac echo bot) to follow on PR ready→draft transition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): bump LXMF-swift to 0.4.0 + reticulum-swift to 0.3.0 (#73)

Co-authored-by: torlando-tech <torlando-tech@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tunnel): guard applyTunnelModeToInterfaces(active:false) against initial .invalid VPN state

iOS emits an `.invalid` / `.disconnected` VPN status notification on
every cold start — fired by `TunnelManager.onStatusChange` regardless
of whether the user has enabled Background Transport, because the
session machinery probes whatever is currently loaded. The previous
code unconditionally scheduled `applyTunnelModeToInterfaces(active:
false)` via the 5s debounce, which iterated every TCPInterface and
called `endTunnelMode()`.

`endTunnelMode()` in reticulum-swift 0.3.0 is NOT idempotent
(TCPInterface.swift:257-269): it unconditionally tears down the
working NWConnection (via `transport?.disconnect()` -> nil) and
re-runs `setupTransport()`. Calling it on an interface that was never
in tunnel mode (outboundHook == nil) is destructive — it kills the
live socket Step 7 brought up moments earlier.

Reproduced 2026-05-11 on smoke run iter1 against
`feat/multi-tcp-tunnel @ 0f7cf3e`: all 4 scenarios FAILED at the
earliest `send_*` step. has_path returned 1 for both PN and bot
(path table populated via inbound announces), but outbound sends
never advanced past `state=OUTBOUND`. Console showed `[TUNNEL]
disabled tunnel mode` ~5s after cold start with no prior
`[TUNNEL] enabled` line, confirming the debounce was tearing down
TCP without ever having activated it.

Fix tracks an `isTunnelModeActive` bool. The active=false branch
guards on it and returns early if tunnel mode was never activated.
Mirrors the "undo what you did" contract.
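
A minimal model of the guard, assuming a non-idempotent teardown like the
one described for reticulum-swift 0.3.0. `FakeInterface` and
`TunnelModeSwitch` are hypothetical stand-ins, not the Swift types.

```python
class FakeInterface:
    """Interface whose end_tunnel_mode always tears down the live socket."""

    def __init__(self):
        self.teardowns = 0

    def begin_tunnel_mode(self):
        pass

    def end_tunnel_mode(self):
        self.teardowns += 1  # destructive even if tunnel mode never began


class TunnelModeSwitch:
    def __init__(self, interfaces):
        self.interfaces = interfaces
        self.is_tunnel_mode_active = False

    def apply(self, active):
        if active:
            for interface in self.interfaces:
                interface.begin_tunnel_mode()
            self.is_tunnel_mode_active = True
            return
        if not self.is_tunnel_mode_active:
            return  # never activated: nothing to undo
        for interface in self.interfaces:
            interface.end_tunnel_mode()
        self.is_tunnel_mode_active = False
```

With this shape, the cold-start `.invalid` event maps to `apply(False)`
on a fresh switch and leaves every interface untouched.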

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: torlando-tech <torlando-tech@users.noreply.github.com>
Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com>
Co-authored-by: Claude claude-opus-4-7 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: torlando-agent[bot] <torlando-agent@noreply.github.com>
Co-authored-by: torlando-agent[bot] <217870594+torlando-agent[bot]@users.noreply.github.com>
…ip-flag

# Conflicts:
#	Columba.xcodeproj/project.pbxproj
#	Sources/ColumbaApp/Models/TcpCommunityServer.swift
#	Sources/ColumbaApp/Services/AppServices.swift
@torlando-tech
Owner Author

Smoke status: 4/4 PASS (smoke_clean) — backgrounded delivery gate met

After merging current main into this branch (ed89272):

| scenario | result | duration |
| --- | --- | --- |
| propagated_echo | PASS | 22.0s (sync_attempts=2) |
| direct_echo | PASS | 4.0s |
| opp_echo | PASS | 4.0s |
| backgrounded_propagated | PASS | 83.0s |

Identical green shape to PR #62's final iter 3 — main-into-branch merge was functionally inert. The c0d2213 isTunnelModeActive guard (fix for the cold-start tunnel-disable bug that was killing every TCP NWConnection ~5s after Step 7 brought it up) is preserved.

This branch now passes the Phase 3 (backgrounded delivery) smoke gate. PR is review-ready.

🤖 Generated with Claude Code

@torlando-tech
Owner Author

Smoke status update: 5/5 PASS — Phase 4 doze gate also clean

Re-ran the smoke harness with a new doze_propagated scenario added (Phase 4 — task #82).

| scenario | result | duration | notes |
| --- | --- | --- | --- |
| propagated_echo | PASS | 20.1s | sync_attempts=2 |
| direct_echo | PASS | 4.6s | |
| opp_echo | PASS | 4.7s | |
| backgrounded_propagated | PASS | 83.0s | terminate + 60s wait + sync_prop drain on resume |
| doze_propagated | PASS | 309.1s | background (don't terminate) + 301s wait + foreground without terminate + sync_prop drain in 1 attempt |

doze_propagated exercises a softer suspension than backgrounded_propagated: it backgrounds the app by bringing Safari to the foreground (does NOT kill the process), waits 5 minutes (well past iOS's ~30s suspension threshold), then foregrounds Columba via device process launch BUNDLE_ID (no --terminate-existing). The echo was retrieved in a single sync_prop attempt post-resume.

This could be either (a) the NE kept the TCP socket alive through the 5min suspension, or (b) iOS killed the connection and the app re-established it and ran sync_prop on resume. Distinguishing the two requires broader idevicesyslog capture filtered to the ColumbaNetworkExtension subsystem; this is filed as a follow-up. Either way, backgrounded delivery survives a real 5-minute iOS suspension.

🤖 Generated with Claude Code

…orkaround

reticulum-swift 0.3.1 (PR #17) makes `TCPInterface.endTunnelMode()`
and `AutoInterface.endTunnelMode()` idempotent via an `outboundHook
!= nil` guard. That moves the contract upstream, so the
`isTunnelModeActive` bool guard added in `c0d2213` is no longer
necessary — the `endTunnelMode()` calls in
`applyTunnelModeToInterfaces(active: false)` are now safely no-ops
when fired on never-tunneled interfaces (e.g. the initial `.invalid`
VPN-status notification on every cold start).

Removed:
  - `isTunnelModeActive` field declaration + doc
  - `isTunnelModeActive = true` write in the active=true branch
  - `isTunnelModeActive = false` write in the active=false branch
  - The `guard isTunnelModeActive else { return }` short-circuit

Build verified: xcodebuild for iOS Simulator BUILD SUCCEEDED.

The port-deviations.md note for reticulum-swift's tunnel API spelled
out that this Columba-iOS workaround should be deleted on the next
deps bump — this is that deps bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift
…d alone is insufficient

Smoke iter 1 against `1ee72eb` (reticulum-swift 0.3.1 bump + workaround
removal) failed all 5 scenarios with the same OUTBOUND-forever shape
the workaround was suppressing. Reverted just the Columba-side
workaround removal here as an A/B test — same HEAD otherwise (0.3.1
deps preserved). If smoke goes 5/5 again on this commit, it proves
0.3.1's upstream `outboundHook != nil` guard is necessary but not
sufficient; the Columba workaround was suppressing something the
upstream check doesn't catch.

Diag.log from the failing iter shows `[TUNNEL] disabled tunnel mode`
fires at +5s cold-start (the .invalid debounce expiring) but Step 7
reports "starting 0 enabled interfaces" before that, meaning
`tcpInterfaces` is empty when the disable iterates. So whatever the
workaround was suppressing isn't `endTunnelMode()` being called on
live interfaces — it's something else in the same code path or a
related side effect. Investigation continues; the workaround stays
in until the actual mechanism is identified.

This restores the `isTunnelModeActive` field, the `= true` write in
the active branch, and the `guard isTunnelModeActive else { ... }`
short-circuit in the inactive branch. reticulum-swift 0.3.1 is kept
(`Package.swift` / pbxproj minimumVersion / Package.resolved
unchanged) — the upstream guard is still a correctness improvement
even if it isn't load-bearing for this specific Columba bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@torlando-tech
Owner Author

Update: bumped reticulum-swift 0.3.1, kept isTunnelModeActive workaround

Smoke A/B diagnostic confirmed the upstream 0.3.1 idempotency guard alone is necessary but not sufficient to fix the OUTBOUND-stuck cold-start regression. Without the Columba-side isTunnelModeActive workaround, all 5 scenarios fail at the same shape; with the workaround restored on the same HEAD (0.3.1 deps), all 5 pass.

A/B results

| HEAD | reticulum-swift | Columba workaround | smoke |
| --- | --- | --- | --- |
| ed89272 | 0.3.0 | present | 5/5 PASS |
| 1ee72eb | 0.3.1 | removed | 0/5 FAIL |
| 2ff0d10 | 0.3.1 | restored | 5/5 PASS |

Decision

Keep the Columba-side workaround. The 0.3.1 dep bump is retained (it's a real correctness improvement at the API surface even if it isn't load-bearing for this specific bug). Investigation into what the workaround suppresses beyond the upstream guard is filed as a follow-up — suspects include the late-tunnel-check in connectTCPInterface, a side-effect of disable_all_interfaces, or a NEVPNStatusDidChange race during interface bringup.

PR head is now 2ff0d10 and smoke-clean. Ready for review/merge.

🤖 Generated with Claude Code

torlando-tech and others added 2 commits May 12, 2026 20:52
…ification scenario

Adds `lxma-test://get_notifications` URL action that queries
`UNUserNotificationCenter.deliveredNotifications` and emits one
`notif id=<id> thread=<id> delivery_ts=<iso> source_hash=<hex>
body=<preview>` line per delivered notification, bracketed by
`notif_begin count=N query_ts=<iso>` and `notif_end count=N`.

Used by the Phase A `suspended_notification` smoke scenario (in
`smoke_test_ios.py`) to assert whether a system-level notification
was posted while the app was suspended: compare each notification's
`delivery_ts` against the orchestrator's `T_foreground` wall-clock
to distinguish "delivered during suspension" (notification fired
from the extension, the goal) vs. "delivered post-foreground"
(app caught up by draining the queue, what the current "dumb pipe"
NE architecture produces).
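
The comparison described above might look like the following on the
orchestrator side. This is a sketch under stated assumptions: the regex
follows the `notif ...` line shape given in this commit, while
`split_by_foreground` and `parse_ts` are hypothetical names and the
timestamps are assumed to carry a UTC offset or a trailing `Z`.

```python
import re
from datetime import datetime

# Follows the documented shape: notif id=.. thread=.. delivery_ts=.. source_hash=.. body=..
NOTIF_RE = re.compile(
    r"\bnotif id=(\S+) thread=(\S+) delivery_ts=(\S+) source_hash=(\S+) body=(.*)$"
)


def parse_ts(text):
    # Older Pythons reject a trailing "Z" in fromisoformat, so normalise it.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def split_by_foreground(lines, t_foreground):
    """Split notification ids into (during_suspension, post_foreground)."""
    suspended, post_foreground = [], []
    for line in lines:
        match = NOTIF_RE.search(line)
        if match is None:
            continue  # notif_begin / notif_end and unrelated lines
        delivered = parse_ts(match.group(3))
        bucket = suspended if delivered < t_foreground else post_foreground
        bucket.append(match.group(1))
    return suspended, post_foreground
```

A non-empty first list is the signal the scenario is looking for; an
empty first list with a non-empty second one is the catch-up-on-drain
case.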

The scenario is expected to FAIL on the current branch — that
failure IS the gate signal that Phase B (push destination-hash
filter + UNUserNotificationCenter call into ColumbaNetworkExtension)
hasn't shipped yet. Phase A's purpose is exposing the gap that the
existing smoke obscured by foregrounding before checking the DB.

Build verified: xcodebuild iOS Simulator BUILD SUCCEEDED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s + request_notif_permission)

Phase A smoke iter 3 showed `suspended_notif_count=0` AND
`post_foreground_notif_count=0` — ambiguous between "Phase B not done"
(expected catch-up-on-drain) and "iOS notification permission not
granted on test iPhone." Adding two new test-surface actions to
disambiguate up-front:

- `lxma-test://get_notif_status` — emits current iOS authorization
  state (`notDetermined` / `denied` / `authorized` / `provisional` /
  `ephemeral`) plus alert/badge/sound flags AND Columba's own
  `notifications_enabled` UserDefaults pref. Lets the scenario detect
  the permission-missing branch and fail with a clear
  `iOS notification permission not granted (auth=…)` message instead
  of "no notifications, cause unknown."

- `lxma-test://request_notif_permission` — calls
  `UNUserNotificationCenter.requestAuthorization` (iOS shows the
  system "Allow notifications?" prompt on first run) AND sets
  `notifications_enabled` + `notify_received_message` to true in
  UserDefaults so `NotificationService.postMessageNotification` won't
  short-circuit on the pref guard.

First run after a fresh install: orchestrator drives this URL, iOS
shows the system prompt, Tyler taps Allow once on the phone. From
then on the grant is persisted and the scenario runs unattended.

Build verified: xcodebuild iOS Simulator BUILD SUCCEEDED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@torlando-tech
Owner Author

Phase A — Suspended-notification gap confirmed

Added a new suspended_notification smoke scenario + lxma-test://get_notifications test surface action. Iter 8 + iter 9 against a1a7fa1 produced consistent diagnostic data:

| field | iter 8 | iter 9 |
| --- | --- | --- |
| notif_auth | authorized | authorized |
| notif_enabled_pref | true | true |
| suspended_notif_count | 0 | 0 (rx_msg never arrived to even check) |
| post_foreground_notif_count | 0 | n/a (drain failed) |

Architectural gap confirmed: zero notifications fire during the 90s suspension window in both runs. This matches Apple's documented behavior: in Columba's design, NEPacketTunnelProvider is a "dumb pipe" that posts Darwin notifications when frames arrive, but Darwin notifications don't wake suspended apps. The host app's NotificationService.postMessageNotification only runs when the LXMF state machine processes the inbound frame, which only happens when the app is actually running.

Separate concern surfaced (filed as task #99): the post-foreground sync_prop drain isn't reliable after a Safari-foreground/Columba-foreground transition — TCP interface state may be degrading across the suspend cycle. Not blocking Phase B planning since the gap signal is independent.

Next: Phase B — push minimal dest-hash filter + UNUserNotificationCenter scheduling into ColumbaNetworkExtension so notifications fire from the extension's process while the host app is suspended. Crypto + full LXMF decode stay in the app.

🤖 Generated with Claude Code

torlando-tech and others added 5 commits May 13, 2026 01:35
… extension

Phase B of the suspended-app notification work. With Darwin notifications
unable to wake a suspended host app (Apple DTS forum 769398), `NotificationService`
never fired on inbound LXMF traffic until the user manually foregrounded
the app. This commit moves the minimum amount of Reticulum awareness into
the `NEPacketTunnelProvider` to fix that gap:

- `AppServices.publishLocalDestinations()` writes the
  `transport.registeredDestinationHashes()` set to App Group prefs and
  posts a Darwin reload notification. Called at the end of both
  `initialize` overloads and after `initializeBaseStack` — every place
  where a destination is freshly registered. `switchIdentity` delegates
  to the second `initialize` overload so identity switches are covered
  too.

- `PacketTunnelProvider` decodes the published hex hashes into a
  `Set<Data>` on `configQueue`, observes the reload Darwin notification,
  and consults the set in `handleTCPData` for every deframed packet.
  Matching packets get an `UNUserNotificationCenter` request posted under
  the host app's bundle identity so iOS shows a banner / lock-screen
  alert even while the host app is suspended.

The filter inspects only unencrypted header fields (header type byte +
destination_hash at offset 2 for HEADER_1 or offset 18 for HEADER_2,
verified against `Reticulum/RNS/Packet.py:Packet.unpack`); crypto and
full LXMF decode stay in the host app. Fires on DATA+CONTEXT_NONE
(OPPORTUNISTIC LXMF arrivals) and LINKREQUEST (DIRECT delivery
initiation, the only DIRECT-flow packet addressed to our delivery
hash); ANNOUNCE and PROOF are skipped.
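
A sketch of the filter logic in Python, in the spirit of
`Packet.unpack`. The offsets (2 and 18) and the packet-type rules come
from this commit; the flag-bit masks and numeric constants are written
from my reading of the Reticulum packet layout and should be treated as
assumptions, as should the name `should_notify`.

```python
HEADER_1, HEADER_2 = 0, 1
DATA, ANNOUNCE, LINKREQUEST, PROOF = 0, 1, 2, 3
CONTEXT_NONE = 0x00
HASH_LEN = 16  # truncated destination hash length in bytes


def should_notify(raw, local_hashes):
    """Decide from unencrypted header fields whether to post a notification."""
    if len(raw) < 2:
        return False
    flags = raw[0]
    header_type = (flags >> 6) & 0x01   # assumed: bit 6 selects HEADER_2
    packet_type = flags & 0x03          # assumed: low two bits
    # HEADER_2 carries a 16-byte transport id before the destination hash.
    dest_offset = 2 if header_type == HEADER_1 else 2 + HASH_LEN
    context_offset = dest_offset + HASH_LEN
    if len(raw) <= context_offset:
        return False                    # truncated packet
    if bytes(raw[dest_offset:context_offset]) not in local_hashes:
        return False
    if packet_type == LINKREQUEST:
        return True                     # DIRECT delivery initiation
    # OPPORTUNISTIC arrivals; ANNOUNCE and PROOF fall through to False.
    return packet_type == DATA and raw[context_offset] == CONTEXT_NONE
```

The hex strings published via the App Group would be decoded once, for
example with `{bytes.fromhex(h) for h in hex_hashes}`, so the per-packet
check is a single set lookup.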

Notifications inherit the host app's authorization grant — extensions
sit in the container app's notification domain (Apple DTS engineer
Quinn) — so no extension-side `requestAuthorization` is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `suspended_notification` smoke scenario needs the Network
Extension running across the host-app suspend window — otherwise the
inbound TCP socket dies as soon as iOS suspends the host process and
Phase B's destination filter never sees a frame.

Adds two `lxma-test://` actions that the harness can call to flip
Background Transport on programmatically (matching the Settings
toggle's behaviour byte-for-byte: it persists `tunnelEnabledKey` so a
cold restart auto-resumes the tunnel, then kicks
`TunnelManager.start()` and waits up to 30s for `.connected`).

  - `enable_tunnel`        — emits `tunnel_enable state=<state>`
  - `get_tunnel_status`    — emits `tunnel_status state=<state>`

`TestTunnelBridge` keeps the test surface ignorant of `TunnelManager`'s
real type (it only exists under `ENABLE_NETWORK_EXTENSION`), so the
file still compiles in build configurations where the extension is
turned off. The bridge closure lives in `TestURLHandler.bind`, guarded
by the same compile flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Settings toggle persists `tunnelEnabledKey` so a cold relaunch
auto-restarts the tunnel — that's the right shape for users. For the
test surface it's wrong: every subsequent smoke run cold-starts with
auto-tunnel-on, and the in-flight transition races the harness's
path-discovery bringup and breaks even baseline scenarios (msg stays
in OUTBOUND, never reaches PROPAGATED).

Drops the persistence write inside the test bridge. `TunnelManager.start()`
still runs and the tunnel is alive for the rest of the session, so the
suspend test still gets what it needs. Tests that need the tunnel call
this every run; the persisted flag stays off across runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion prompt

When the user hasn't responded to the system notification prompt yet,
`UNUserNotificationCenter.requestAuthorization` doesn't return until
they tap Allow / Don't Allow — and awaiting that during the cold-start
init loop held the rest of init hostage. Concretely: no
`TestURLHandler.bind`, no MainTabView, no `isInitialized = true`. The
app's loading screen stays up indefinitely behind the system sheet.

Fire-and-forget the permission request so the rest of init can proceed
in parallel. Users still see the prompt the first time they launch a
fresh install — they just don't need to dismiss it before the app
becomes usable. The matching `userNotificationCenter.delegate`
assignment is part of `requestPermission()`, so it's still installed
(asynchronously) and foreground notification suppression for the
active conversation continues to work from the next message after the grant.

Also unblocks the smoke harness on fresh-install devices — it
previously got stuck at `dest_err reason=not_ready` because
TestController.bind never ran while the OS prompt was up and there's
no way for devicectl to tap "Allow" remotely.
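
The difference between awaiting the prompt and firing it off can be
modeled with `asyncio`. This is an analogy in Python rather than the
Swift change itself; the function names and step labels are
hypothetical.

```python
import asyncio


async def request_permission(user_responded):
    """Stand-in for the system prompt: resolves only after the user taps."""
    await user_responded.wait()
    return "authorized"


async def initialize(user_responded):
    # Fire-and-forget: schedule the request and keep a reference to the
    # task, but do not await it before finishing the rest of init.
    permission_task = asyncio.create_task(request_permission(user_responded))
    completed = ["bind_test_url_handler", "show_main_tab_view", "is_initialized"]
    return completed, permission_task


async def demo():
    user_responded = asyncio.Event()
    completed, task = await initialize(user_responded)
    finished_before_tap = task.done()   # prompt still pending here
    user_responded.set()                # the user taps Allow later
    result = await task
    return completed, finished_before_tap, result
```

Running `asyncio.run(demo())` shows init completing while the permission
task is still pending, which is the property the commit restores.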

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `lxma-test://skip_onboarding[?host=&port=&name=]` so a fresh
install (or any device where `has_completed_onboarding` is false)
can be brought to the smoke-testable state without manual tap-through
of the OnboardingView. Mirrors `OnboardingViewModel.skipOnboarding`
exactly: creates an anonymous identity via `IdentityManager`, switches
to it, registers a TCP-client interface, and flips
`has_completed_onboarding` + `settings_initialized` +
`notifications_enabled`.

Self-contained in `TestURLHandler` (not `TestController`) because
`TestController.bind` requires `AppServices` to be initialized, and
that hasn't happened yet on a fresh install — the test surface needs
to bootstrap state *before* AppServices has anything to bind to.
`IdentityManager` and `InterfaceRepository` are both safe to
instantiate standalone, so this works during the OnboardingView's
lifetime.

Idempotent: if an active identity already exists, no-ops on identity
creation and just reaffirms the onboarding flags + TCP config.

The host app's `@State showOnboarding` is decided at init time so the
caller must force-terminate + relaunch the app after this returns ok
before the new state takes effect. Default host/port match the
columba-harness defaults (10.0.0.145:4242, name=test_mac).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/Shared/SharedFrameQueue.swift
Comment thread Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift
torlando-tech and others added 2 commits May 13, 2026 21:05
…iewModel.loadLocalSettings

The user-facing default-value registration for `auto_announce_enabled`,
`auto_announce_on_tcp_reconnect`, `notifications_enabled`, etc., was
inside `SettingsViewModel.loadLocalSettings()` — which only runs when
the user opens the Settings UI. Fresh installs that never visit
Settings silently had every one of those keys defaulting to `false`
at the raw `UserDefaults.bool(forKey:)` level, because
`register(defaults:)` had never run.

Concrete symptom: `AppServices.configureTransportCallbacks`'s
`onInterfaceConnected` hook calls `AutoAnnouncePolicy.current()`,
sees `masterEnabled = false`, and logs `[AUTO_ANNOUNCE]
onInterfaceConnected(...) — master toggle off, skipping`. The phone
never auto-announces on TCP reconnect, so rnsd loses the phone's
path the moment the TCP socket cycles. From the bot side, this
manifests as `Got packet in transport, but no known path to final
destination <phone-hash>. Dropping packet.` Every bot→phone
DIRECT/OPP delivery silently drops because rnsd has nothing to
route to.

Lifts the registration to a static
`SettingsViewModel.registerLocalDefaults(into:)` method, called
from `ColumbaApp.init()` before `AppServices` reads any of the
keys. `register(defaults:)` only sets fallbacks for keys without
explicit values, so it remains harmless to call from
`loadLocalSettings()` too (which still does, so the view is
self-sufficient in isolation).
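
The fallback semantics can be illustrated with a small model. The class
below is a hypothetical Python stand-in for the lookup order (explicit
value first, then registered default, then `false`), not the
`UserDefaults` API; the `True` default used in the usage note is
illustrative.

```python
class Defaults:
    """Toy model of explicit values layered over registered fallbacks."""

    def __init__(self):
        self._explicit = {}
        self._registered = {}

    def register(self, defaults):
        # Only fills the fallback layer; explicit values are untouched.
        self._registered.update(defaults)

    def set(self, key, value):
        self._explicit[key] = value

    def bool(self, key):
        if key in self._explicit:
            return bool(self._explicit[key])
        return bool(self._registered.get(key, False))
```

Before `register` runs, `bool("auto_announce_enabled")` is `False`;
afterwards it follows the registered default unless the user has set the
key explicitly, which is why calling `register` a second time is
harmless.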

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t (task #96)

iOS routinely reports the VPN-status sequence
`.connecting → .connected → .reasserting → .connected` during routing
setup, which fires our `onStatusChange` handler twice for `.connected`.
The `active: false` branch already guarded against this via
`isTunnelModeActive`; the `active: true` branch did not. Each redundant
`.connected` callback then called `beginTunnelMode` on every
`TCPInterface` again, which re-installs the outbound hook and resets
the transport pointer — racing any in-flight LXMessage send. The
visible symptom in the diag is the matching pair:

    [TUNNEL] enabled tunnel mode on N TCP interface(s); ...
    [TUNNEL] enabled tunnel mode on N TCP interface(s); ...

logged twice within the same second, followed by the LXMF state
machine stalling (e.g. a queued DIRECT send sits in OUTBOUND for
30+s before the bot eventually receives the LINKREQUEST).

Symmetric guard with the disable branch: bail with a noisy diag log
on the redundant `.connected` event.

Verified on-device: after this commit the diag shows exactly one
`[TUNNEL] enabled tunnel mode` followed by `[TUNNEL] skipping enable
— already active` for each tunnel-up transition, instead of two
unguarded enables. Mid-session `enable_tunnel` test-action call
behaves predictably afterwards.
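
Replaying the status sequence from the top of this message shows the
effect of the enable-side guard. A sketch only: `replay` is a
hypothetical helper, and it assumes `.reasserting` does not clear the
active flag.

```python
SEQUENCE = ["connecting", "connected", "reasserting", "connected"]


def replay(statuses, guard_enabled):
    """Count begin-tunnel-mode passes triggered by a VPN status sequence."""
    active = False
    begin_calls = 0
    for status in statuses:
        if status != "connected":
            continue  # only .connected reaches the enable branch here
        if guard_enabled and active:
            continue  # redundant .connected: skip, already active
        begin_calls += 1
        active = True
    return begin_calls
```

Without the guard the sequence yields two enable passes; with it, one.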

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaApp/Services/AppServices.swift
Replaces the old tunnel-mode flip (where a single TCPInterface gave
up its app-owned NWConnection mid-session to route through the NE)
with a separate `TunnelTCPInterface` registered alongside the
foreground TCPInterface. Both connect to the same rnsd; rnsd sees
two clients with independent paths to <phone-hash>.

The old architecture had a fatal seam at the foreground-to-tunnel
handoff: the app-owned socket closed, rnsd removed the path entry
attached to it, and the extension's new socket had no announce yet
because `TCPInterface.beginTunnelMode` keeps state=.connected (no
notifyStateChange, so auto-announce-on-tcp-reconnect doesn't fire).
Bot→phone packets then bounced as `Got packet in transport, but no
known path to final destination <phone-hash>` for the entire suspend
window — Phase B's filter had nothing to fire on.

New architecture:

  * `Sources/ColumbaApp/Services/TunnelTCPInterface.swift` — new
    `NetworkInterface` implementation. Outbound: HDLC-frames data and
    calls `TunnelManager.sendFrame(..., entityId=TUNNEL_TCP_INTERFACE_ID)`.
    Inbound: receives via `ExtensionFrameReader`'s
    `onTCPFrameReceived` callback when the tag matches.

  * `AppServices.registerTunnelInterface()` — fires on tunnel
    .connected. Creates and registers the TunnelTCPInterface,
    mirrors the foreground TCP's host/port, publishes the endpoint
    to a new App Group key `tunnelTCPEndpointsKey`, then calls
    `sendAllAnnounces` (broadcast) followed by a 100ms-delayed
    tunnel-only re-announce. The tunnel-only follow-up pins rnsd's
    last-write-wins path table to the tunnel socket so the
    foreground socket dying on suspend doesn't strand the path.

  * `AppServices.deregisterTunnelInterface()` — fires on
    .disconnected / .invalid (5s debounce). Removes the interface
    from the transport and clears the App Group endpoint list.

  * `PacketTunnelProvider.loadInterfaceConfigs` — reads
    `tunnelTCPEndpointsKey` first. When present + non-empty, it's
    the only source of TCP entries; otherwise falls back to the
    legacy `interfacesKey` TCP parsing (preserves the multi-TCP
    tunnel commit's behaviour for older builds). Adds a Darwin
    observer for the matching `tunnelTCPEndpointsChangedNotificationName`
    so the extension reapplies without a tunnel restart.

  * `AppServices.connectTCPInterface` — no longer calls
    `beginTunnelMode` on newly-added foreground interfaces. They
    stay foreground-only. `applyTunnelModeToInterfaces(active:)` is
    orphaned (no callers); left in place for now alongside the
    `isTunnelModeActive` guard until a follow-up gut.

  * `AppServices.ExtensionFrameReader.onTCPFrameReceived` — only
    frames tagged `TUNNEL_TCP_INTERFACE_ID` route into the
    transport. Frames from any other entity ID get dropped, since
    the foreground TCPInterfaces receive their own inbound via
    their app-owned NWConnection — accepting the extension's
    duplicate would double-process every packet.
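
The endpoint precedence described for `loadInterfaceConfigs` can be
sketched as below. This is a simplified illustration under stated
assumptions: `TCPEndpoint` and `selectTCPEndpoints` are hypothetical
names, and the real App Group payload format is not shown in this
commit message.

```swift
/// Hypothetical endpoint shape, used only for this illustration.
struct TCPEndpoint: Equatable {
    let host: String
    let port: UInt16
}

/// Picks the TCP entries the extension should dial.
/// A present, non-empty dual-interface list is the only source of TCP
/// entries; otherwise the legacy parser is consulted.
func selectTCPEndpoints(tunnelEndpoints: [TCPEndpoint]?,
                        legacyEndpoints: () -> [TCPEndpoint]) -> [TCPEndpoint] {
    if let endpoints = tunnelEndpoints, !endpoints.isEmpty {
        return endpoints
    }
    // Key missing or empty: fall back to the legacy TCP parsing path.
    return legacyEndpoints()
}
```

Passing the legacy parser as a closure means it only runs when the
fallback is actually needed.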

Verified end-to-end: the suspended_notification smoke scenario
posts a DIRECT message from a Mac-side pinger to the phone every
10s. When the host app is backgrounded, the tunnel TCP socket
stays alive, rnsd routes the ping via the tunnel path, the
extension receives the LINKREQUEST + DATA packets, and
`maybeScheduleNotification(for:)` matches each against
`localDestinationHashes` and posts a UN notification. Result:
`suspended_notif_count: 2` during a 30s suspend window. Phase B
is now validated end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift Outdated
`testEmptyDefaultsReportsAllOff` assumed the registration domain
was per-`UserDefaults`-instance, i.e. that creating a per-suite scratch
`UserDefaults` would isolate it from `register(defaults:)` calls made on
`.standard`. That assumption broke at `dc1024b` when the
`SettingsViewModel.registerLocalDefaults` call moved to
`ColumbaApp.init()` so the on-reconnect announce fires for fresh
installs that never touch Settings.

The XCTest host loads the @main App before running tests, so
`ColumbaApp.init` executes and registers the four `auto_announce_*`
toggles to `true`. NSUserDefaults' registration domain is shared
across every UserDefaults instance in the process — including
`UserDefaults(suiteName:)` scratch defaults — so the per-test
suite inherits the fallbacks.

Renamed the test to `testEmptyPerSuiteInheritsProcessWideRegistrationAsAllOn`
to reflect the actual contract being pinned: the app-init
registration *must* leak to all UserDefaults instances, because
that's exactly what makes the on-reconnect announce fire on a
fresh install. A future refactor that drops the app-init
registration call now fails this test loudly instead of silently
regressing every fresh install to no-auto-announce.

The two adjacent tests (`testRegisterDefaultsTrueProducesAllFireForFreshInstall`,
`testExplicitFalseOverridesRegisteredDefaultTrue`) still validate
the per-instance `register(defaults:)` mechanics + explicit-write
override semantics on the per-suite instance. Together the three tests
cover the full registration-domain contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@torlando-tech torlando-tech changed the title feat: enable Network Extension (Phase 2 of background-connectivity plan) feat(ne): enable Background Transport + Phase B suspended-notifications + dual-interface tunnel May 14, 2026
Harden NE-side port parsing in `loadInterfaceConfigs`. Both the
dual-interface (`tunnelTCPEndpointsKey`) path and the legacy fallback
(`interfacesKey` `tcpClient` entries) used the trapping `UInt16(_:)`
initializer to coerce JSON `Int` ports. If corrupted App Group data or
a future writer hands an out-of-range value to either path, the NE
process traps and the VPN terminates. Switch both call sites to
`UInt16(exactly:)` with an early-`continue` / failed-binding — same
behavior for legitimate 0…65535 ports, strictly safer for invalid
input.
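
A minimal sketch of the non-trapping conversion, assuming ports arrive
as decoded `Int` values. `parsePort` and `validPorts(from:)` are
hypothetical helper names, not functions from
`PacketTunnelProvider.swift`:

```swift
/// Converts a JSON-decoded port to UInt16 without trapping.
func parsePort(_ raw: Int) -> UInt16? {
    // UInt16(raw) would crash the process for values outside 0...65535;
    // UInt16(exactly:) returns nil instead.
    return UInt16(exactly: raw)
}

/// Keeps only the entries that are representable as a TCP port.
func validPorts(from rawPorts: [Int]) -> [UInt16] {
    var ports: [UInt16] = []
    for raw in rawPorts {
        // Early-continue on invalid input, mirroring the hardened call sites.
        guard let port = parsePort(raw) else { continue }
        ports.append(port)
    }
    return ports
}
```

Out-of-range values such as `-1` or `65536` are skipped rather than
terminating the extension process.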

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift
Coalesce LINKREQUEST notification retries into a single banner.

LXMF's `LXMRouter` retries DIRECT delivery up to `MAX_DELIVERY_ATTEMPTS = 5`
times spaced `DELIVERY_RETRY_WAIT = 10s` apart (`LXMF/LXMRouter.py:2654`).
Each retry constructs a fresh `RNS.Link(...)`, which on initiator
construction sends a new `LINKREQUEST` packet (`Reticulum/RNS/Link.py:308-324`).
`PacketTunnelProvider.maybeScheduleNotification` matches LINKREQUEST as
the DIRECT-flow signal that a new message is on its way, so without
coalescing a single undelivered DIRECT delivery produces 1–5 separate
"New message" banners on the lock screen.

Switch LINKREQUEST notifications to a static
`ext-linkreq-<destHashHex>` identifier so iOS replaces the prior
pending banner on each retry. `DATA`-path (OPPORTUNISTIC) notifications
keep their timestamp suffix because each represents an independently
delivered message.
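
The identifier scheme can be sketched as below. `ArrivalKind` and
`notificationIdentifier` are hypothetical names for illustration; the
two string formats are the ones this PR documents for
`PacketTunnelProvider.swift`.

```swift
/// Which packet type triggered the placeholder notification.
enum ArrivalKind {
    case linkRequest  // DIRECT flow: retried by the sender
    case data         // OPPORTUNISTIC flow: one packet per message
}

/// Builds the notification request identifier for an arriving packet.
func notificationIdentifier(kind: ArrivalKind,
                            destHashHex: String,
                            timestampMs: UInt64) -> String {
    switch kind {
    case .linkRequest:
        // Static per destination, so a retry replaces the earlier banner.
        return "ext-linkreq-\(destHashHex)"
    case .data:
        // Unique per packet, so each message keeps its own banner.
        return "ext-\(destHashHex)-\(timestampMs)"
    }
}
```

Because `UNUserNotificationCenter` replaces a request that shares an
identifier with an existing one, every LINKREQUEST retry for the same
destination maps onto the same banner.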

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift
Comment thread Sources/ColumbaApp/Services/AppServices.swift
torlando-agent Bot and others added 3 commits May 13, 2026 23:51
Dedupe NE placeholder notifications when host app fires its rich
notification.

When the host app is background-running (not yet suspended), both
notification paths are live for the same arriving packet: the
extension's `ExtensionNotifications.postMessageArrived` posts a
generic "New message" banner keyed on the recipient's destination
hash, and the host app's `NotificationService.postMessageNotification`
posts a rich per-conversation banner keyed on the LXMF message hash.
Without dedupe the user sees two banners for one message.

Add `removeExtensionPlaceholders(forDestinationHashHex:)` and call it
just before adding the rich `UNNotificationRequest`. The helper fetches
pending + delivered notifications and removes any whose identifier
matches the two formats used by `PacketTunnelProvider.swift`:
  * `ext-<destHashHex>-<timestamp-ms>` (DATA / OPPORTUNISTIC)
  * `ext-linkreq-<destHashHex>`        (LINKREQUEST / DIRECT)

`UNUserNotificationCenter` only supports exact-match removal, so we
filter pending/delivered lists in-process by prefix and pass exact ids
to `removePendingNotificationRequests(withIdentifiers:)` /
`removeDeliveredNotifications(withIdentifiers:)`.
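
The in-process filtering step might look roughly like this.
`placeholderIdentifiers(in:destHashHex:)` is a hypothetical helper, and
the sketch assumes the identifiers have already been collected from the
pending and delivered lists:

```swift
/// Returns the identifiers that belong to extension placeholders for
/// one destination, ready to hand to the exact-match removal calls.
func placeholderIdentifiers(in identifiers: [String],
                            destHashHex: String) -> [String] {
    let dataPrefix = "ext-\(destHashHex)-"            // DATA / OPPORTUNISTIC
    let linkRequestID = "ext-linkreq-\(destHashHex)"  // LINKREQUEST / DIRECT
    return identifiers.filter { id in
        id.hasPrefix(dataPrefix) || id == linkRequestID
    }
}
```

Including the trailing `-` in the DATA prefix keeps a shorter hash from
matching identifiers that belong to a longer one sharing the same
leading characters.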

Also resolves the out-of-scope review thread already filed as #74
(multi-relay tunnel mirror selection).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestURLHandler.bind installed TestRelayDelegate as the LXMRouter
delegate with originalDelegate: nil, displacing the production
IncomingMessageHandler that _initializeServicesOnce had set. The
router still persisted inbound messages to the DB, but
ensureConversation and the messageReceivedNotification UI refresh
never ran — so on debug builds (which always run bind) received
messages fired notifications but never showed in the chats list.

bind now takes the live IncomingMessageHandler and threads it
through as the relay's wrapped delegate. Verified on-device: a
fresh inbound-message stream now produces a conversation row with
correct display name + unread count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes ship together to keep the Network Extension actually
running once the user enables it, fixing the "tunnel goes
.disconnected and never recovers" pattern observed on-device.

1. On-demand always-connect rules in the VPN profile
   (TunnelManager.install + start). iOS now keeps the tunnel up
   across wake/sleep, network changes, and restarts it after the
   system tears it down under memory pressure. Existing profiles
   are migrated on next start(); disable() clears the rules so
   stopVPNTunnel() doesn't silently bounce back on.

2. Status-observer restart loop in AppServices
   (scheduleTunnelRestartIfNeeded). When the tunnel transitions to
   .disconnected after having been .connected (and the user's
   tunnelEnabledKey is still true), schedule a restart with
   doubling backoff (1s start, 300s cap). This is the belt to
   on-demand's suspenders — iOS doesn't always re-fire on-demand
   promptly. Gated on tunnelHasBeenConnectedOnce so the initial
   boot .disconnected firing doesn't race the auto-start path.

3. Don't auto-clear tunnelEnabledKey on transient launch failures.
   The previous auto-start cleared the pref on a 30s no-connect
   timeout, permanently disabling background transport on any
   transient blip — the empirical "tunnel dead for 10h" state was
   reproducibly caused by this. The restart loop now handles
   transient failures; only the user's explicit toggle-off clears
   the pref.
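
The doubling backoff from item 2 can be sketched as below.
`RestartBackoff` is a hypothetical type, not the code in
`scheduleTunnelRestartIfNeeded`, and resetting the delay after a
successful reconnect is an assumption of this sketch rather than
something the commit states.

```swift
/// Doubling restart delay with a fixed cap (1s start, 300s cap).
struct RestartBackoff {
    private let initialDelay: Double
    private let maxDelay: Double
    private var current: Double

    init(initialDelay: Double = 1, maxDelay: Double = 300) {
        self.initialDelay = initialDelay
        self.maxDelay = maxDelay
        self.current = initialDelay
    }

    /// Returns the delay before the next restart attempt, then doubles
    /// the stored delay up to the cap.
    mutating func nextDelay() -> Double {
        let delay = current
        current = min(current * 2, maxDelay)
        return delay
    }

    /// Assumed behaviour: start over once the tunnel is connected again.
    mutating func reset() {
        current = initialDelay
    }
}
```

With the defaults the delays run 1, 2, 4, ... 256 seconds and then
stay at 300 seconds until `reset()` is called.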

Verified on-device: cold launch from saved pref reaches .connected
with both interfaces (foreground + NE-owned) present in rnsd's
client list, backgrounding leaves the NE-owned connection intact
(only the foreground socket dies), and foregrounding restores the
dual state without any tunnel drop.

Does NOT yet solve "notifications fire during backgrounding" —
rnsd's path drift to the foreground socket still causes inbound
packets to be dropped while the app is suspended. Path management
is the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread Sources/ColumbaApp/Services/AppServices.swift
When the iOS app moves to .background, fire a tunnel-only re-announce
inside a UIApplication.beginBackgroundTask window so the last
announce rnsd receives is via the NE-owned socket. rnsd's path table
is single-path / last-write-wins (AppServices.swift:942), so this
pins the path to the still-alive NE socket before iOS tears the
foreground TCPInterface socket down — without this, the path stays
on the foreground socket, goes dead the moment we suspend, and rnsd
drops every inbound packet to our delivery destination.

Verified on-device with a controlled 50s background window: the NE
went from zero matches (path drift to dead foreground socket) to
four `[EXT/NOTIF] match` + `UN add ok` entries in the same interval
— two DIRECT LINKREQUEST matches for the phone's delivery dest plus
two OPPORTUNISTIC matches for a second local destination. Phase B
notifications now fire reliably while the app is suspended for the
duration of the RNS path TTL.

Sustained suspension (path TTL > foregrounded re-announce interval)
still needs NE-side periodic re-announce — a separate change that
requires the delivery identity in the App Group keychain.

Adds public AppServices.announceViaTunnel() wrapping the existing
private sendAnnounceViaTunnel — needed because the call site is
the .background scenePhase handler in ColumbaApp, which lives in
the App target and can't reach private AppServices methods.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>