
feat: multi-TCP tunnel — extension manages a connection per entity#62

Merged
torlando-tech merged 37 commits into feat/enable-tunnel-flip-flag from feat/multi-tcp-tunnel on May 11, 2026

@torlando-tech
Owner

Summary

  • Network Extension previously kept a single tcpConnection and a single currentTCP endpoint, so enabling two TCP relays in the app silently dropped one — the JSON parser overwrote result.tcp on every iteration and only the last enabled tcpClient got a socket. Verified empirically: with two enabled, only one had inbound traffic.
  • Now both the IPC wire format and the extension's connection manager are per-entity. Each InterfaceEntity gets its own NWConnection, its own HDLC receive buffer, its own state-update handler. applyConfigsLocked diffs per-id so adding or editing one entry doesn't disturb the others.

What changed

Wire protocol (SharedFrameQueue + handleAppMessage)

  • Frame format gains a 1-byte idLen and a length-prefixed UTF-8 entity id between the interface tag and the frame payload.
  • TunnelManager.sendFrame accepts an entityId and writes it into the IPC envelope (sendProviderMessage).
  • ExtensionFrameReader.onTCPFrameReceived is now (entityId, data); the AppServices handler routes inbound frames to the matching TCPInterface by id, with safe fallbacks for empty/legacy ids.
  • Old format frames in flight at the upgrade are lost on first read; the queue is append-and-clear so the lifetime is short.
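
A minimal sketch of the IPC envelope layout described above (`[tag][idLen][id][data]`), assuming a 1-byte interface tag in front of the id. The function names, the error type, and the tag value `0x01` are hypothetical and not taken from the PR; only the field order and the 1-byte `idLen` come from the description.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum EnvelopeError: Error { case idTooLong, truncated, badUTF8 }

/// Builds [tag][idLen][id bytes][payload]. idLen is one byte, so the
/// UTF-8 encoding of the entity id must fit in 255 bytes.
func encodeEnvelope(tag: UInt8, entityId: String, payload: Data) throws -> Data {
    let idBytes = Array(entityId.utf8)
    guard idBytes.count <= Int(UInt8.max) else { throw EnvelopeError.idTooLong }
    var out = Data([tag, UInt8(idBytes.count)])
    out.append(contentsOf: idBytes)
    out.append(payload)
    return out
}

/// Parses an envelope with a two-step length guard: first make sure the
/// tag and idLen bytes exist, then make sure the id itself is fully present.
func decodeEnvelope(_ message: Data) throws -> (tag: UInt8, entityId: String, payload: Data) {
    let bytes = [UInt8](message)              // copy so indices start at 0
    guard bytes.count >= 2 else { throw EnvelopeError.truncated }
    let idLen = Int(bytes[1])
    guard bytes.count >= 2 + idLen else { throw EnvelopeError.truncated }
    guard let id = String(bytes: bytes[2..<(2 + idLen)], encoding: .utf8) else {
        throw EnvelopeError.badUTF8
    }
    return (bytes[0], id, Data(bytes[(2 + idLen)...]))
}

// Usage: round-trip a frame for a hypothetical entity id.
let sampleMessage = try encodeEnvelope(tag: 0x01, entityId: "relay-a", payload: Data([0xDE, 0xAD]))
let sampleDecoded = try decodeEnvelope(sampleMessage)
print(sampleDecoded.entityId, sampleDecoded.payload.count)
```

An empty id encodes as `idLen == 0`, which is the case the legacy fallback would see.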

Extension (PacketTunnelProvider)

  • tcpConnection / tcpReceiveBuffer / currentTCP → per-entity dicts.
  • startTCPConnection(entityId:host:port:), receiveTCPData(entityId:), handleTCPData(entityId:data:) — all per-connection.
  • applyConfigsLocked diffs [entityId: (host, port)]: an entry whose endpoint is unchanged keeps its connection, removed entries tear down only their own socket, edited entries restart only that socket.
  • loadInterfaceConfigs returns tcps: [String: (host, port)] keyed by InterfaceEntity.id.
  • handleAppMessage parses the new wire format and looks up the connection by id, falling back to the sole connection when the id is empty (so a hypothetical legacy single-TCP build still routes correctly).
  • ExtensionDiagLog lifecycle events for (re)applying, removed, state, failed per entity — low-rate, useful for diagnosing without syslog access.
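
The per-id diff that `applyConfigsLocked` performs can be sketched roughly as follows. This is an illustration of the idea only: `Endpoint`, `ConfigDiff`, and `diffConfigs` are hypothetical names, and the real method acts on live `NWConnection`s rather than returning a value.

```swift
// Hypothetical value types for this sketch.
struct Endpoint: Equatable {
    let host: String
    let port: UInt16
}

struct ConfigDiff: Equatable {
    var toStart: [String] = []    // ids present only in the new config
    var toRestart: [String] = []  // ids whose host/port changed
    var toStop: [String] = []     // ids removed from the config
    var untouched: [String] = []  // ids whose endpoint is unchanged
}

func diffConfigs(current: [String: Endpoint], desired: [String: Endpoint]) -> ConfigDiff {
    var diff = ConfigDiff()
    // Collect stale ids into an array before anything is torn down.
    diff.toStop = current.keys.filter { desired[$0] == nil }.sorted()
    for id in desired.keys.sorted() {
        guard let want = desired[id] else { continue }
        switch current[id] {
        case .none:
            diff.toStart.append(id)
        case let have? where have == want:
            diff.untouched.append(id)
        default:
            diff.toRestart.append(id)
        }
    }
    return diff
}
```

Only the ids in `toStart`, `toRestart`, and `toStop` would touch a socket; everything in `untouched` keeps its existing connection.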

Verified on device

End-to-end RECV path tagged with the entity id of its source connection:
```
[EXT/TCP] rx [5D10953B-…]: 219B raw → 1 frame(s)
[EXT_RX] drained 1 frame(s) from queue
[RECV] type=announce dest=… from=5D10953B-…
[RECV_ANNOUNCE] fullDest=…
```

Test plan

  • One TCP enabled — works the same as before, no regression on existing single-TCP deployments
  • Enable a second TCP to a different relay — both connections come up, both deliver inbound, both [RECV] lines tagged with their own entity id
  • Toggle the second one off — only it tears down, original keeps running
  • Edit one's host/port — only that one reconnects, the other is untouched
  • Backgrounded with two TCPs — both keep delivering announces while the app is locked

Notes

This stacks on #57 (which establishes the tunnel-mode wiring). Base branch is feat/enable-tunnel-flip-flag until #57 lands; rebase onto main afterwards.

🤖 Generated with Claude Code

Toggling/editing any TCP interface in Interfaces settings was tearing
down every other healthy TCP connection alongside the one the user
actually changed. Each reconnect triggered the relay to redeliver its
full announce table, swamping the app for ~90s per change (90k+
announces in one minute, observed on rmap.world).

Two layers of fix:

1. `AppServices.connectTCPInterface(entityId:host:port:)` is now
   idempotent. It tracks the last-applied host:port per entity and
   returns immediately when called with the same endpoint as the
   currently-running interface. Calling it with a different endpoint
   still disconnects-and-recreates as before.

2. `InterfaceManagementViewModel.applyChanges` loops over every
   enabled TCP entity (not just the one that changed). It now skips
   entities whose endpoint hasn't moved, avoiding both the connect
   call AND the brief `.connecting` UI flicker.

Stop and shutdown paths clear the endpoint dictionary alongside
`tcpInterfaces` so a future re-add doesn't short-circuit against a
stale entry.
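
A minimal sketch of the idempotency guard, assuming the last-applied endpoint is stored as a `host:port` string per entity. `TCPConnector` and its members are hypothetical stand-ins for the relevant part of `AppServices`, and the actual disconnect-and-recreate work is reduced to a counter.

```swift
// Hypothetical stand-in for the TCP bookkeeping in AppServices.
final class TCPConnector {
    private(set) var endpoints: [String: String] = [:]  // entityId -> "host:port"
    private(set) var connectCount = 0

    /// Returns true when a (re)connect happened, false when the call was a no-op.
    @discardableResult
    func connect(entityId: String, host: String, port: UInt16) -> Bool {
        let desired = "\(host):\(port)"
        // Same endpoint as the running interface: nothing to do.
        if endpoints[entityId] == desired { return false }
        // A different endpoint (or a new entity) still disconnects and recreates.
        connectCount += 1
        endpoints[entityId] = desired
        return true
    }

    /// Stop paths clear the endpoint entry so a later re-add is not short-circuited.
    func disconnect(entityId: String) { endpoints.removeValue(forKey: entityId) }
    func shutdown() { endpoints.removeAll() }
}
```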

Auto/BLE/RNode/Multipeer sections of `applyChanges` already gate on
existence checks and don't trigger this. Config changes for those
types still don't take effect without a manual disable/re-enable —
separate issue, smaller blast radius, not addressed here.
Previously the Network Extension kept a single `tcpConnection` and a
single `currentTCP` endpoint, so enabling two TCP relays in the app
silently dropped one — the extension's config loader overwrote
`result.tcp` on every iteration and only the last enabled tcpClient
in the JSON array got a socket. The other relay was unreachable
through the tunnel and inbound from the wrong relay was routed back
to whichever `TCPInterface` happened to be first in the app's
dictionary.

This commit lifts the entire tunnel TCP layer to per-entity:

- `SharedFrameQueue` frame format gains a 1-byte entityId-length
  field and a length-prefixed UTF-8 entity id between the interface
  tag and the frame payload. Old format frames in flight at the
  upgrade are lost on first read; the queue is append-and-clear
  so the lifetime is short.
- `TunnelManager.sendFrame` adds an `entityId` parameter and writes
  it into the IPC envelope sent via `sendProviderMessage`.
  `connectTCPInterface` and `applyTunnelModeToInterfaces` now
  capture the entity id in the per-interface tunnel-mode hook so
  outbound frames from each `TCPInterface` carry their own id.
- `ExtensionFrameReader.onTCPFrameReceived` is now `(entityId, data)`
  and the AppServices handler routes inbound frames to the matching
  `TCPInterface` by id, with safe fallbacks for empty/legacy ids.
- `PacketTunnelProvider` replaces `tcpConnection` /
  `tcpReceiveBuffer` / `currentTCP` with per-entity dicts. Each
  `NWConnection` has its own HDLC receive buffer (sharing one
  buffer between two streams would corrupt frame boundaries),
  its own state-update handler that only tears down its own entry,
  and its own `receiveTCPData` recursion so inbound frames are
  tagged with the right id when appended to the queue.
- `applyConfigsLocked` diffs per-entity: an entry whose endpoint is
  unchanged keeps its connection, a removed entry tears down only
  its own socket, an edited entry restarts only that socket. Adding
  a second relay no longer disturbs the first.
- `loadInterfaceConfigs` returns `tcps: [String: (host, port)]`
  keyed by `InterfaceEntity.id` instead of a single optional.

`handleAppMessage` parses the new wire format (entityId-length +
entityId in front of frame data) and looks up the connection by id,
falling back to the sole connection when the id is empty so a
hypothetical legacy single-TCP build still routes correctly.
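
To illustrate why each connection needs its own receive buffer, here is a rough sketch of per-entity HDLC deframing. It assumes the common HDLC byte-stuffing constants (flag `0x7E`, escape `0x7D`, XOR mask `0x20`); those values and all type names here are assumptions, not taken from the PR.

```swift
// Hypothetical deframer: one instance per TCP stream.
struct HDLCDeframer {
    static let flag: UInt8 = 0x7E     // assumed frame delimiter
    static let esc: UInt8 = 0x7D      // assumed escape byte
    static let escMask: UInt8 = 0x20  // assumed XOR mask for escaped bytes
    private var buffer: [UInt8] = []

    /// Appends a chunk and returns every complete frame now available.
    mutating func feed(_ chunk: [UInt8]) -> [[UInt8]] {
        buffer.append(contentsOf: chunk)
        var frames: [[UInt8]] = []
        while let start = buffer.firstIndex(of: Self.flag),
              let end = buffer[(start + 1)...].firstIndex(of: Self.flag) {
            let body = buffer[(start + 1)..<end]
            if !body.isEmpty { frames.append(Self.unescape(body)) }
            // Keep the closing flag; it may double as the next opening flag.
            buffer.removeSubrange(..<end)
        }
        return frames
    }

    static func unescape(_ body: ArraySlice<UInt8>) -> [UInt8] {
        var out: [UInt8] = []
        var escaped = false
        for byte in body {
            if escaped { out.append(byte ^ escMask); escaped = false }
            else if byte == esc { escaped = true }
            else { out.append(byte) }
        }
        return out
    }
}

/// Keeps one deframer per entity id so partial frames never mix across streams.
struct PerEntityDeframer {
    private var deframers: [String: HDLCDeframer] = [:]

    mutating func receive(entityId: String, chunk: [UInt8]) -> [[UInt8]] {
        deframers[entityId, default: HDLCDeframer()].feed(chunk)
    }

    mutating func remove(entityId: String) { deframers[entityId] = nil }
}
```

With a single shared buffer, a half-received frame from one relay followed by bytes from another would be spliced into one bogus frame; keying the buffer by entity id avoids that.
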
Lifecycle events only — config (re)apply, config removal, state
transitions, failure. Per-frame and per-drain logging is omitted
to keep the file small. Per-entity tagging in the messages makes
multi-TCP behaviour observable without needing syslog access.

Used to diagnose the silent-inbound regression that turned out to
be the SharedFrameQueue wire-format roll-out interacting with a
not-yet-relaunched extension; left in place for future debugging.
@greptile-apps
Contributor

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR upgrades the TCP tunnel from a single shared connection to a per-entity model, fixing a bug where enabling two TCP relays silently dropped all but the last one. Both the shared-file wire format (SharedFrameQueue) and the extension's connection manager (PacketTunnelProvider) now key everything by InterfaceEntity.id, and AppServices gains an idempotency guard so toggling one interface doesn't reconnect unrelated ones.

  • Wire format updated: SharedFrameQueue frames and IPC envelopes gain a 1-byte idLen + length-prefixed entity-id field; TunnelManager.sendFrame and ExtensionFrameReader.onTCPFrameReceived are updated to carry and surface the id end-to-end.
  • Extension per-entity connection lifecycle: tcpConnection/tcpReceiveBuffer replaced by [String: NWConnection] and [String: Data] dicts; applyConfigsLocked diffs per entity, teardownTCPConnectionLocked tears down a single entity's connection, and teardownAllTCPConnectionsLocked handles stop/cancel.
  • App-side idempotency: tcpEndpoints tracks the last-applied host:port per entity so re-applying the same config in a loop is a no-op; both connectTCPInterface and reconnectTCPOnly roll back on addInterface failure.

Confidence Score: 5/5

Safe to merge — the multi-TCP refactor is well-scoped and the previously-identified dictionary-mutation issues are addressed; the open tcpEndpoints write-ordering threads are pre-existing, not regressions from this PR.

The core structural change — replacing single-connection properties with keyed dictionaries in both the extension and the app layer — is clean and internally consistent. All new dictionary-iteration loops correctly snapshot keys before mutation, handleAppMessage has the proper two-step length guard, and the wire-format changes in SharedFrameQueue are symmetrical between writer and reader.

No files require special attention beyond the open tcpEndpoints write-ordering threads already tracked in prior reviews.

Important Files Changed

| Filename | Overview |
|---|---|
| Sources/ColumbaNetworkExtension/PacketTunnelProvider.swift | Single-to-multi-TCP refactor: tcpConnection/tcpReceiveBuffer/currentTCP replaced by per-entity dicts; stale-id teardown and wake() iteration both correctly snapshot keys into an Array before mutating; handleAppMessage has proper >= 2 guard before accessing messageData[1]. |
| Sources/Shared/SharedFrameQueue.swift | Frame format extended with 1-byte entityId length + UTF-8 id payload; append and drain both correctly account for the new header size and totalLen semantics; guard conditions in drain handle the underflow/malformed-frame edge cases. |
| Sources/ColumbaApp/Services/AppServices.swift | Adds tcpEndpoints idempotency dict, isTunnelModeActive flag, PeerChildInterfaceRegistry, and replaces setOnInterfaceAdded with setOnInterfaceConnected/setOnInterfacePeerSpawned; two tcpEndpoints write-before-addInterface issues remain open from prior review threads. |
| Sources/ColumbaApp/Services/TunnelManager.swift | sendFrame gains entityId parameter with default = empty string for backward compat; wire encoding (tag + idLen + idBytes + data) matches the extension's parser exactly. |
| Sources/ColumbaApp/Services/ExtensionFrameReader.swift | onTCPFrameReceived callback signature updated from (Data) to (String, Data); all call sites updated; entityId threading to AppServices correctly handles empty/legacy-id frames with a MainActor.run fallback chain. |

Sequence Diagram

sequenceDiagram
    participant App as AppServices (MainActor)
    participant TM as TunnelManager
    participant EXT as PacketTunnelProvider (configQueue)
    participant SFQ as SharedFrameQueue (file)
    participant EFR as ExtensionFrameReader

    Note over App,EXT: Outbound (App → Extension → TCP relay)
    App->>TM: "sendFrame(data, tag=TCP, entityId)"
    TM->>EXT: handleAppMessage([tag][idLen][id][data])
    EXT->>EXT: look up tcpConnections[entityId]
    EXT->>EXT: NWConnection.send(frameData)

    Note over EXT,App: Inbound (TCP relay → Extension → App)
    EXT->>EXT: receiveTCPData(entityId)
    EXT->>EXT: handleTCPData(entityId, data)
    EXT->>SFQ: "append(frame, tag=TCP, entityId)"
    Note over SFQ: [4B len][1B tag][1B idLen][N id][M data]
    App->>SFQ: drainFrames()
    SFQ-->>EFR: QueuedFrame(tag, entityId, data)
    EFR->>App: onTCPFrameReceived(entityId, data)
    App->>App: look up tcpInterfaces[entityId]
    App->>App: transport.handleReceivedData(data, from: tcpId)

Reviews (5): Last reviewed commit: "fix(tunnel): guard applyTunnelModeToInte..."

torlando-agent Bot and others added 24 commits May 5, 2026 00:10
Mirrors Android Columba's 2-step TCP client wizard at the post-onboarding
add-interface surface: server selection (bootstrap/community/custom) →
review & configure. Routes Settings → Network Interfaces → + → TCP Client
through the wizard instead of the blank manual entry sheet, and reroutes
edit-existing for TCP entries to the same flow with pre-filled values.

Scoped to the fields TCPClientConfig already supports (host, port,
networkName, passphrase). Bootstrap-only flag and SOCKS proxy are deferred.

Closes #51

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>
* fix(MicronParser): persist formatting state across lines

The line-by-line parse loop hardcoded `currentStyle: .plain` on every
parseInline call, so a `Fxxx`Bxxx preamble line consumed its colors
into an empty span and the following ASCII art rendered with no fg/bg.
Match python NomadNet's MicronParser by promoting currentStyle to a
parser-loop local that threads through every parseInline call, with
parseInline returning the terminal style so the caller can carry it
forward. `<` at line-start additionally resets currentStyle to .plain,
matching python's `<` semantics.

Repro: the index.mu at github.com/fr33n0w/thechatroom uses the
preamble shape `F0ff`B52f then ASCII art then `f`b — before this fix
the colors were silently dropped.

Closes #31

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* fix(NodeDetailsView): allow tapping action buttons on stale-path contacts

Browse Site / Start Chat / Set as My Relay were `.disabled(!isOnline)`
on a contact's NodeDetailsView, where `isOnline` is just `Date() <
entry.expires` from the path table. After cleanupLinks runs `expirePath`
on a failed-link destination, the contact's path becomes "expired" until
a new announce arrives — but Reticulum's path discovery is exactly
designed for that case (issue a path-request, any peer with a recent
announce will respond). Greying the button blocks the user from the very
operation that would heal the path.

Drops the `.disabled` and `.opacity` modifiers from `actionButton(...)`
and the relay-toggle button. The underlying flow
(`NomadNetBrowserService.resolveValidPath`) already does
`pathTable.remove` + `transport.requestPath` + 10s poll, so taps now
flow through to the working recovery path.

Also reword the expired-hint copy from "Ask them to send an announce
from their app, or wait for one to arrive automatically" to "Tap an
action to issue a path request — any node on the network with a recent
announce will respond." — the original copy is wrong about how
Reticulum path discovery works and discourages users from doing the
right thing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(MicronDocumentView): render the chat-room ASCII art correctly

Three bugs surfaced once the parser carried `Bxxx background colors
forward across lines (faf17e4):

1. Centering broke against the document, not the screen. A wide row
   (e.g. fr33n0w/thechatroom's 550-char trailing-whitespace line)
   pushed the VStack out to ~4600pt; centered shorter rows landed
   at the middle of *that* width — way past the viewport. Fixed by
   capturing the actual screen viewport via GeometryReader in
   MonospaceScrollContainer (mirrors Android's
   `Modifier.widthIn(min = viewportLineWidth)` from
   NomadNetBrowserScreen.kt:474) and wrapping each scroll-mode row
   in `.frame(minWidth: viewportWidth, alignment: alignment.swiftUI)`.

2. Row-to-row column alignment drifted by half a cell because
   Core Text's `textAlignment = .center` strips trailing whitespace
   when computing the centered offset. Lines with a trailing space
   centered as if one cell narrower than lines without — visible as
   the letter "T" of "the chat room" wandering in the ASCII art.
   UILabel now always renders left-aligned (paragraphStyle and
   textAlignment) and visual centering is the SwiftUI .frame's job.

3. SF Mono renders Block-Elements (▗▄▖▝▀▘▙▟ etc.) at slightly
   different pixel widths than ASCII spaces, so 85-char rows of
   mixed content didn't end up the same width. Bundled JetBrains
   Mono (Apache 2.0/OFL, Regular + Bold, ~270KB each) for the
   monospace renderer — every glyph in the file has advance=600
   confirmed via fontTools, matching what Android already uses
   (MicronComposables.kt's `JetBrainsMonoFamily`). Falls back to
   the system font if the bundled one fails to load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com>
Co-authored-by: Claude claude-opus-4-7 <noreply@anthropic.com>
Addresses PR review comments:
#64 (comment)
#64 (comment)

Replace the iOS community-server directory with the canonical Android
list at app/src/main/java/network/columba/app/data/model/TcpCommunityServer.kt.
Removes decommissioned / non-existent entries (RNS Amsterdam, RNS
BetweenTheBorders, RNS Frankfurt, i2p Reticulum, Reticulum Ireland,
TheHub, Kosciuszko, Reticulum Ireland v2, RNS Roaming) and adds the
servers that are actually present on the network. i2p is dropped
entirely because iOS has no i2p transport.

Also collapse the "Bootstrap Servers" / "Community Servers" split in
TCPClientWizard into a single "Community Servers" section, since
Reticulum-Swift does not yet implement bootstrap-interface mode and
splitting them would mislead users into expecting bootstrap behavior.
The isBootstrap flag on the data model is preserved so the Android
table stays mirrorable.

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>
Splits the auto-announce path into three independently-toggleable
triggers, all gated behind the existing `auto_announce_enabled` master:

  - `auto_announce_on_interval`       — periodic timer (existing)
  - `auto_announce_on_tcp_reconnect`  — fires on TCP / RNode reconnect
  - `auto_announce_on_peer_spawned`   — fires when AutoInterface / BLE /
                                        MPC accepts a new peer

All three default true to preserve the previous "all triggers active
when master is on" behaviour.

Wiring:
  - `AppServices.configureTransportCallbacks` now uses
    reticulum-swift's split callbacks (`setOnInterfaceConnected` /
    `setOnInterfacePeerSpawned`), each with its own user-setting gate.
    The polled state-observer's connect-trigger is gated to match.
  - `AutoAnnounceManager.start` (and the in-loop re-check) honour the
    `auto_announce_on_interval` toggle in addition to master.
  - `autoAnnounce()` itself bails on master-off as defense in depth.
  - SettingsView's Auto Announce card grows three sub-toggles +
    interval picker hides when the on-interval trigger is off.

Pairs with reticulum-swift's onInterfaceAdded → onInterfacePeerSpawned /
onInterfaceConnected split (see that repo). Ship-ready behaviour change
on its own; no diagnostic logging in this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up the onInterfaceAdded → onInterfacePeerSpawned/onInterfaceConnected
split (reticulum-swift PR #14) that this PR's wiring requires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The polled state-observer's connect path was calling
`autoAnnounceManager.resetTimer()` unconditionally — even when the
TCP-reconnect gate had blocked the announce. Because `resetTimer()`
restarts the periodic loop with a fresh `Next auto-announce in 3h
(±1h)` schedule, every TCP reconnect on a flap-y network (mobile
data ↔ WiFi, RNode in poor RF) would push the next interval-announce
a full interval into the future without ever emitting one. The
periodic schedule could be perpetually starved even though the user
left "On interval" enabled and only disabled the reconnect trigger.

Move the `resetTimer()` call inside the gate so it only fires when an
announce actually went out.

Greptile review feedback on PR #70.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The auto-announce trigger gates were inlined as `defaults.bool(forKey: ...)`
calls at seven sites across AppServices and AutoAnnounceManager, which
made them impractical to unit-test without bringing up the full
AppServices stack (transport, identity, router, …).

Extract the gating decision into a pure value type, AutoAnnouncePolicy,
that snapshots the four UserDefaults keys and exposes:
  - shouldFireOnInterval
  - shouldFireOnTcpReconnect
  - shouldFireOnPeerSpawned

…all derived from the master enable plus the corresponding granular
toggle. Routes the seven existing call sites through the policy so the
inline string-key reads no longer appear in service code (which makes a
typo-rename harder and gives every gate the same code path).

Tests in AutoAnnouncePolicyTests cover:
  - Direct init stores all four flags.
  - Master off suppresses all three triggers regardless of granulars.
  - Each granular toggle gates its own trigger independently.
  - All-on / all-off boundary cases.
  - Empty defaults reports all-off (raw read behavior).
  - Snapshot is immutable after capture (catches future refactors that
    might keep a defaults reference).
  - register(defaults: true) produces the fresh-install all-fire baseline
    that SettingsViewModel.loadLocalSettings sets up.
  - Explicit false overrides registered default-true.

9 tests, all passing locally on iOS Simulator. Total suite went from
71 to 80 tests; no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…wned gate

Reticulum-swift fires `onInterfacePeerSpawned` when an AutoInterface /
BLEInterface / MPCInterface accepts a peer, then a moment later fires
`onInterfaceConnected` for the peer's child transport's `.connected`
transition. The previous gating treated the second event as a generic
TCP-reconnect, so a user who turned the peer-spawned toggle off but
left tcp-reconnect on would still get an announce on every peer-add —
defeating the purpose of having a separate peer-spawned gate.

Changes:

  - `AutoAnnouncePolicy.shouldFireOnInterfaceConnected(isPeerChild:)`
    new accessor that gates by `onPeerSpawned` for peer-children and
    `onTcpReconnect` for everything else (both still subject to
    `masterEnabled`).
  - `AppServices` tracks ids passed through `onInterfacePeerSpawned` in
    a `peerChildInterfaceIds` set, then queries it in the
    `onInterfaceConnected` handler to pick the right gate.
  - Diagnostic log line distinguishes the two attribution paths so a
    future investigation can tell whether an announce came from the
    tcp-reconnect or peer-child-reconnect branch.

Tests cover the four corners of the cross-trigger matrix plus the
master-off override:

  - peer-child + peer-spawned-off + tcp-reconnect-on   → does NOT fire
  - peer-child + peer-spawned-on  + tcp-reconnect-off  → fires
  - non-peer-child + tcp-reconnect-on / off            → fires / not
  - master off                                         → never fires
  - all-on / all-off across peer-child boundaries
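
The gating matrix above can be sketched as a small value type. `masterEnabled`, `onTcpReconnect`, `onPeerSpawned`, and the `shouldFire…` accessors follow the names used in these commit messages; the `onInterval` property name is an assumption, and the `UserDefaults` snapshot initializer is left out.

```swift
// Sketch of the policy as a pure value type; field names partly assumed.
struct AutoAnnouncePolicy: Equatable {
    let masterEnabled: Bool
    let onInterval: Bool        // assumed name for the interval toggle
    let onTcpReconnect: Bool
    let onPeerSpawned: Bool

    var shouldFireOnInterval: Bool { masterEnabled && onInterval }
    var shouldFireOnTcpReconnect: Bool { masterEnabled && onTcpReconnect }
    var shouldFireOnPeerSpawned: Bool { masterEnabled && onPeerSpawned }

    /// A peer-child's `.connected` transition is attributed to the
    /// peer-spawned gate; every other interface uses the tcp-reconnect gate.
    func shouldFireOnInterfaceConnected(isPeerChild: Bool) -> Bool {
        masterEnabled && (isPeerChild ? onPeerSpawned : onTcpReconnect)
    }
}
```

Because the type holds only four booleans, each corner of the matrix can be checked without bringing up transport, identity, or router state.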

Greptile review feedback on PR #70 (4/5 confidence comment about peer-child overlap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The peer-spawned and connected callbacks fire from independent
reticulum-swift Tasks. The previous implementation used MainActor-
isolated record / lookup, which meant both operations had to await an
actor hop. Swift's task scheduler doesn't guarantee record-before-lookup
ordering between unrelated Tasks, so a fast peer-add → child-connect
sequence could in theory mis-attribute the connected event to
tcp-reconnect instead of peer-spawned (the user-facing bug
fixed in the prior commit).

Replace the MainActor-isolated Set with a synchronous, lock-protected
PeerChildInterfaceRegistry (OSAllocatedUnfairLock-backed). The peer-
spawned closure now records on its first line, *before* any await
suspension, so the record is committed before any subsequent
onInterfaceConnected for the same id can possibly run its attribution
lookup. The connected closure's lookup is also synchronous, so
attribution is correct regardless of how the schedulers interleave the
rest of the closure bodies.

Tests:
  - PeerChildInterfaceRegistryTests: empty / record-then-contains /
    idempotent / reset / immediate-visibility on same thread.
  - testConcurrentRecordAndContainsObservesAllPriorRecords: 1000-way
    concurrent record+contains stress, asserts no crash and full
    visibility after group completes.

Total suite: 90 tests, all passing.
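
A rough sketch of a synchronous, lock-protected registry in the spirit of the one described here. The commit uses `OSAllocatedUnfairLock`; this sketch swaps in `NSLock` so it also builds outside Apple platforms, and the method names are assumptions based on the test names above.

```swift
import Foundation

// Sketch only: NSLock stands in for the OSAllocatedUnfairLock used in the PR.
final class PeerChildInterfaceRegistry: @unchecked Sendable {
    private let lock = NSLock()
    private var ids: Set<String> = []

    /// Synchronous, so a caller can record before its first `await`.
    func record(_ id: String) {
        lock.lock()
        defer { lock.unlock() }
        ids.insert(id)
    }

    /// Synchronous lookup; no actor hop is needed for attribution.
    func contains(_ id: String) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        return ids.contains(id)
    }

    func reset() {
        lock.lock()
        defer { lock.unlock() }
        ids.removeAll()
    }
}
```

The point of the design is that neither call suspends, so a record made on the first line of the peer-spawned closure is visible to any lookup that starts afterwards.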

Greptile review feedback on PR #70 (4/5 confidence comment about Task
ordering between MainActor hops).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-triggers

feat(auto-announce): granular trigger toggles + new wiring
Snapshot dictionary keys before mutating during iteration in
PacketTunnelProvider:

- applyConfigsLocked() stale-entry teardown: collect stale ids via
  filter() before the loop instead of iterating currentTCPs.keys
  while teardownTCPConnectionLocked + removeValue mutate it.
- wake() reaper: iterate Array(self.tcpConnections.keys) instead of
  the live Keys view while teardownTCPConnectionLocked mutates the
  same dictionary.

Both paths run on configQueue (the only mutator), but Swift's
Dictionary.Keys is documented as a live view and mutation during
iteration is undefined behavior — can silently skip entries or
crash. Both fixes are inert for the single-TCP case but matter as
soon as 2+ TCPs are active and a config-change or wake event fires.

Co-Authored-By: Claude opus-4-7-1m <noreply@anthropic.com>
Roll back tcpInterfaces[entityId] and defer tcpEndpoints[entityId] until
after transport.addInterface succeeds. Without this, a transient
addInterface throw left both dictionary entries populated for a dead,
un-attached interface; the next connectTCPInterface call with the same
endpoint hit the idempotency guard at the top of the function and
silently no-op'd, breaking self-healing reconnects until the user
manually edited host/port.

Greptile thread 2 (the matching skip in InterfaceManagementViewModel.
applyChanges) is satisfied by this same fix — once tcpEndpoints reflects
only successfully-applied endpoints, the VM's
`tcpEndpoints[id] == desired` guard correctly distinguishes "running
cleanly" from "stale dead entry waiting to retry".
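
The write-after-success and rollback ordering can be sketched like this. `InterfaceTable`, `AttachError`, and the `attach` closure are hypothetical; `attach` stands in for `transport.addInterface`, and the interface object is reduced to a placeholder string.

```swift
// Hypothetical error standing in for a transient addInterface failure.
struct AttachError: Error {}

final class InterfaceTable {
    private(set) var interfaces: [String: String] = [:]  // entityId -> placeholder handle
    private(set) var endpoints: [String: String] = [:]   // entityId -> last applied endpoint

    func connect(entityId: String, endpoint: String,
                 attach: (String) throws -> Void) throws {
        // Idempotency guard: only endpoints that were applied successfully count.
        if endpoints[entityId] == endpoint { return }
        endpoints.removeValue(forKey: entityId)   // the old interface is being replaced
        interfaces[entityId] = endpoint           // stand-in for creating the interface
        do {
            try attach(entityId)
        } catch {
            interfaces.removeValue(forKey: entityId)  // roll back the partial write
            throw error
        }
        endpoints[entityId] = endpoint            // recorded only after attach succeeds
    }
}
```

With this ordering, a failed attach leaves no entry behind, so a retry with the same endpoint reaches `attach` again instead of returning early from the guard.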

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
Extend the connectTCPInterface write-after-success + rollback pattern to
the three remaining tcp-server init sites: both initialize() overloads
and reinitializeConnection(). Without this, an addInterface throw during
init left tcpInterfaces["tcp-server"] and tcpEndpoints["tcp-server"]
populated with a dead interface; reconnectTCPOnly delegates to
connectTCPInterface(entityId: "tcp-server", ...) which then silently
no-op'd on a same-address retry through the new idempotency guard.

For the two initialize overloads, the catch block preserves the
"non-fatal" semantics (init proceeds without TCP, no rethrow) but now
also clears the partial dictionary writes so a later reconnectTCPOnly
retry isn't stuck. For reinitializeConnection — which had no catch and
propagates errors to its caller — the new do/catch rolls back and
rethrows, mirroring connectTCPInterface.

Co-Authored-By: Claude claude-opus-4-7[1m] <noreply@anthropic.com>
Picks the OpenFreeMap style URL (liberty / dark) based on
ThemeManager.isDarkMode and reapplies it from updateUIView when
the active scheme changes. Coordinator caches the last applied
URL to skip the no-op reassignment that would otherwise fire on
every peer-location tick.

Offline regions remain pinned to the liberty style at download
time; switching to dark while fully offline yields unstyled
tiles. To be addressed in a follow-up that caches both style
packs.

Closes #59

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…-mode-maps

feat(Map): follow app dark mode for OpenFreeMap style
fix: hot-swap TCP interfaces without disturbing the others
Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>
* feat(InterfaceManagement): add TCP client community-server wizard

Mirrors Android Columba's 2-step TCP client wizard at the post-onboarding
add-interface surface: server selection (bootstrap/community/custom) →
review & configure. Routes Settings → Network Interfaces → + → TCP Client
through the wizard instead of the blank manual entry sheet, and reroutes
edit-existing for TCP entries to the same flow with pre-filled values.

Scoped to the fields TCPClientConfig already supports (host, port,
networkName, passphrase). Bootstrap-only flag and SOCKS proxy are deferred.

Closes #51

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* fix(TCPClientWizard): mirror android server list, drop bootstrap split

Addresses PR review comments:
#64 (comment)
#64 (comment)

Replace the iOS community-server directory with the canonical Android
list at app/src/main/java/network/columba/app/data/model/TcpCommunityServer.kt.
Removes decommissioned / non-existent entries (RNS Amsterdam, RNS
BetweenTheBorders, RNS Frankfurt, i2p Reticulum, Reticulum Ireland,
TheHub, Kosciuszko, Reticulum Ireland v2, RNS Roaming) and adds the
servers that are actually present on the network. i2p is dropped
entirely because iOS has no i2p transport.

Also collapse the "Bootstrap Servers" / "Community Servers" split in
TCPClientWizard into a single "Community Servers" section, since
Reticulum-Swift does not yet implement bootstrap-interface mode and
splitting them would mislead users into expecting bootstrap behavior.
The isBootstrap flag on the data model is preserved so the Android
table stays mirrorable.

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* chore(greptile): iteration 1 — applied 4, rejected 0

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

* fix(TcpCommunityServer): remove unwanted servers from wizard list

The following entries should not be surfaced in the on-device wizard:

- interloper node + interloper node (Tor)
- Jon's Node
- Quortal TCP Node
- R-Net TCP
- RNS bnZ-NODE01, RNS COMSEC-RD, RNS HAM RADIO
- RNS Testnet StoppedCold
- RNS_Transport_US-East
- Tidudanka.com

Surviving list: 3 bootstrap-class (Beleth RNS Hub, Quad4 TCP Node 1,
FireZen) + 7 community (g00n.cloud Hub, noDNS1, noDNS2, NomadNode
SEAsia TCP, 0rbit-Net, Quad4 TCP Node 2, SparkN0de).

NOTE: the file's docstring claims this list mirrors Android's
`TcpCommunityServer.kt`. Pruning here breaks that mirror; a follow-up
PR should make the equivalent removal on the Android side, OR the
"keep in sync" claim should be relaxed to "originally derived from."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com>
Co-authored-by: Claude claude-opus-4-7 <noreply@anthropic.com>
Co-authored-by: torlando-agent[bot] <torlando-agent@noreply.github.com>
* feat: add Maestro UI flows for columba-suite ui-screenshotter agent

Adds flows/ with 4 deterministic Maestro flows (contacts-list, chats-list,
settings, map) plus a README. The columba-suite ui-screenshotter agent
captures each flow at BASE_REF and HEAD in both light and dark Simulator
appearances on every UI-touching PR, linking the resulting PNG pair from
PLAN.md so reviewers see the visual change before merging.

This PR exists primarily to land flows/ on main so subsequent PRs have
flow coverage at BASE_REF. The screenshotter will fire on this PR itself,
but cleanly skip with screenshot_status: skipped_no_flows because the
PR's BASE_REF (this branch's parent) doesn't yet have flows/.

Voice-call flows are deferred — they need a debug-only lxma://debug/...
URL handler that doesn't exist yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(greptile): iteration 1 — applied 1, rejected 2

Co-Authored-By: Claude claude-opus-4-7 <noreply@anthropic.com>

---------

Co-authored-by: torlando-agent[bot] <217870594+torlando-agent[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: torlando-agent[bot] <281092095+torlando-agent[bot]@users.noreply.github.com>
…eline

Mirror of the Android `app/src/debug/.../TestController.kt` +
TestReceiver.kt surface, adapted to iOS via a sibling URL scheme
(`lxma-test://`) routed through the existing `.onOpenURL` handler in
ColumbaApp.swift. The 17 actions, log shape (`event=key=value`), and
whitespace-escape rules match Android byte-for-byte so the python
orchestrator's regexes work cross-platform.

- Sources/ColumbaApp/Test/TestController.swift — singleton coordinating
  the test-action surface; binds to live AppServices/router/interface
  repository, observes inbound LXMF + delivery-state via a relay
  delegate, emits structured os_log lines under subsystem
  `network.columba.app.test` / category `harness` so idevicesyslog
  filters cleanly.

- Sources/ColumbaApp/Test/TestURLHandler.swift — `lxma-test://<action>?<query>`
  dispatcher; mirrors Android's TestReceiver `when (action)` switch,
  routes to TestController. Wired into ColumbaApp.swift's `.onOpenURL`
  with a `#if DEBUG` guard.

- Both files are wrapped in `#if DEBUG` so they compile out of release
  `.ipa`s. Defense in depth: every entry trips an `assertionFailure`
  with a release-misconfig message. Verified empirically — release
  build's binary contains zero references to TestController /
  TestURLHandler / harness log strings.

- `lxma-test` URL scheme registered in Info.plist alongside `lxma`. The
  scheme stays present in release builds (no per-config plist on this
  project) but is harmless because no code in release handles it; the
  release `.onOpenURL` `#if DEBUG` block compiles to a guard-pass and
  the URL falls through.

The Python orchestrator at ~/.claude-runner/columba-harness/smoke_test_ios.py
drives this surface end-to-end (devicectl URL dispatch + idevicesyslog
tail) and is the iOS sibling of smoke_test.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs that prevented end-to-end smoke runs against a physical iPhone:

1. assertionFailure_releaseGuard() was calling assertionFailure(...)
   UNCONDITIONALLY in both TestController.swift and TestURLHandler.swift.
   That's exactly inverted from the intent — `assertionFailure` ALWAYS
   crashes in DEBUG builds. So every URL dispatch and every public
   handler entry crashed the app on the guard before any logic ran.

   The fix mirrors the Android side's `check(BuildConfig.DEBUG)` semantics:
   crash only when DEBUG is FALSE. The new implementation wraps the body in
   `#if !DEBUG ... #endif`, so it is a no-op in normal debug builds and
   a hard crash if a release build is ever misconfigured to compile this
   file in.

2. TestLog.emit() now ALSO writes each line to
   `Documents/test_log.txt`, prefixed `seq=<n> ts=<iso8601>`. Reason:
   the Python orchestrator originally tailed device syslog via
   `idevicesyslog`, but iOS 17+ moved live-syslog behind the new
   CoreDevice / RemoteXPC tunnel that libimobiledevice can't speak.
   `pymobiledevice3` would work but needs a developer-tunnel daemon.
   The orchestrator now polls Documents/test_log.txt via
   `xcrun devicectl device copy from --domain-type appDataContainer`,
   which works out of the box and is more robust (no race window,
   survives disconnects). os_log writes are kept for human readers.
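
The guard fix in item 1 can be sketched as follows. This is a minimal illustration, not the project's code: `compiledWithDebug`, `releaseGuard`, and the injectable `onViolation` hook are hypothetical names, and the hook stands in for the hard crash so the logic can be exercised without terminating the process.

```swift
// Compile-time flag: true only when the DEBUG compilation condition is set.
#if DEBUG
let compiledWithDebug = true
#else
let compiledWithDebug = false
#endif

/// Hypothetical guard: a no-op in DEBUG builds, a violation otherwise.
/// The default hook crashes; it is injectable here only so the
/// behavior can be checked without killing the process.
func releaseGuard(
    isDebug: Bool = compiledWithDebug,
    onViolation: (String) -> Void = { message in fatalError(message) }
) {
    // Crash only when DEBUG is false, matching `check(BuildConfig.DEBUG)`.
    guard !isDebug else { return }
    onViolation("test harness compiled into a non-DEBUG build")
}
```

The inverted version called the crash path unconditionally, which is why every handler entry died in debug builds before any logic ran.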

Verified end-to-end: smoke_test_ios.py runs the propagated_bidirectional
scenario all the way through interface setup, propagation-node config,
HAS_PATH=1, SEND_PROP, msg_sent. (Stalls at OUTBOUND-never-advances-to-
PROPAGATED — separate LXMFSwift outbound state-machine issue, NOT a
harness bug. Diagnostic for that lands in a follow-up.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
iOS 17+ moved live syslog behind the new CoreDevice / RemoteXPC
tunnel that libimobiledevice can't speak, so the smoke harness
couldn't observe library-internal events on the device. Added a
debug-only `dump_log` URL action that uses OSLogStore to extract
recent unified-log entries from the app process and forwards them
into Documents/test_log.txt as `lib_log subsys=… cat=… level=… msg=…`
lines that the orchestrator can parse with its existing devicectl
copy-from poll mechanism.

Filter defaults to `(com.columba.core, net.reticulum.lxmf)` ×
(Propagation, Sync, LXMRouter, Stamper, Identity, PropagationNodeManager)
to surface just the propagation-path observability we need to
diagnose stuck `state=OUTBOUND` failures. `?since=<sec>` sets the
window (default 120s); `?cat=<comma>` overrides categories; `?cat=*`
disables category filtering.
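
A minimal sketch of how the `since` / `cat` query parameters described above could be parsed. The type `DumpLogFilter` and the function `parseDumpLogFilter` are hypothetical names for illustration; only the parameter names, the 120s default, the `*` wildcard, and the default category list come from this commit message.

```swift
import Foundation

/// Hypothetical parsed form of a `dump_log` request.
struct DumpLogFilter: Equatable {
    var sinceSeconds: Int
    /// nil means "no category filtering" (the `cat=*` case).
    var categories: Set<String>?
}

// Default categories as listed in the commit message.
let defaultCategories: Set<String> = [
    "Propagation", "Sync", "LXMRouter", "Stamper", "Identity", "PropagationNodeManager",
]

func parseDumpLogFilter(_ url: URL) -> DumpLogFilter {
    let items = URLComponents(url: url, resolvingAgainstBaseURL: false)?.queryItems ?? []
    func value(_ name: String) -> String? {
        return items.first(where: { $0.name == name })?.value
    }
    // `since` falls back to the 120 s default when absent or malformed.
    let since = value("since").flatMap { Int($0) } ?? 120
    var filter = DumpLogFilter(sinceSeconds: since, categories: defaultCategories)
    if let cat = value("cat") {
        // `*` disables filtering; otherwise the comma list overrides the defaults.
        filter.categories = cat == "*"
            ? nil
            : Set(cat.split(separator: ",").map(String.init))
    }
    return filter
}
```

For example, `lxma-test://dump_log?since=30&cat=Sync,Stamper` would yield a 30-second window restricted to the two named categories.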

Critical first finding when wired up: processOutbound IS running and
calling sendPropagated; the failure is `LXMRouter` emitting
"Delivery failed: No path available to destination, retrying in 15s/120s"
because `pathTable.lookup(destinationHash: nodeHash)` returns nil for
the propagation node hash even though `pathTable.hasPath(for:)`
returns true on the same hash from the harness. Likely actor-
isolation race or stale-snapshot bug in the path-table view; needs
deeper investigation in LXMF-swift / reticulum-swift.

The action sticks to the existing test-surface contract: `lib_log_done
count=<n>` / `lib_log_err reason=<msg>` reply tokens, and it is debug-only
via the existing `#if DEBUG` source-set isolation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three bug-fix-and-instrument changes to make the PROPAGATED self-send
round-trip pass on iOS. Mirrors the Android smoke pipeline shipped in
PR #882.

1. TestRelayDelegate retention. LXMRouter holds the delegate weakly
   (LXMRouter.swift `weak var delegate`); attachDelegate handed in a
   stack-local relay that immediately deallocated, leaving the router
   with a nil delegate and no didUpdateMessage callbacks for outbound
   state changes. Pin the relay to TestController.attachedDelegate.

2. set_prop_node now goes through PropagationNodeManager.selectNode
   (via TestPathBridge.selectPropNode) instead of router.setOutboundPropagationNode.
   The manager is the only path that wires the announce-derived stamp
   cost into the router; the bare router setter left cost=0 and
   sendPropagated shipped a random stamp that lxmd rejected with
   ERROR_INVALID_STAMP. selectNode also now (a) reads stamp cost from
   pathTable.appData when knownNodes is empty and (b) waits up to ~5s
   for either source to populate, covering the smoke-test race where
   set_prop_node fires immediately after add_tcp_client (before the
   announce arrives).

3. PropagationNodeManager.processPathEntry re-applies the stamp cost
   to the router whenever an announce updates the currently-selected
   node, so a delayed announce can correct an earlier cost=0 setting.
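
The retention bug in item 1 can be reproduced in isolation. In this sketch `Router`, `Relay`, and `Harness` are simplified stand-ins (assumptions, not the real LXMRouter / TestRelayDelegate / TestController types); only the weak `delegate` property and the `attachedDelegate` pin mirror what the commit describes.

```swift
protocol RouterDelegate: AnyObject {}
final class Relay: RouterDelegate {}

final class Router {
    // Weak, as the commit message says of LXMRouter's delegate.
    weak var delegate: RouterDelegate?
}

final class Harness {
    // Strong reference that keeps the relay alive after attach returns.
    private var attachedDelegate: RouterDelegate?

    /// Buggy shape: the relay's only strong reference is a local,
    /// so it is deallocated once this function returns.
    func attachUnpinned(to router: Router) {
        let relay = Relay()
        router.delegate = relay
    }

    /// Fixed shape: pin the relay before handing it to the router.
    func attachPinned(to router: Router) {
        let relay = Relay()
        attachedDelegate = relay
        router.delegate = relay
    }
}
```

After `attachUnpinned` the router's delegate reads back as nil, which matches the missing `didUpdateMessage` callbacks; after `attachPinned` it stays set for as long as the harness lives.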

Plus instrumentation: dump_log now emits each OSLog entry's actual
recorded timestamp (`entry_ts=`) alongside the dump-time `seq=N ts=`
prefix, and includes `network.columba.Columba` in the allowed-subsystem
set so app-side managers (PropagationNodeManager) show up.

Direct + opportunistic self-send scenarios are still WIP — they
require LXMRouter-level loopback for self-addressed packets (single
device can't actually transit a packet to itself through the network)
which is a future stage. PROPAGATED works today via the lxmd round-trip.
torlando-tech and others added 7 commits May 10, 2026 08:26
reticulum-swift @ d19919a — drops incorrect HEADER_2 conversion of link
DATA packets that broke multi-hop DIRECT delivery (state=SENT but the
echo bot never received the message). Mirrors python RNS/Transport.py
:1063, 1122-1130 — link DATA always sends HEADER_1 to the link's
attached_interface, never through path-table lookup.

LXMF-swift @ fe3ce84 (perf/stamper-parallel-primed-digest) — pins
reticulum-swift to the same fix branch.

Smoke results after fix (today's run #5):
  propagated_bidirectional: PASS (6.7s)
  direct_echo:              PASS (3.5s)  ← was FAIL pre-fix
  opp_echo:                 PASS (3.4s)
…roller

Spawned by TestController.bind() on first init; runs every 2s for the
app's lifetime, snapping the key window into Documents/screenshots/<seq>.png
and emitting:

  diag_tick seq=N state=<active|inactive|background> snapshot=<path|<skip>>
  lifecycle event=<did_become_active|will_resign_active|...>

Diagnoses the iOS smoke harness wedge: "lxma-test:// URLs stop reaching
the URL handler after 2-3 sequential runs." The ticker is driven by an
internal Task, NOT URL dispatch, so it keeps emitting even when URLs are
wedged. If ticks ALSO stop, the OS suspended/killed the app. If ticks
keep coming with state != .active, the app went background. If ticks
keep firing AND state stays .active but URLs still don't reach the
handler, the wedge is below SwiftUI (CoreDevice tunnel / launch
services). The last case is the smoking-gun pattern.
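
The decision procedure above can be written down as a small pure function. This is only a sketch of the reasoning: `AppState`, `WedgeDiagnosis`, and `diagnose` are hypothetical names, and the real ticker emits log lines rather than returning a value.

```swift
enum AppState { case active, inactive, background }

enum WedgeDiagnosis: Equatable {
    case noWedge                // URLs are reaching the handler
    case appSuspendedOrKilled   // ticks stopped as well
    case appLeftForeground      // ticks continue, state is not .active
    case belowSwiftUI           // ticks continue, state is .active, URLs still lost
}

/// Classifies a harness wedge from the three observations the ticker provides.
func diagnose(ticksArriving: Bool, lastState: AppState, urlsReachHandler: Bool) -> WedgeDiagnosis {
    if urlsReachHandler { return .noWedge }
    if !ticksArriving { return .appSuspendedOrKilled }
    if lastState != .active { return .appLeftForeground }
    return .belowSwiftUI
}
```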

Field finding from this commit's first run (2026-05-10):
  iter 1: 3/3 PASS
  iter 2: 3/3 PASS
  iter 3: 0/3 FAIL — "TCP client interface ADD never confirmed"
  iter 4: total wedge — TestController never answered get_dest

After the wedge, even `devicectl device copy from` hangs for 30+s,
which proves the wedge is at the **CoreDevice tunnel layer**, not the
app's URL handler. The iPhone-side dev tunnel (RemoteServiceDiscovery)
goes degraded after rapid `process launch --payload-url` bursts.
Recovery: pkill devicectl + relaunch app via process launch (which
still works because process control rides a different RSD service).

Screenshots written to Documents/screenshots/, capped at 30 most-recent.
Pull via `xcrun devicectl device copy from --domain-type
appDataContainer --domain-identifier network.columba.Columba --source
Documents/screenshots --destination /tmp/...`.
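
The 30-file cap can be sketched as a pure function over the `<seq>` numbers in the file names. `screenshotsToPrune` is a hypothetical helper, and it assumes that a higher sequence number means a more recent snapshot.

```swift
/// Returns the sequence numbers to delete so that only the `cap`
/// highest (most recent) snapshots remain.
func screenshotsToPrune(existing: [Int], cap: Int = 30) -> [Int] {
    let oldestFirst = existing.sorted()
    // dropLast keeps nothing to prune when there are `cap` or fewer files.
    return Array(oldestFirst.dropLast(cap))
}
```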

#if DEBUG-only — does not ship in release, same as the rest of the
test surface.
LXMF-swift bump → b2e14cd: caps PROPAGATED outbound state at .sent
(per python LXMessage.py:568-578); large prop messages no longer
falsely advance to .delivered via the Resource path.

iOS UI:
- MessageBubble.deliveryStatusIcon: defensively coerce
  delivered/read → sent for any message with deliveryMethod ==
  'propagated' (handles stale rows from before the fix).
- MessageDetailView.statusCard: method-aware text for prop messages.
  'Sent' → 'Sent to relay' with subtitle explaining propagation
  nodes don't ack recipient receipt.
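
The defensive coercion in `MessageBubble.deliveryStatusIcon` amounts to the mapping below. The `DeliveryStatus` cases and the `displayStatus` function are assumptions made for illustration; only the `'propagated'` method string and the delivered/read to sent rule come from this commit message.

```swift
enum DeliveryStatus { case outbound, sent, delivered, read, failed }

/// Propagation nodes do not ack recipient receipt, so for a propagated
/// message any stored status beyond `.sent` is treated as stale.
func displayStatus(_ stored: DeliveryStatus, deliveryMethod: String) -> DeliveryStatus {
    guard deliveryMethod == "propagated" else { return stored }
    switch stored {
    case .delivered, .read:
        return .sent
    default:
        return stored
    }
}
```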

Diagnostic surface:
- New lxma-test://dump_db URL action. Walks the full
  conversations + messages tables, emits one line per row to
  test_log.txt. Diagnoses Tyler's 2026-05-10 observation that
  prop messages appear in a separate conversation from
  direct/opp — DB inspection is the source of truth (UI
  faithfully renders whatever conversations table has).

Refs:
- LXMF/LXMessage.py:568-578 (__mark_propagated → state=SENT)
- LXMF-swift b2e14cd (resource-handler split, port-aligned)
LXMF-swift 0.4.0 (PR #7 — perf/stamper-parallel-primed-digest, merged):
  - Parallel stamp generation (LXStamper TaskGroup, 8 workers, primed
    SHA256 digest) — cost=16 from multi-minute to ~1-2s on iPhone.
  - PROPAGATED state machine fixes: drops wrong link.identify(); wires
    RESOURCE_PRF to .sent (not .delivered); ERROR_INVALID_STAMP handler
    via pendingPropagationSends FIFO + pendingPropagationRejections
    set; handlePropagationAccepted + handleOutboundResourceFailed with
    awaited DB writes that preserve deliveryAttempts budget.
  - DIRECT path: self-send identity resolution before path table;
    drops premature link.identify(); broadcast-relay-only self-echo
    gate; DIRECT resource crash-recovery parity with PROPAGATED.
  - Stamp-rejected resource short-circuit prevents retry-loop spam.

reticulum-swift 0.3.0 (PR #16):
  - HEADER_2 link DATA conversion fix.
  - sendLinkData signature: destinationHash param removed (breaking).

Package.swift, pbxproj, and Xcode-shared Package.resolved all updated.
Build verified: xcodebuild for iOS Simulator, CODE_SIGNING_ALLOWED=NO,
BUILD SUCCEEDED. Smoke pipeline (PROPAGATED/DIRECT/OPP bidirectional
with Mac echo bot) to follow on PR ready→draft transition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: torlando-tech <torlando-tech@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	Columba.xcodeproj/project.pbxproj
#	Sources/ColumbaApp/Services/AppServices.swift
Comment thread Sources/ColumbaApp/Services/AppServices.swift
# Conflicts:
#	Columba.xcodeproj/project.pbxproj
#	Sources/ColumbaApp/Models/TcpCommunityServer.swift
Comment on lines +494 to +499
tcpEndpoints["tcp-server"] = TCPEndpoint(host: host, port: port)
try await newTransport.addInterface(newInterface)
// Record the applied endpoint only after the interface
// has been successfully attached. See the matching catch
// block below for why this ordering matters.
tcpEndpoints["tcp-server"] = TCPEndpoint(host: host, port: port)

P1 Premature `tcpEndpoints` write contradicts its own comment

`tcpEndpoints["tcp-server"]` is written at line 494 *before* `try await newTransport.addInterface(newInterface)`, then written again at line 499 after it succeeds. The comment at lines 496-498 says "Record the applied endpoint only after the interface has been successfully attached" — but line 494 already records it. At the `await` suspension point another `@MainActor` task (e.g. a ViewModel calling `connectTCPInterface("tcp-server", sameHost, samePort)`) could observe both `tcpInterfaces["tcp-server"] != nil` and `tcpEndpoints["tcp-server"] == endpoint` and silently return early from the idempotency guard, reporting success before `addInterface` has even completed. The pattern used correctly in `connectTCPInterface` is to write `tcpEndpoints` only after the `do` block succeeds — this init path should match.
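
A simplified sketch of the ordering this comment asks for. It is synchronous for brevity, so it shows only the "record after success" rule and the failure case, not the `await` suspension itself. `Registry`, `connect`, and `AttachError` are hypothetical; the `attach` closure stands in for `try await newTransport.addInterface(newInterface)`.

```swift
struct TCPEndpoint: Equatable {
    let host: String
    let port: UInt16
}

struct AttachError: Error {}

final class Registry {
    private(set) var tcpEndpoints: [String: TCPEndpoint] = [:]

    /// Records the endpoint only after `attach` has returned successfully.
    func connect(id: String, endpoint: TCPEndpoint, attach: () throws -> Void) throws {
        // No write before the attach: a failure, or a concurrent
        // idempotency check, must not see an endpoint that is not live yet.
        try attach()
        tcpEndpoints[id] = endpoint
    }
}
```

With this shape a failed attach leaves `tcpEndpoints` untouched, so a later call with the same host and port is not short-circuited by the idempotency guard.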


…initial .invalid VPN state

iOS emits an `.invalid` / `.disconnected` VPN status notification on
every cold start — fired by `TunnelManager.onStatusChange` regardless
of whether the user has enabled Background Transport, because the
session machinery probes whatever is currently loaded. The previous
code unconditionally scheduled `applyTunnelModeToInterfaces(active:
false)` via the 5s debounce, which iterated every TCPInterface and
called `endTunnelMode()`.

`endTunnelMode()` in reticulum-swift 0.3.0 is NOT idempotent
(TCPInterface.swift:257-269): it unconditionally tears down the
working NWConnection (via `transport?.disconnect()` -> nil) and
re-runs `setupTransport()`. Calling it on an interface that was never
in tunnel mode (outboundHook == nil) is destructive — it kills the
live socket Step 7 brought up moments earlier.

Reproduced 2026-05-11 on smoke run iter1 against
`feat/multi-tcp-tunnel @ 0f7cf3e`: all 4 scenarios FAILED at the
earliest `send_*` step. has_path returned 1 for both PN and bot
(path table populated via inbound announces), but outbound sends
never advanced past `state=OUTBOUND`. Console showed `[TUNNEL]
disabled tunnel mode` ~5s after cold start with no prior
`[TUNNEL] enabled` line, confirming the debounce was tearing down
TCP without ever having activated it.

The fix tracks an `isTunnelModeActive` bool. The active=false branch
guards on it and returns early if tunnel mode was never activated, so
the disable path only undoes a prior enable ("undo what you did").
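
A minimal sketch of that guard, assuming a simplified coordinator. `TunnelCapable`, `TunnelModeCoordinator`, `applyTunnelMode`, and `beginTunnelMode()` are hypothetical names for illustration; only `endTunnelMode()` and the `isTunnelModeActive` flag are named in this commit.

```swift
protocol TunnelCapable: AnyObject {
    func beginTunnelMode()   // hypothetical counterpart to endTunnelMode()
    func endTunnelMode()
}

final class TunnelModeCoordinator {
    private var isTunnelModeActive = false
    private let interfaces: [TunnelCapable]

    init(interfaces: [TunnelCapable]) {
        self.interfaces = interfaces
    }

    func applyTunnelMode(active: Bool) {
        if active {
            interfaces.forEach { $0.beginTunnelMode() }
            isTunnelModeActive = true
        } else {
            // Undo only what was done: skip if tunnel mode was never enabled,
            // because endTunnelMode() tears down the live connection.
            guard isTunnelModeActive else { return }
            interfaces.forEach { $0.endTunnelMode() }
            isTunnelModeActive = false
        }
    }
}
```

The cold-start `.invalid` notification then reaches the `guard` and returns without touching any socket.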

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@torlando-tech

Smoke status: 4/4 PASS (smoke_clean) — backgrounded delivery gate met

Rebased on current main (LXMF-swift 0.4.0 + reticulum-swift 0.3.0 + phone-harness Stage 1 surface), tested against the iPhone via the smoke-test-ios harness over 3 iterations.

Iter 1 (HEAD 0f7cf3e, post-merge): all 4 scenarios FAIL — every send stuck at state=OUTBOUND. Diagnosis: applyTunnelModeToInterfaces(active: false) fired on the cold-start .invalid VPN status notification (which iOS emits regardless of user preference), and endTunnelMode() on reticulum-swift 0.3.0's TCPInterface is NOT idempotent — it unconditionally tears down the working NWConnection and re-runs setupTransport(). Net: every TCPInterface's live socket got killed ~5s after Step 7 brought it up.

Iter 2 (HEAD c0d2213, after the fix): 3/4 PASS. The fix adds an `isTunnelModeActive` bool and guards the disable path; the `[TUNNEL] skipping disable — tunnel mode was never active` log line was confirmed firing on cold start. `propagated_echo` failed on cold-link timing (gave up after `sync_prop` attempt 1); the same path succeeded 13s later in `backgrounded_propagated`. Confirmed flake, not regression.

Iter 3 (same HEAD c0d2213): 4/4 PASS.

| scenario | iter 3 | duration |
|---|---|---|
| propagated_echo | PASS | 21.3s (sync_attempts=2) |
| direct_echo | PASS | 4.3s |
| opp_echo | PASS | 3.9s |
| backgrounded_propagated | PASS | 81.8s |

The new backgrounded_propagated scenario terminates the app mid-flight, waits 60s while the Mac echo bot picks up the queued message and echoes back to the PN, then restarts the app and asserts sync_prop retrieves the echo. Upload-to-PN landed pre-kill (lxmd_messagestore grew 0→1), app cleanly restarted, echo retrieved on 1st sync_prop attempt post-resume.

This branch now passes the Phase 3 (backgrounded delivery) smoke gate — task #81 in the parent roadmap.

🤖 Generated with Claude Code

@torlando-tech torlando-tech merged commit 79beb50 into feat/enable-tunnel-flip-flag May 11, 2026
1 check passed