Skip to content

peer: rotate samizdat credentials hourly (closes engineering#3437)#472

Open
myleshorton wants to merge 1 commit into
fisk/peer-localbackendfrom
fisk/peer-cred-rotation
Open

peer: rotate samizdat credentials hourly (closes engineering#3437)#472
myleshorton wants to merge 1 commit into
fisk/peer-localbackendfrom
fisk/peer-cred-rotation

Conversation

@myleshorton
Copy link
Copy Markdown
Contributor

Summary

Closes the C2 finding from the Share My Connection security review (engineering#3437): peer.Client builds the libbox inbound exactly once per Start and holds the same samizdat creds for the entire peer process lifetime, so a leaked credential remains usable for hours or days.

This adds a `credRotationLoop` goroutine on a 1h tick that re-registers, rebuilds the libbox inbound with fresh creds, and deregisters the prior route. Caps blast radius from credential leakage to ~1h regardless of how long the peer has been running.

Sequence per rotation

  1. Re-register with lantern-cloud against the same (address, port) tuple — same router-side mapping, fresh server-side row, fresh samizdat creds
  2. Patch the new options for VPN bypass
  3. Build a new libbox service
  4. Close the old box (releases the listening port)
  5. Start the new box (re-binds the same port with new creds)
  6. Atomic swap of `c.box`, `c.routeID`
  7. Best-effort deregister of the prior route_id so the bandit stops handing the old (now-invalid) creds to clients within ~immediately rather than waiting up-to-TTL for the row to expire

Steps 4-5 leave a brief (~hundreds of ms) window where the port is unbound; samizdat clients see TCP RST and reconnect via the bandit. Acceptable trade-off vs. the security cost of indefinite cred lifetime.

Failure handling

Rotation is best-effort. A single failure (transient register error, network blip) logs and waits for the next tick. The current box and creds remain serving in the failure case so a one-off transient doesn't kill the session.

Test plan

  • `TestClient_RotatesCredentialsAtInterval`: drives a 50ms rotation interval and asserts ≥3 registers within 1s, deregister count tracks rotations, route_id advances past the initial value, multiple distinct boxes built, first box closed
  • All existing peer tests pass under `-race`
  • CI

Related

Part of the Share My Connection security review series:

  • ✅ engineering#3436 / lantern-cloud#2705 — egress filter (C1)
  • ✅ engineering#3437 (this PR) — cred rotation (C2)
  • 🔜 engineering#3438 — abuse-feedback path (C3)
  • 🔜 H1-H4 + M3 — follow-ups

🤖 Generated with Claude Code

Currently peer.Client builds the libbox inbound exactly once per Start
and holds the same X25519 keypair / shortID / masquerade for the
entire peer process lifetime — a leaked credential (logs, telemetry,
support bundles, the route_id leakage in engineering#3440) remains
usable for hours or days.

This adds a credRotationLoop goroutine on a 1h tick. On each tick:

  1. Re-register with lantern-cloud against the same (address, port)
     tuple — same router-side mapping, fresh server-side row, fresh
     samizdat creds.
  2. Patch the new options for VPN bypass.
  3. Build a new libbox service.
  4. Close the old box (releases the listening port).
  5. Start the new box (re-binds the same port with new creds).
  6. Atomic swap of c.box, c.routeID.
  7. Best-effort deregister of the prior route_id so the bandit
     stops handing the old (now-invalid) creds to clients within
     ~immediately rather than waiting up-to-TTL for the row to
     expire.

Steps 4-5 leave a brief (~hundreds of ms) window where the port is
unbound; samizdat clients see TCP RST and reconnect via the bandit.
That's the trade-off vs. the security cost of holding the same cred
for the peer process lifetime — caps blast radius from cred leakage
to ~1h regardless of how long the peer has been running.

Rotation is best-effort: a single failure logs and waits for the
next tick. The current box and creds remain serving in the failure
case so a transient register error doesn't kill the session.

Config gains CredRotationInterval (defaults to peerCredRotationInterval
= 1h) so tests drive the loop without a 1h sleep — see
TestClient_RotatesCredentialsAtInterval.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 8, 2026 02:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds periodic rotation of peer “Share My Connection” samizdat credentials by re-registering with lantern-cloud, rebuilding the libbox inbound with fresh server config, swapping the active route/box, and best-effort deregistering the prior route to reduce the impact window of leaked credentials.

Changes:

  • Introduces Config.CredRotationInterval and a credRotationLoop() goroutine that calls rotateCreds() on a schedule (default: 1 hour).
  • Persists port mapping details and run context on the Client so rotations can re-register against the same (external IP, port) tuple and rebuild libbox.
  • Adds TestClient_RotatesCredentialsAtInterval to verify repeated register/build/close behavior over a short interval.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
peer/peer.go Adds credential rotation loop, stores mapping/run context for reuse, and implements re-register → rebuild → swap → deregister sequence.
peer/peer_test.go Extends stub server to vary register responses per call and adds a rotation-interval test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread peer/peer.go
// channel: heartbeatLoop's done already gates Stop, and rotation
// failures are non-fatal (log + retry next tick), so there's nothing
// the Stop path needs to wait on from this goroutine.
func (c *Client) credRotationLoop(ctx context.Context, interval time.Duration) {
Comment thread peer/peer.go
Comment on lines +531 to +536
c.mu.Lock()
c.box = newBox
c.routeID = regResp.RouteID
c.boxOptions = options
c.status.RouteID = regResp.RouteID
c.mu.Unlock()
Comment thread peer/peer.go
Comment on lines +538 to +545
// Deregister the prior route so the bandit stops handing the old
// (now-invalid) creds to clients. Best-effort: the prior row will
// expire from its TTL anyway, but explicit deregister cuts the
// stale-creds window from up-to-TTL down to ~immediately.
if err := c.cfg.API.Deregister(ctx, oldRouteID); err != nil {
slog.Warn("deregister prior route after rotation",
"err", err, "old_route_id", oldRouteID)
}
Comment thread peer/peer.go
c.box = newBox
c.routeID = regResp.RouteID
c.boxOptions = options
c.status.RouteID = regResp.RouteID
Comment thread peer/peer.go
Comment on lines +490 to +516
regResp, err := c.cfg.API.Register(ctx, RegisterRequest{
ExternalIP: externalIP,
ExternalPort: extPort,
InternalPort: intPort,
})
if err != nil {
return fmt.Errorf("re-register: %w", err)
}
options, err := ensurePeerOutboundsBypassVPN(regResp.ServerConfig)
if err != nil {
return fmt.Errorf("patch sing-box options: %w", err)
}

c.mu.Lock()
runCtx := c.runCtx
c.mu.Unlock()
if runCtx == nil {
// Stop happened between the unlock above and here. Skip the
// build to avoid spinning up a libbox tied to a torn-down ctx.
// The new register row is harmless — server-side reaper will
// deprecate it after TTL since no heartbeat will arrive.
return errors.New("client stopped during rotation")
}
newBox, err := c.cfg.BuildBoxService(runCtx, options)
if err != nil {
return fmt.Errorf("build new sing-box: %w", err)
}
Comment thread peer/peer.go
Comment on lines +519 to +528
// If newBox.Start fails after oldBox.Close, we lost the listener
// and the next heartbeat / rotation tick is the recovery point.
if closeErr := oldBox.Close(); closeErr != nil {
slog.Warn("close old box during rotation", "err", closeErr)
}
if err := newBox.Start(); err != nil {
// Catastrophic: port is now unbound. Leave c.box pointing at
// oldBox so a future Stop tries to close it (idempotent on
// already-closed); the next rotation tick will try again.
return fmt.Errorf("start new sing-box: %w", err)
Comment thread peer/peer.go
Comment on lines +521 to +529
if closeErr := oldBox.Close(); closeErr != nil {
slog.Warn("close old box during rotation", "err", closeErr)
}
if err := newBox.Start(); err != nil {
// Catastrophic: port is now unbound. Leave c.box pointing at
// oldBox so a future Stop tries to close it (idempotent on
// already-closed); the next rotation tick will try again.
return fmt.Errorf("start new sing-box: %w", err)
}
Comment thread peer/peer_test.go
// Each rotation deregisters the prior route — N rotations =>
// N deregisters (initial register is not preceded by one).
rotations := srv.registerCount.Load() - 1
assert.GreaterOrEqual(t, srv.deregisterCount.Load(), rotations-1,
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants