
@m0ar m0ar commented Nov 21, 2025

⚠️ WIP - early feedback requested

This PR adds two peer filtering features to the Recon libp2p behaviour:

  1. Permanent peer blocking - Operators can configure a set of `PeerId`s to block entirely, for peers that consistently sync bad data.
  2. Inbound sync rejection during backoff - When we're backing off from a peer due to failures, we now also reject their incoming sync attempts. Something we keep seeing is that our node correctly sets a `sync_delay` after failed inbound syncs, but it's only enforced on outbound syncs. The result is that a bad peer can initiate a new inbound sync after 1 second, causing pretty severe log spam.

How it works

Peer blocking is the more straightforward of the two: added `blocked_peers: HashSet<PeerId>` to `Config`, checked in `handle_established_inbound_connection`/`handle_established_outbound_connection`. Returning `ConnectionDenied` closes the connection after transport establishment but before protocol negotiation.
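A minimal std-only sketch of the check (using `String` as a stand-in for `PeerId`, and a hypothetical `ConnectionDenied` type; the real check lives in the libp2p `NetworkBehaviour` connection callbacks):

```rust
use std::collections::HashSet;

// Stand-ins for libp2p types, for illustration only.
type PeerId = String;

#[derive(Debug, PartialEq)]
struct ConnectionDenied(String);

struct Config {
    blocked_peers: HashSet<PeerId>,
}

impl Config {
    // Mirrors the check performed in both handle_established_inbound_connection
    // and handle_established_outbound_connection.
    fn check_peer(&self, peer: &PeerId) -> Result<(), ConnectionDenied> {
        if self.blocked_peers.contains(peer) {
            // In the real behaviour this denial closes the connection after
            // transport establishment but before protocol negotiation.
            return Err(ConnectionDenied(format!("peer {peer} is blocked")));
        }
        Ok(())
    }
}

fn main() {
    let mut blocked = HashSet::new();
    blocked.insert("12D3KooBadPeer".to_string());
    let config = Config { blocked_peers: blocked };

    assert!(config.check_peer(&"12D3KooBadPeer".to_string()).is_err());
    assert!(config.check_peer(&"12D3KooGoodPeer".to_string()).is_ok());
    println!("blocked check ok");
}
```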

Inbound rejection was trickier. The main challenge was that backoff state lived in the peers map, which gets cleared on disconnect; if a bad peer reconnected, the fresh handler wouldn't know to reject them.

Solution: added a separate `backoff_registry: BTreeMap<PeerId, Instant>` that persists across disconnections. Handlers receive their initial state via the constructor, plus `UpdateRejectUntil` messages for updates during the connection's lifetime. On `FullyNegotiatedInbound`, the handler checks whether we're in backoff and drops the stream if so.
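The constructor + message pattern can be sketched like this (std-only; `PeerId`, `Handler`, and `HandlerMessage` here are simplified stand-ins for the real behaviour/handler types):

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

type PeerId = String; // stand-in for the libp2p PeerId

// Behaviour-side registry that survives disconnects.
struct BackoffRegistry {
    reject_until: BTreeMap<PeerId, Instant>,
}

// Handler-side state: seeded via the constructor, updated via messages.
struct Handler {
    reject_until: Option<Instant>,
}

enum HandlerMessage {
    UpdateRejectUntil(Instant),
}

impl Handler {
    fn new(initial: Option<Instant>) -> Self {
        Handler { reject_until: initial }
    }

    fn on_message(&mut self, msg: HandlerMessage) {
        match msg {
            HandlerMessage::UpdateRejectUntil(t) => self.reject_until = Some(t),
        }
    }

    // Called on FullyNegotiatedInbound: true means drop the stream.
    fn should_reject_inbound(&self, now: Instant) -> bool {
        self.reject_until.map_or(false, |until| now < until)
    }
}

fn main() {
    let now = Instant::now();
    let mut registry = BackoffRegistry { reject_until: BTreeMap::new() };
    registry
        .reject_until
        .insert("bad-peer".into(), now + Duration::from_secs(30));

    // Reconnect: the fresh handler is seeded from the persistent registry,
    // closing the race window a message-only design would have.
    let handler = Handler::new(registry.reject_until.get("bad-peer").copied());
    assert!(handler.should_reject_inbound(now));

    // After the backoff passes, inbound syncs are allowed again.
    assert!(!handler.should_reject_inbound(now + Duration::from_secs(31)));
    println!("backoff check ok");
}
```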

Expired entries are lazily cleaned up in `poll()`. No need for timers, since we don't care that much about exact timing.
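The lazy cleanup amounts to a `retain` over the registry each time `poll()` runs; a sketch (again with `String` standing in for `PeerId`):

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

// Lazy cleanup done on each poll(): drop entries whose backoff has elapsed.
fn prune_expired(registry: &mut BTreeMap<String, Instant>, now: Instant) {
    registry.retain(|_peer, until| *until > now);
}

fn main() {
    let now = Instant::now();
    let mut registry = BTreeMap::new();
    registry.insert("expired".to_string(), now - Duration::from_secs(1));
    registry.insert("active".to_string(), now + Duration::from_secs(60));

    prune_expired(&mut registry, now);
    assert_eq!(registry.len(), 1);
    assert!(registry.contains_key("active"));
    println!("prune ok");
}
```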

Tradeoffs

  • No protocol changes - We just drop the inbound stream, so remote peers see a normal failure; no updates are needed on their side.
  • Handler constructor + message pattern
    • Could have done message-only, but that has a race condition window on reconnection (i.e., we might start a sync before the handler is informed that the peer should be blocked)
    • Constructor-only wouldn't handle backoff changes during a connection, and we don't disconnect a peer after a failed sync. The combined approach covers both needs, I think.
  • Separate registry vs reusing peers map - The peers map has other lifecycle expectations and gets cleared on disconnect, so a separate registry is cleaner even if slightly redundant. It'd probably be possible to use the backoff map for both in- and outbound syncs, but that refactor seemed a bit riskier.

New metrics

  • blocked_connection_count
  • inbound_sync_rejected_count

Tests

Honestly not too sure about these tests; it's a tricky pattern to test for.

  • blocked_peer_connection_rejected - verifies connection is closed for blocked peers
  • inbound_sync_rejected_during_backoff - verifies model syncs get skipped after backoff kicks in (third cycle with 100x multiplier)
  • backoff_registry_expires - verifies syncs resume after short backoff expires
