
@m0ar m0ar commented Nov 21, 2025

⚠️ WIP - early feedback requested

This PR adds two peer filtering features to the Recon libp2p behaviour:

  1. Permanent peer blocking - Operators can configure a set of `PeerId`s to block entirely, for peers that consistently sync bad data.
  2. Inbound sync rejection during backoff - When we're backing off from a peer due to failures, we now also reject their incoming sync attempts. Something we keep seeing is that our node correctly sets a `sync_delay` after failed inbound syncs, but it's only enforced on outbound syncs. The result is that a bad peer can initiate a new inbound sync after 1 second, causing pretty severe log spam.

How it works

Peer blocking is the more straightforward of the two: added `blocked_peers: HashSet<PeerId>` to `Config`, checked in `handle_established_inbound_connection`/`handle_established_outbound_connection`. Returning `ConnectionDenied` closes the connection after transport establishment but before protocol negotiation.
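A minimal std-only sketch of the check (using `String` as a stand-in for `PeerId`, and a hypothetical `ConnectionDenied` type; the real check lives in the libp2p `NetworkBehaviour` connection callbacks):

```rust
use std::collections::HashSet;

// Stand-ins for libp2p types, for illustration only.
type PeerId = String;

#[derive(Debug, PartialEq)]
struct ConnectionDenied(String);

struct Config {
    blocked_peers: HashSet<PeerId>,
}

impl Config {
    // Mirrors the check performed in both handle_established_inbound_connection
    // and handle_established_outbound_connection.
    fn check_peer(&self, peer: &PeerId) -> Result<(), ConnectionDenied> {
        if self.blocked_peers.contains(peer) {
            // In the real behaviour this denial closes the connection after
            // transport establishment but before protocol negotiation.
            return Err(ConnectionDenied(format!("peer {peer} is blocked")));
        }
        Ok(())
    }
}

fn main() {
    let mut blocked = HashSet::new();
    blocked.insert("12D3KooBadPeer".to_string());
    let config = Config { blocked_peers: blocked };

    assert!(config.check_peer(&"12D3KooBadPeer".to_string()).is_err());
    assert!(config.check_peer(&"12D3KooGoodPeer".to_string()).is_ok());
    println!("blocked check ok");
}
```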

Inbound rejection was trickier. The main challenge was that backoff state lived in the peers map, which gets cleared on disconnect; if a bad peer reconnected, the fresh handler wouldn't know to reject them.

Solution: added a separate `backoff_registry: BTreeMap<PeerId, Instant>` that persists across disconnections. Handlers receive their initial state via the constructor, plus `UpdateRejectUntil` messages for updates during the connection's lifetime. On `FullyNegotiatedInbound`, the handler checks whether we're in backoff and drops the stream if so.
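The constructor + message pattern can be sketched like this (std-only; `PeerId`, `Handler`, and `HandlerMessage` here are simplified stand-ins for the real behaviour/handler types):

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

type PeerId = String; // stand-in for the libp2p PeerId

// Behaviour-side registry that survives disconnects.
struct BackoffRegistry {
    reject_until: BTreeMap<PeerId, Instant>,
}

// Handler-side state: seeded via the constructor, updated via messages.
struct Handler {
    reject_until: Option<Instant>,
}

enum HandlerMessage {
    UpdateRejectUntil(Instant),
}

impl Handler {
    fn new(initial: Option<Instant>) -> Self {
        Handler { reject_until: initial }
    }

    fn on_message(&mut self, msg: HandlerMessage) {
        match msg {
            HandlerMessage::UpdateRejectUntil(t) => self.reject_until = Some(t),
        }
    }

    // Called on FullyNegotiatedInbound: true means drop the stream.
    fn should_reject_inbound(&self, now: Instant) -> bool {
        self.reject_until.map_or(false, |until| now < until)
    }
}

fn main() {
    let now = Instant::now();
    let mut registry = BackoffRegistry { reject_until: BTreeMap::new() };
    registry
        .reject_until
        .insert("bad-peer".into(), now + Duration::from_secs(30));

    // Reconnect: the fresh handler is seeded from the persistent registry,
    // closing the race window a message-only design would have.
    let handler = Handler::new(registry.reject_until.get("bad-peer").copied());
    assert!(handler.should_reject_inbound(now));

    // After the backoff passes, inbound syncs are allowed again.
    assert!(!handler.should_reject_inbound(now + Duration::from_secs(31)));
    println!("backoff check ok");
}
```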

Expired entries are lazily cleaned up in `poll()`. No need for timers, since we don't care that much about exact timing.
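The lazy cleanup amounts to a `retain` over the registry each time `poll()` runs; a sketch (again with `String` standing in for `PeerId`):

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

// Lazy cleanup done on each poll(): drop entries whose backoff has elapsed.
fn prune_expired(registry: &mut BTreeMap<String, Instant>, now: Instant) {
    registry.retain(|_peer, until| *until > now);
}

fn main() {
    let now = Instant::now();
    let mut registry = BTreeMap::new();
    registry.insert("expired".to_string(), now - Duration::from_secs(1));
    registry.insert("active".to_string(), now + Duration::from_secs(60));

    prune_expired(&mut registry, now);
    assert_eq!(registry.len(), 1);
    assert!(registry.contains_key("active"));
    println!("prune ok");
}
```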

Tradeoffs

  • No protocol changes - We just drop the inbound stream, so remote peers see a normal failure; no updates are needed on their side.
  • Handler constructor + message pattern
    • Could have done message-only, but that has a race condition window on reconnection (i.e., we might start a sync before the handler is informed that the peer should be blocked)
    • Constructor-only wouldn't handle backoff changes during a connection, and we don't disconnect a peer after a failed sync. The combined approach covers both needs, I think.
  • Separate registry vs reusing peers map - The peers map has other lifecycle expectations and gets cleared on disconnect, so a separate registry is cleaner even if slightly redundant. It'd probably be possible to use the backoff map for both in- and outbound syncs, but that refactor seemed a bit riskier.

New metrics

  • blocked_connection_count
  • inbound_sync_rejected_count

Tests

Honestly not too sure about these tests; it's a tricky pattern to test for.

  • blocked_peer_connection_rejected - verifies connection is closed for blocked peers
  • inbound_sync_rejected_during_backoff - verifies model syncs get skipped after backoff kicks in (third cycle with 100x multiplier)
  • backoff_registry_expires - verifies syncs resume after short backoff expires
