feat: allow blocking bad peers & enforce backoff on inbound syncs #745
This PR adds two peer filtering features to the Recon libp2p behaviour: peer blocking, and rejection of inbound syncs during backoff.

Currently there is a `sync_delay` after failed inbound syncs, but it's only enforced on outbound syncs. The result is that a bad peer will initiate a new inbound sync after 1 second, causing pretty severe log spam.

How it works
Peer blocking is most straightforward - added `blocked_peers: HashSet<PeerId>` to Config, checked in `handle_established_inbound_connection` / `handle_established_outbound_connection`. Returns `ConnectionDenied` which closes the connection after transport establishment but before protocol negotiation.

Inbound rejection was trickier. The main challenge was that backoff state lived in the `peers` map which gets cleared on disconnect - so if a bad peer reconnected, the fresh handler wouldn't know to reject them.

Solution: Added a separate `backoff_registry: BTreeMap<PeerId, Instant>` that persists across disconnections. Handlers receive initial state via constructor, plus `UpdateRejectUntil` messages for updates during the connection lifetime. On `FullyNegotiatedInbound`, the handler checks if we're in backoff and drops the stream if so.

Expired entries are lazily cleaned in `poll()`. No need for timers since we don't care about exact timing that much.

Tradeoffs
New metrics
- `blocked_connection_count`
- `inbound_sync_rejected_count`

Tests
Not too sure about these tests, to be honest; it's a tricky pattern to test for.
- `blocked_peer_connection_rejected` - verifies connection is closed for blocked peers
- `inbound_sync_rejected_during_backoff` - verifies model syncs get skipped after backoff kicks in (third cycle with 100x multiplier)
- `backoff_registry_expires` - verifies syncs resume after short backoff expires
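
For reviewers who want the shape of the two checks from "How it works" without reading the diff, here is a rough, dependency-free sketch. It is not the code in this PR: `PeerFilter`, `ConnectionDecision`, and all method names are hypothetical, and `PeerId` is a stand-in newtype rather than libp2p's type. It only illustrates the idea of a blocklist checked at connection time plus a backoff registry that outlives individual connections and is pruned lazily.

```rust
use std::collections::{BTreeMap, HashSet};
use std::time::{Duration, Instant};

/// Stand-in for libp2p's `PeerId` (assumption: a plain integer keeps the sketch std-only).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct PeerId(pub u64);

/// Hypothetical result of the connection-level check.
#[derive(Debug, PartialEq, Eq)]
pub enum ConnectionDecision {
    Allow,
    Denied,
}

/// Hypothetical holder for the two pieces of filtering state.
pub struct PeerFilter {
    /// Peers that are never allowed to connect.
    blocked_peers: HashSet<PeerId>,
    /// Peer -> instant until which inbound syncs are rejected.
    /// Kept apart from per-connection state so it survives disconnects.
    backoff_registry: BTreeMap<PeerId, Instant>,
}

impl PeerFilter {
    pub fn new(blocked_peers: HashSet<PeerId>) -> Self {
        Self { blocked_peers, backoff_registry: BTreeMap::new() }
    }

    /// Connection-level check, the same for inbound and outbound connections.
    pub fn check_connection(&self, peer: PeerId) -> ConnectionDecision {
        if self.blocked_peers.contains(&peer) {
            ConnectionDecision::Denied
        } else {
            ConnectionDecision::Allow
        }
    }

    /// Record that inbound syncs from `peer` are rejected until `now + delay`.
    pub fn start_backoff(&mut self, peer: PeerId, now: Instant, delay: Duration) {
        self.backoff_registry.insert(peer, now + delay);
    }

    /// Initial state a freshly created handler for `peer` would be given.
    pub fn reject_until(&self, peer: PeerId) -> Option<Instant> {
        self.backoff_registry.get(&peer).copied()
    }

    /// Stream-level check: true while `now` is strictly before the recorded deadline.
    pub fn should_reject_inbound(&self, peer: PeerId, now: Instant) -> bool {
        matches!(self.backoff_registry.get(&peer), Some(until) if now < *until)
    }

    /// Lazy cleanup: drop entries whose deadline is at or before `now`.
    pub fn prune_expired(&mut self, now: Instant) {
        self.backoff_registry.retain(|_, until| *until > now);
    }

    /// Number of peers currently tracked in the backoff registry.
    pub fn backoff_len(&self) -> usize {
        self.backoff_registry.len()
    }
}
```

In this sketch a reconnecting peer is still rejected because the deadline is looked up by peer rather than by connection, and calling `prune_expired` from a periodic poll loop keeps the map from growing without needing timers.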