fix(dash-spv): preserve in-progress sync state on peer disconnect #746
xdustinface wants to merge 2 commits into v0.42-dev from
Conversation
`BlocksManager` and `FiltersManager` wiped their pipelines on every disconnect, which raced with the previous cycle's `BlockProcessed` events still in flight and left `pending_blocks` stuck above zero, so `SyncComplete` never fired.

Replace the trait's `clear_in_flight_state` with a single `on_disconnect` hook. Each manager only invalidates state tied to the dead peer. `BlocksManager` and `FiltersManager` requeue their in-flight `getdata` / `getcfilters` slots so the next `send_pending` reissues them on the new peer, and `start_sync` resumes the preserved batches. The wallet-rescan path in `FiltersManager::tick` calls a manager-internal `reset_for_rescan` for the case that genuinely wants a full wipe.
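A minimal sketch of the hook shape described above. The type and field names (`DownloadPipeline`, `pending`, `in_flight`, `Height`) are simplified assumptions for illustration, not the crate's actual API:

```rust
// Illustrative sketch only: simplified, assumed names, not the dash-spv API.
use std::collections::{HashMap, VecDeque};

type Height = u32;

/// Replaces the old `clear_in_flight_state`: keep durable sync progress,
/// invalidate only state tied to the disconnected peer.
trait SyncManager {
    fn on_disconnect(&mut self);
}

struct DownloadPipeline {
    pending: VecDeque<Height>,      // batches not yet requested
    in_flight: HashMap<Height, ()>, // batches requested from the now-dead peer
}

impl DownloadPipeline {
    /// Move every in-flight batch back to the front of the pending queue so
    /// the next `send_pending` reissues it to the new peer.
    fn requeue_in_flight(&mut self) {
        let drained: Vec<Height> = self.in_flight.drain().map(|(h, _)| h).collect();
        for start_height in drained {
            self.pending.push_front(start_height);
        }
    }
}

struct BlocksManager {
    pipeline: DownloadPipeline,
}

impl SyncManager for BlocksManager {
    fn on_disconnect(&mut self) {
        // Preserve the batch plan; only the peer-bound `getdata` slots move
        // back to pending, so `start_sync` can resume where it left off.
        self.pipeline.requeue_in_flight();
    }
}
```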
📝 Walkthrough

The PR renames the SyncManager disconnect hook from `clear_in_flight_state` to `on_disconnect` and refactors the disconnect lifecycle so in-progress sync state is preserved.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 5 passed
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

@@ Coverage Diff @@
## v0.42-dev #746 +/- ##
=============================================
+ Coverage 71.35% 71.98% +0.63%
=============================================
Files 320 320
Lines 69370 69653 +283
=============================================
+ Hits 49497 50138 +641
+ Misses 19873 19515 -358
Actionable comments posted: 1
🧹 Nitpick comments (3)
dash-spv/src/sync/download_coordinator.rs (1)
95-110: 💤 Low value: LGTM, the `requeue_in_flight` implementation is correct.

One cosmetic note: `items.into_iter().rev()` reverses a collection whose order is already arbitrary (from `HashMap::drain`), so the reversal has no meaningful effect on relative ordering among the requeued items. The tests correctly handle this by sorting the requeued slice before asserting content. If the intent was "push items to the front in the order they were originally enqueued," that ordering is not recoverable from the `in_flight` HashMap; the current code is functionally equivalent to iterating forward. This is a no-op cleanup at best.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dash-spv/src/sync/download_coordinator.rs` around lines 95-110: the call to `items.into_iter().rev()` in `requeue_in_flight` is pointless because `items` comes from draining a HashMap (`in_flight`) and has arbitrary order; remove the `.rev()` so you simply iterate `items.into_iter()` and `push_front` each into `pending`, or if you actually need to restore original enqueue order, change `in_flight` from a HashMap to an ordered container (e.g., IndexMap or track a VecDeque of keys) and then requeue using that preserved order.
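A sketch of the suggested cleanup, assuming the coordinator keeps `pending` in a `VecDeque` and `in_flight` in a `HashMap` as the comment describes; the field and type names are placeholders rather than the real ones:

```rust
use std::collections::{HashMap, VecDeque};

struct DownloadCoordinator {
    pending: VecDeque<u32>,
    in_flight: HashMap<u32, ()>,
}

impl DownloadCoordinator {
    /// Requeue every in-flight batch. `HashMap::drain` yields items in an
    /// arbitrary order, so reversing the drained vector changes nothing
    /// meaningful; iterating forward is equivalent and simpler.
    fn requeue_in_flight(&mut self) {
        let items: Vec<u32> = self.in_flight.drain().map(|(k, _)| k).collect();
        for key in items {
            self.pending.push_front(key); // no `.rev()` needed
        }
    }
}
```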
dash-spv/src/sync/blocks/sync_manager.rs (1)

35-53: ⚡ Quick win: Resume path doesn't immediately reissue requeued `getdata`; it relies on the next tick.

`on_disconnect` requeues in-flight slots to `pending`, but `start_sync`'s non-empty-pipeline branch only flips state to `Syncing` without calling `self.send_pending(_requests)`. Requeued `getdata`s sit idle until the 100ms tick fires. Compare with `FiltersManager::start_sync` (filters/sync_manager.rs lines 60-63), which immediately calls `filter_pipeline.send_pending(...)` on resume.

The delay is small, but the asymmetry between the two managers is easy to forget when debugging. Consider mirroring the filter-side pattern:
Proposed change
```diff
- // If in-progress block work survived a disconnect, resume in Syncing so
- // `process_buffered_blocks` can transition to Synced once the pipeline
- // drains. Otherwise wait for BlocksNeeded / FiltersSyncComplete events.
- if !self.pipeline.is_complete() {
-     self.set_state(SyncState::Syncing);
- } else {
-     self.set_state(SyncState::WaitForEvents);
- }
- Ok(vec![])
+ // If in-progress block work survived a disconnect, resume in Syncing so
+ // `process_buffered_blocks` can transition to Synced once the pipeline
+ // drains. Otherwise wait for BlocksNeeded / FiltersSyncComplete events.
+ if !self.pipeline.is_complete() {
+     self.set_state(SyncState::Syncing);
+     // Reissue any requeued in-flight `getdata`s to the new peer immediately
+     // instead of waiting for the next tick.
+     if self.pipeline.has_pending_requests() {
+         self.send_pending(_requests).await?;
+     }
+ } else {
+     self.set_state(SyncState::WaitForEvents);
+ }
+ Ok(vec![])
```

Note: the `_requests` parameter would also need to be renamed to `requests` since it would now be used.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dash-spv/src/sync/blocks/sync_manager.rs` around lines 35-53: `start_sync` currently flips state to Syncing when `self.pipeline` is non-empty but does not reissue any requeued getdata, causing a delay until the next tick; update `sync_manager::start_sync` to accept and use the RequestSender (rename `_requests` to `requests`) and, in the branch where you call `self.set_state(SyncState::Syncing)`, immediately invoke the manager's pending-send routine (analogous to `FiltersManager::start_sync` calling `filter_pipeline.send_pending`): call the blocks pipeline send_pending method (e.g., `self.pipeline.send_pending` or `self.send_pending(requests)`) right after setting the state so requeued in-flight slots from `on_disconnect` are resent without waiting for the tick.
dash-spv/src/sync/sync_manager.rs (1)

116-134: 💤 Low value: Trait contract clarification looks good; lifecycle ordering is correct.

`stop_sync` sets state to `WaitingForConnections` first, then calls `on_disconnect`, so any state-aware preservation logic in implementations sees the post-stop state. The new doc clearly distinguishes the "preserve durable, requeue peer-bound" intent. Note that this is a breaking trait change (no default impl), so any out-of-tree `SyncManager` implementations will need to be updated; worth calling out in release notes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dash-spv/src/sync/sync_manager.rs` around lines 116-134: the new trait method `on_disconnect` currently has no default implementation, which is a breaking change; to avoid breaking out-of-tree `SyncManager` implementors, add a default no-op impl for `on_disconnect` in the SyncManager trait (so existing impls continue to compile) and keep `stop_sync` calling `self.on_disconnect()`; also update the release notes to mention the lifecycle ordering (`stop_sync` sets WaitingForConnections then calls `on_disconnect`) and the intended preservation/requeue semantics.
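A sketch of the non-breaking default suggested above; `StatelessManager` is a hypothetical implementor, not a type from the crate:

```rust
trait SyncManager {
    /// Called after `stop_sync` has already moved the manager to
    /// `WaitingForConnections`. Defaulting to a no-op keeps existing
    /// out-of-tree implementations compiling; managers with peer-bound
    /// work override it to requeue their in-flight requests.
    fn on_disconnect(&mut self) {}
}

// A manager with nothing peer-bound to invalidate can rely on the default.
struct StatelessManager;

impl SyncManager for StatelessManager {}
```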
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dash-spv/src/sync/filters/pipeline.rs`:
- Around line 255-262: The requeue flow can leave a batch key in
coordinator.pending without a tracker if late filter messages complete the batch
in receive_with_data; add a method on DownloadCoordinator (e.g.,
cancel_pending(&mut self, batch_start: HeightType)) that removes a key from
pending, and update receive_with_data so that after calling
coordinator.receive(&batch_start) if it returns false you invoke
coordinator.cancel_pending(batch_start) and also remove the tracker from
batch_trackers to keep coordinator.pending and batch_trackers consistent (so
send_pending no longer errors with SyncError::InvalidState).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fdedfab9-db3b-4890-8346-6d64c9f9f9b3
📒 Files selected for processing (15)
- dash-spv/src/sync/block_headers/sync_manager.rs
- dash-spv/src/sync/blocks/manager.rs
- dash-spv/src/sync/blocks/pipeline.rs
- dash-spv/src/sync/blocks/sync_manager.rs
- dash-spv/src/sync/chainlock/sync_manager.rs
- dash-spv/src/sync/download_coordinator.rs
- dash-spv/src/sync/filter_headers/manager.rs
- dash-spv/src/sync/filter_headers/sync_manager.rs
- dash-spv/src/sync/filters/manager.rs
- dash-spv/src/sync/filters/pipeline.rs
- dash-spv/src/sync/filters/sync_manager.rs
- dash-spv/src/sync/instantsend/sync_manager.rs
- dash-spv/src/sync/masternodes/sync_manager.rs
- dash-spv/src/sync/mempool/sync_manager.rs
- dash-spv/src/sync/sync_manager.rs
…equeued batch

After `requeue_in_flight()` moves a batch's `start_height` from `coordinator.in_flight` back to `coordinator.pending`, a buffered `CFilter` from the disconnected peer can still reach `receive_with_data` and complete the batch. The tracker is removed but `coordinator.receive` returns `false` (the key is in `pending`, not `in_flight`), leaving the key orphaned in `pending`. The next `send_pending` pops it, finds no tracker, and aborts sync with `SyncError::InvalidState`.

Add `DownloadCoordinator::cancel_pending` and call it from `receive_with_data` when `receive` reports the key was not in flight.

Addresses CodeRabbit review comment on PR #746 (comment)
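A rough sketch of the shape of that fix, using simplified placeholder types; `complete_batch` stands in for the relevant part of `receive_with_data`, and the field names are assumptions:

```rust
use std::collections::{HashMap, VecDeque};

type HeightType = u32;

struct DownloadCoordinator {
    pending: VecDeque<HeightType>,
    in_flight: HashMap<HeightType, ()>,
}

impl DownloadCoordinator {
    /// Returns true only if the batch was actually in flight.
    fn receive(&mut self, batch_start: &HeightType) -> bool {
        self.in_flight.remove(batch_start).is_some()
    }

    /// Drop a requeued batch key so `send_pending` never pops a key that no
    /// longer has a tracker behind it.
    fn cancel_pending(&mut self, batch_start: HeightType) {
        self.pending.retain(|h| *h != batch_start);
    }
}

struct FilterPipeline {
    coordinator: DownloadCoordinator,
    batch_trackers: HashMap<HeightType, ()>, // per-batch progress trackers
}

impl FilterPipeline {
    /// Late filters from the disconnected peer may complete a batch that was
    /// already requeued; keep `pending` and `batch_trackers` consistent.
    fn complete_batch(&mut self, batch_start: HeightType) {
        if !self.coordinator.receive(&batch_start) {
            self.coordinator.cancel_pending(batch_start);
        }
        self.batch_trackers.remove(&batch_start);
    }
}
```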
@CodeRabbit review

✅ Actions performed: Review triggered.