
fix(dash-spv): preserve in-progress sync state on peer disconnect#746

Open
xdustinface wants to merge 2 commits into v0.42-dev from fix/blocks-filters-disconnect-race

Conversation

@xdustinface
Collaborator

xdustinface commented May 8, 2026

`BlocksManager` and `FiltersManager` wiped their pipelines on every disconnect, which raced with the previous cycle's `BlockProcessed` events still in flight and left `pending_blocks` stuck above zero, so `SyncComplete` never fired.

Replace the trait's `clear_in_flight_state` with a single `on_disconnect` hook. Each manager now invalidates only the state tied to the dead peer. `BlocksManager` and `FiltersManager` requeue their in-flight `getdata` / `getcfilters` slots so the next `send_pending` reissues them on the new peer, and `start_sync` resumes the preserved batches. The wallet-rescan path in `FiltersManager::tick` calls a manager-internal `reset_for_rescan` for the one path that genuinely needs a full wipe.
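As a rough illustration of the hook described above: the trait and hook names come from this PR's prose, but the struct fields and key type here are invented for the sketch, not the actual dash-spv definitions.

```rust
use std::collections::VecDeque;

trait SyncManager {
    /// Invalidate only peer-bound state; durable progress must survive.
    fn on_disconnect(&mut self);
}

struct BlocksManager {
    pending: VecDeque<u32>, // batch start heights not yet requested
    in_flight: Vec<u32>,    // batches requested from the now-dead peer
}

impl SyncManager for BlocksManager {
    fn on_disconnect(&mut self) {
        // Requeue in-flight `getdata` slots instead of wiping the pipeline,
        // so the next `send_pending` reissues them on the new peer.
        for key in self.in_flight.drain(..) {
            self.pending.push_front(key);
        }
    }
}
```

The point of the sketch is the asymmetry: `pending` and any downloaded data survive the disconnect untouched, while only the slots tied to the dead peer move back to the queue.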

Summary by CodeRabbit

  • Refactor

    • Renamed the disconnect lifecycle hook across sync managers for clearer lifecycle handling.
  • Improvements

    • Download/filter/block sync now preserves in-progress work across peer disconnects and requeues in-flight requests for the next peer.
    • Introduced a distinct wallet rescan reset that fully clears rescan state without affecting disconnect-preserved progress.
    • Mempool pending state is cleared on disconnect to avoid stale requests.
  • Tests

    • Added unit tests verifying requeue, cancel-pending, and disconnect-preserve behaviors.

@coderabbitai
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d61b0df8-9127-41a6-abb1-4a2e63f46ab6

📥 Commits

Reviewing files that changed from the base of the PR and between b628a1f and d61355e.

📒 Files selected for processing (2)
  • dash-spv/src/sync/download_coordinator.rs
  • dash-spv/src/sync/filters/pipeline.rs

📝 Walkthrough

Walkthrough

The PR renames the SyncManager disconnect hook from clear_in_flight_state to on_disconnect, updates the trait and tests, adds DownloadCoordinator requeue/cancel utilities, and changes BlocksManager and FiltersManager to preserve and requeue in-flight work across disconnects instead of wiping it.

Changes

Disconnect Lifecycle Refactor

Layer / File(s) Summary
Trait API Definition
dash-spv/src/sync/sync_manager.rs
SyncManager replaces clear_in_flight_state(&mut self) with on_disconnect(&mut self); stop_sync now invokes on_disconnect; MockManager test updated.
DownloadCoordinator: requeue & cancel
dash-spv/src/sync/download_coordinator.rs
Adds requeue_in_flight() to drain in-flight keys and push them to the front of pending (preserving retry_counts) and cancel_pending() to remove a pending key; tests added for priority ordering, retry preservation, and no-op behavior.
Straightforward Hook Renames
dash-spv/src/sync/block_headers/sync_manager.rs, dash-spv/src/sync/chainlock/sync_manager.rs, dash-spv/src/sync/filter_headers/sync_manager.rs, dash-spv/src/sync/filter_headers/manager.rs, dash-spv/src/sync/instantsend/sync_manager.rs, dash-spv/src/sync/masternodes/sync_manager.rs, dash-spv/src/sync/mempool/sync_manager.rs
Several managers rename the disconnect hook to on_disconnect without behavioral changes; FilterHeaders test renamed to match.
BlocksPipeline Requeue
dash-spv/src/sync/blocks/pipeline.rs
Adds BlocksPipeline::requeue_in_flight() delegating to DownloadCoordinator, preserving downloaded entries and hash/height/wallet mappings; tests ensure preservation.
BlocksManager Disconnect Preservation
dash-spv/src/sync/blocks/sync_manager.rs, dash-spv/src/sync/blocks/manager.rs
on_disconnect now preserves the pipeline and calls requeue_in_flight() to requeue in-flight getdata; start_sync resumes draining when pipeline is incomplete; clear_in_flight_state removed. New test validates preservation.
FiltersPipeline receive + requeue
dash-spv/src/sync/filters/pipeline.rs
receive_with_data cancels stale pending when coordinator.receive returns false; adds requeue_in_flight() delegating to coordinator; tests verify partial receipts and late arrivals don't orphan pending keys.
FiltersManager Disconnect vs Rescan
dash-spv/src/sync/filters/sync_manager.rs, dash-spv/src/sync/filters/manager.rs
on_disconnect re-queues in-flight filter batches to resume work; reset_for_rescan() added to fully wipe multi-batch state for wallet rescans; start_sync resumes active batches and tick uses reset_for_rescan for stale-wallet handling; tests added/updated.
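The `DownloadCoordinator` utilities listed in the table above can be sketched roughly as follows; the types are illustrative (invented key and payload types), not the actual dash-spv code.

```rust
use std::collections::{HashMap, VecDeque};

struct DownloadCoordinator {
    pending: VecDeque<u32>,
    in_flight: HashMap<u32, ()>,    // payload type elided for the sketch
    retry_counts: HashMap<u32, u8>, // untouched by requeue, so back-off state survives
}

impl DownloadCoordinator {
    /// Drain every in-flight key back to the front of `pending`.
    fn requeue_in_flight(&mut self) {
        for (key, _) in self.in_flight.drain() {
            self.pending.push_front(key);
        }
    }

    /// Remove a pending key, e.g. when a late response already covered it.
    /// Returns whether the key was actually pending.
    fn cancel_pending(&mut self, key: u32) -> bool {
        match self.pending.iter().position(|k| *k == key) {
            Some(pos) => self.pending.remove(pos).is_some(),
            None => false,
        }
    }
}
```

Pushing to the front gives requeued work priority over never-requested batches, and leaving `retry_counts` alone means a batch that already failed twice does not get a fresh retry budget just because the peer dropped.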

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • dashpay/rust-dashcore#484: Directly related SyncManager lifecycle refactor that introduced/used the earlier disconnect semantics and influenced trait-level changes.

Suggested reviewers

  • ZocoLini

Poem

🐰 I tuck the in-flight hops safe in my den,
Then nudge them to front when new peers begin,
Filters and blocks keep their stitched-up thread,
Disconnects tiptoe — progress stays fed,
🥕 A hop, a requeue, and onward again.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check ✅ Passed: skipped, CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title 'fix(dash-spv): preserve in-progress sync state on peer disconnect' directly and clearly describes the main change, modifying disconnect handling to preserve sync state instead of clearing it.
  • Docstring Coverage ✅ Passed: 100.00% coverage, above the required 80.00% threshold.
  • Linked Issues Check ✅ Passed: skipped, no linked issues were found for this pull request.
  • Out of Scope Changes Check ✅ Passed: skipped, no linked issues were found for this pull request.


@codecov

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 97.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.98%. Comparing base (478af3b) to head (d61355e).
⚠️ Report is 7 commits behind head on v0.42-dev.

Files with missing lines                      | Patch % | Lines
dash-spv/src/sync/filters/manager.rs          | 94.44%  | 2 Missing ⚠️
dash-spv/src/sync/chainlock/sync_manager.rs   |  0.00%  | 1 Missing ⚠️
dash-spv/src/sync/instantsend/sync_manager.rs |  0.00%  | 1 Missing ⚠️
dash-spv/src/sync/masternodes/sync_manager.rs |  0.00%  | 1 Missing ⚠️
dash-spv/src/sync/sync_manager.rs             | 50.00%  | 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##           v0.42-dev     #746      +/-   ##
=============================================
+ Coverage      71.35%   71.98%   +0.63%     
=============================================
  Files            320      320              
  Lines          69370    69653     +283     
=============================================
+ Hits           49497    50138     +641     
+ Misses         19873    19515     -358     
Flag   | Coverage Δ
core   | 76.30% <ø> (-0.39%) ⬇️
ffi    | 46.50% <ø> (+1.80%) ⬆️
rpc    | 20.00% <ø> (ø)
spv    | 88.06% <97.00%> (+0.73%) ⬆️
wallet | 71.18% <ø> (+1.08%) ⬆️
Files with missing lines Coverage Δ
dash-spv/src/sync/block_headers/sync_manager.rs 84.61% <100.00%> (ø)
dash-spv/src/sync/blocks/manager.rs 96.93% <100.00%> (+0.34%) ⬆️
dash-spv/src/sync/blocks/pipeline.rs 97.08% <100.00%> (+0.18%) ⬆️
dash-spv/src/sync/blocks/sync_manager.rs 82.35% <100.00%> (-0.51%) ⬇️
dash-spv/src/sync/download_coordinator.rs 100.00% <100.00%> (ø)
dash-spv/src/sync/filter_headers/manager.rs 91.03% <100.00%> (ø)
dash-spv/src/sync/filter_headers/sync_manager.rs 100.00% <100.00%> (ø)
dash-spv/src/sync/filters/pipeline.rs 98.21% <100.00%> (+0.12%) ⬆️
dash-spv/src/sync/filters/sync_manager.rs 100.00% <100.00%> (ø)
dash-spv/src/sync/mempool/sync_manager.rs 98.72% <100.00%> (ø)
... and 5 more

... and 56 files with indirect coverage changes

Contributor

@coderabbitai coderabbitai Bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (3)
dash-spv/src/sync/download_coordinator.rs (1)

95-110: 💤 Low value

LGTM – requeue_in_flight implementation is correct.

One cosmetic note: items.into_iter().rev() reverses a collection whose order is already arbitrary (from HashMap::drain), so the reversal has no meaningful effect on relative ordering among the requeued items. The tests correctly handle this by sorting the requeued slice before asserting content. If the intent was "push items to the front in the order they were originally enqueued," that ordering is not recoverable from the in_flight HashMap; the current code is functionally equivalent to iterating forward. This is a no-op cleanup at best.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dash-spv/src/sync/download_coordinator.rs` around lines 95 - 110, The call to
items.into_iter().rev() in requeue_in_flight is pointless because items comes
from draining a HashMap (in_flight) and has arbitrary order; remove the .rev()
so you simply iterate items.into_iter() and push_front each into pending, or if
you actually need to restore original enqueue order change in_flight from a
HashMap to an ordered container (e.g., IndexMap or track a VecDeque of keys) and
then requeue using that preserved order.
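For illustration, the ordered-container alternative the prompt mentions could look like the following hypothetical sketch, which tracks a side `VecDeque` of keys next to the `HashMap` so a drain can restore original enqueue order; all names here are invented.

```rust
use std::collections::{HashMap, VecDeque};

struct OrderedInFlight {
    map: HashMap<u32, ()>,
    order: VecDeque<u32>, // enqueue order of the keys currently in `map`
}

impl OrderedInFlight {
    fn insert(&mut self, key: u32) {
        // Record order only on first insertion of a key.
        if self.map.insert(key, ()).is_none() {
            self.order.push_back(key);
        }
    }

    /// Unlike `HashMap::drain`, yields keys in the order they were enqueued.
    fn drain_ordered(&mut self) -> Vec<u32> {
        self.map.clear();
        self.order.drain(..).collect()
    }
}
```

Whether the extra bookkeeping is worth it depends on whether request order actually matters downstream; as the review notes, with a plain `HashMap` the `.rev()` is simply a no-op.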
dash-spv/src/sync/blocks/sync_manager.rs (1)

35-53: ⚡ Quick win

Resume path doesn't immediately reissue requeued getdata — relies on next tick.

on_disconnect requeues in-flight slots to pending, but start_sync's non-empty-pipeline branch only flips state to Syncing without calling self.send_pending(_requests). Requeued getdatas sit idle until the 100ms tick fires. Compare with FiltersManager::start_sync (filters/sync_manager.rs Lines 60-63), which immediately calls filter_pipeline.send_pending(...) on resume.

The delay is small but the asymmetry between the two managers is easy to forget when debugging. Consider mirroring the filter-side pattern:

Proposed change
-        // If in-progress block work survived a disconnect, resume in Syncing so
-        // `process_buffered_blocks` can transition to Synced once the pipeline
-        // drains. Otherwise wait for BlocksNeeded / FiltersSyncComplete events.
-        if !self.pipeline.is_complete() {
-            self.set_state(SyncState::Syncing);
-        } else {
-            self.set_state(SyncState::WaitForEvents);
-        }
-        Ok(vec![])
+        // If in-progress block work survived a disconnect, resume in Syncing so
+        // `process_buffered_blocks` can transition to Synced once the pipeline
+        // drains. Otherwise wait for BlocksNeeded / FiltersSyncComplete events.
+        if !self.pipeline.is_complete() {
+            self.set_state(SyncState::Syncing);
+            // Reissue any requeued in-flight `getdata`s to the new peer immediately
+            // instead of waiting for the next tick.
+            if self.pipeline.has_pending_requests() {
+                self.send_pending(_requests).await?;
+            }
+        } else {
+            self.set_state(SyncState::WaitForEvents);
+        }
+        Ok(vec![])

Note: the _requests parameter would also need to be renamed to requests since it would now be used.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dash-spv/src/sync/blocks/sync_manager.rs` around lines 35 - 53, start_sync
currently flips state to Syncing when self.pipeline is non-empty but does not
reissue any requeued getdata, causing a delay until the next tick; update
sync_manager::start_sync to accept and use the RequestSender (rename _requests
to requests) and, in the branch where you call
self.set_state(SyncState::Syncing), immediately invoke the manager's
pending-send routine (analogous to FiltersManager::start_sync calling
filter_pipeline.send_pending) — call the blocks pipeline/send_pending method
(e.g., self.pipeline.send_pending or self.send_pending(requests)) right after
setting the state so requeued in-flight slots from on_disconnect are resent
without waiting for the tick.
dash-spv/src/sync/sync_manager.rs (1)

116-134: 💤 Low value

Trait contract clarification looks good; lifecycle ordering is correct.

stop_sync sets state to WaitingForConnections first, then calls on_disconnect, so any state-aware preservation logic in implementations sees the post-stop state. The new doc clearly distinguishes the "preserve durable, requeue peer-bound" intent. Note that this is a breaking trait change (no default impl), so any out-of-tree SyncManager implementations will need to be updated — worth calling out in release notes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dash-spv/src/sync/sync_manager.rs` around lines 116 - 134, The new trait
method on_disconnect currently has no default implementation which is a breaking
change; to avoid breaking out-of-tree SyncManager implementors, add a default
no-op impl for on_disconnect in the SyncManager trait (so existing impls
continue to compile) and keep stop_sync calling self.on_disconnect(); also
update the release notes to mention the lifecycle ordering (stop_sync sets
WaitingForConnections then calls on_disconnect) and the intended
preservation/requeue semantics.
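The backward-compatible default suggested above could take roughly this shape; the real `SyncManager` trait is async and larger, so this is only a minimal sketch with invented manager types.

```rust
trait SyncManager {
    /// Default no-op: out-of-tree implementors keep compiling and simply
    /// opt out of disconnect-time preservation.
    fn on_disconnect(&mut self) {}

    fn stop_sync(&mut self) {
        // In the real code the state is set to WaitingForConnections first,
        // so the hook observes the post-stop state.
        self.on_disconnect();
    }
}

struct LegacyManager; // provides no override, still compiles
impl SyncManager for LegacyManager {}

struct CountingManager {
    disconnects: u32,
}
impl SyncManager for CountingManager {
    fn on_disconnect(&mut self) {
        self.disconnects += 1;
    }
}
```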
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dash-spv/src/sync/filters/pipeline.rs`:
- Around line 255-262: The requeue flow can leave a batch key in
coordinator.pending without a tracker if late filter messages complete the batch
in receive_with_data; add a method on DownloadCoordinator (e.g.,
cancel_pending(&mut self, batch_start: HeightType)) that removes a key from
pending, and update receive_with_data so that after calling
coordinator.receive(&batch_start) if it returns false you invoke
coordinator.cancel_pending(batch_start) and also remove the tracker from
batch_trackers to keep coordinator.pending and batch_trackers consistent (so
send_pending no longer errors with SyncError::InvalidState).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fdedfab9-db3b-4890-8346-6d64c9f9f9b3

📥 Commits

Reviewing files that changed from the base of the PR and between 49c8c6d and b628a1f.

📒 Files selected for processing (15)
  • dash-spv/src/sync/block_headers/sync_manager.rs
  • dash-spv/src/sync/blocks/manager.rs
  • dash-spv/src/sync/blocks/pipeline.rs
  • dash-spv/src/sync/blocks/sync_manager.rs
  • dash-spv/src/sync/chainlock/sync_manager.rs
  • dash-spv/src/sync/download_coordinator.rs
  • dash-spv/src/sync/filter_headers/manager.rs
  • dash-spv/src/sync/filter_headers/sync_manager.rs
  • dash-spv/src/sync/filters/manager.rs
  • dash-spv/src/sync/filters/pipeline.rs
  • dash-spv/src/sync/filters/sync_manager.rs
  • dash-spv/src/sync/instantsend/sync_manager.rs
  • dash-spv/src/sync/masternodes/sync_manager.rs
  • dash-spv/src/sync/mempool/sync_manager.rs
  • dash-spv/src/sync/sync_manager.rs

Comment thread dash-spv/src/sync/filters/pipeline.rs
…equeued batch

After `requeue_in_flight()` moves a batch's `start_height` from `coordinator.in_flight` back to `coordinator.pending`, a buffered `CFilter` from the disconnected peer can still reach `receive_with_data` and complete the batch. The tracker is removed but `coordinator.receive` returns `false` (the key is in `pending`, not `in_flight`), leaving the key orphaned in `pending`. The next `send_pending` pops it, finds no tracker, and aborts sync with `SyncError::InvalidState`.

Add `DownloadCoordinator::cancel_pending` and call it from `receive_with_data` when `receive` reports the key was not in flight.
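A minimal sketch of the fix described above, with invented field and key types: a late message for a requeued batch must clear both the tracker and the pending copy, not just the tracker.

```rust
use std::collections::{HashMap, VecDeque};

struct FiltersPipeline {
    pending: VecDeque<u32>,
    in_flight: HashMap<u32, ()>,
    batch_trackers: HashMap<u32, usize>, // batch_start -> filters still expected
}

impl FiltersPipeline {
    /// Returns true iff the batch was actually in flight.
    fn receive(&mut self, batch_start: u32) -> bool {
        self.in_flight.remove(&batch_start).is_some()
    }

    fn receive_with_data(&mut self, batch_start: u32) {
        self.batch_trackers.remove(&batch_start);
        if !self.receive(batch_start) {
            // The batch was requeued by a disconnect: cancel the pending
            // copy too, so `send_pending` never pops a tracker-less key.
            self.pending.retain(|k| *k != batch_start);
        }
    }
}
```

Keeping `pending` and `batch_trackers` consistent is the invariant here; without the cancel, the orphaned key would be popped later with no tracker behind it, which is the `SyncError::InvalidState` abort the review describes.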

Addresses CodeRabbit review comment on PR #746
#746 (comment)
@xdustinface
Collaborator Author

@CodeRabbit review

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


Labels

ready-for-review CodeRabbit has approved this PR
