
Conversation

@pauldelucia pauldelucia commented Nov 24, 2025

Filter sync was stalling at 99% when syncing from a checkpoint because we were trying to validate the checkpoint's filter header against its predecessor's header, which we weren't persisting.

Fix:

  • Persist the checkpoint predecessor filter header and use it in verify_cfilter_against_headers so the first post-checkpoint filter is still validated while CFHeaders complete (a rough sketch follows this list).
  • Teach masternode sync to backfill missing start lists (instead of erroring) and to respect the checkpoint base height for QRInfo requests, preventing hangs in the masternode phase.
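
For illustration, a minimal sketch of the verification fallback — simplified stand-in types, not the actual dash-spv code:

type FilterHeader = [u8; 32];

struct TrustedPrev {
    height: u32,
    header: FilterHeader,
}

fn previous_filter_header(
    height: u32,
    sync_base_height: u32,
    stored_prev: Option<FilterHeader>,     // normal lookup at height - 1
    checkpoint_prev: Option<&TrustedPrev>, // persisted checkpoint predecessor
) -> Option<FilterHeader> {
    // Normal path: the header at height - 1 is already in storage.
    if let Some(h) = stored_prev {
        return Some(h);
    }
    // Checkpoint path: the first post-checkpoint filter has no stored
    // predecessor, so fall back to the persisted trusted header, but only
    // when we are exactly at the checkpoint base and the heights line up.
    if sync_base_height > 0 && height == sync_base_height {
        if let Some(prev) = checkpoint_prev {
            if prev.height == height - 1 {
                return Some(prev.header);
            }
        }
    }
    None
}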

Tested in DET

Addresses issue #170

Summary by CodeRabbit

  • New Features

    • Enhanced checkpoint-based synchronization for more reliable wallet sync.
    • Intelligent backfill scheduling for masternode data, allowing sync to pause and resume when needed.
    • Improved filter header verification and storage handling during checkpoint-aligned sync.
  • Bug Fixes / Chores

    • Cleans up previous checkpoint metadata when filters are cleared to avoid stale state and improve reset reliability.



coderabbitai bot commented Nov 24, 2025

Walkthrough

Adds checkpoint-aware handling: introduces a metadata key for the filter header preceding a trusted checkpoint, persists and caches that header, extends filter download/verification to use the trusted predecessor when needed, and adds masternode backfill/defer logic with a base-height configuration step before sync.

Changes

  • Metadata Key & Re-export — dash-spv/src/storage/metadata_keys.rs, dash-spv/src/storage/mod.rs: Adds pub const CHECKPOINT_PREV_FILTER_HEADER_KEY = "checkpoint_prev_filter_header_v1" and publicly re-exports the metadata_keys module.
  • Storage: clear checkpoint metadata — dash-spv/src/storage/disk/filters.rs, dash-spv/src/storage/memory.rs: Updates clear_filters to remove the checkpoint-prev-filter-header metadata entry (disk: remove the file at state/{CHECKPOINT_PREV_FILTER_HEADER_KEY}.dat; memory: remove the key), treating NotFound as non-fatal and mapping other I/O errors to StorageError::Io.
  • Filter sync manager — dash-spv/src/sync/filters/manager.rs: Adds TrustedCheckpointFilterHeader { height: u32, header: FilterHeader } with to_bytes/from_bytes; adds checkpoint_prev_filter_header: Mutex<Option<TrustedCheckpointFilterHeader>>; implements persist_checkpoint_prev_filter_header and load_checkpoint_prev_filter_header to persist/load the metadata and populate the cache (see the sketch after this list).
  • Filter download & verification — dash-spv/src/sync/filters/download.rs: verify_cfilter_against_headers now attempts multi-path previous-header retrieval: the normal height-1 load, a fallback to the trusted checkpoint predecessor when syncing from a checkpoint base (with height consistency checks), or failure with warnings otherwise. store_filter_headers now persists the trusted predecessor/header at checkpoint boundaries and handles overlapping batches via handle_overlapping_headers. Imports TrustedCheckpointFilterHeader.
  • Masternode sync & backfill — dash-spv/src/sync/masternodes.rs: Introduces sync_base_height, pending_start_list_backfills, pending_qrinfo_retry, and QrInfoProcessResult (Completed / WaitingForStartList) to defer QRInfo handling while missing start masternode lists are backfilled.
  • Sequential sync orchestration — dash-spv/src/sync/sequential/phase_execution.rs: Calls set_sync_base_height(sync_base_height) on the masternode sync component before start_sync to establish the checkpoint base height.
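
To make the persist/load flow in the filter sync manager concrete, here is a minimal, hedged sketch against a placeholder metadata store; the MetadataStore trait, CheckpointPrev type, and method names are illustrative stand-ins, not the real StorageManager or FilterSyncManager API.

use tokio::sync::Mutex;

const CHECKPOINT_PREV_FILTER_HEADER_KEY: &str = "checkpoint_prev_filter_header_v1";

/// Stand-in for the persisted record (the real code wraps a FilterHeader).
#[derive(Clone, Copy)]
struct CheckpointPrev {
    height: u32,
    header: [u8; 32],
}

/// Placeholder metadata store; the real storage backend API differs.
trait MetadataStore {
    fn put(&mut self, key: &str, value: Vec<u8>) -> Result<(), String>;
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String>;
}

struct FilterSyncSketch {
    checkpoint_prev_filter_header: Mutex<Option<CheckpointPrev>>,
}

impl FilterSyncSketch {
    /// Write the predecessor to metadata and keep a copy in the async cache.
    async fn persist(
        &self,
        store: &mut dyn MetadataStore,
        value: CheckpointPrev,
    ) -> Result<(), String> {
        let mut bytes = Vec::with_capacity(36);
        bytes.extend_from_slice(&value.height.to_le_bytes()); // 4-byte LE height
        bytes.extend_from_slice(&value.header);               // 32 header bytes
        store.put(CHECKPOINT_PREV_FILTER_HEADER_KEY, bytes)?;
        *self.checkpoint_prev_filter_header.lock().await = Some(value);
        Ok(())
    }

    /// Return the cached value, falling back to metadata on a cold cache.
    async fn load(&self, store: &dyn MetadataStore) -> Result<Option<CheckpointPrev>, String> {
        let mut cache = self.checkpoint_prev_filter_header.lock().await;
        if cache.is_none() {
            if let Some(bytes) = store.get(CHECKPOINT_PREV_FILTER_HEADER_KEY)? {
                if bytes.len() == 36 {
                    let mut height = [0u8; 4];
                    height.copy_from_slice(&bytes[..4]);
                    let mut header = [0u8; 32];
                    header.copy_from_slice(&bytes[4..]);
                    *cache = Some(CheckpointPrev {
                        height: u32::from_le_bytes(height),
                        header,
                    });
                }
            }
        }
        Ok(*cache)
    }
}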

Sequence Diagram(s)

sequenceDiagram
    participant sync as SequentialSyncManager
    participant mnmgr as MasternodeSyncManager
    participant storage as Storage
    participant network as Network

    sync->>mnmgr: set_sync_base_height(checkpoint_height)
    activate mnmgr
    mnmgr->>mnmgr: Store checkpoint base height
    deactivate mnmgr

    sync->>mnmgr: start_sync()
    activate mnmgr
    mnmgr->>mnmgr: determine_base_hash(storage)
    mnmgr->>mnmgr: Use checkpoint height as base
    mnmgr->>network: request_qrinfo()
    network-->>mnmgr: QRInfo received
    
    rect rgb(220, 240, 255)
    note over mnmgr: Handle QRInfo with potential backfill
    mnmgr->>mnmgr: Process QRInfo
    alt Missing start list detected
        mnmgr->>mnmgr: schedule_start_list_backfill()
        mnmgr-->>mnmgr: Return WaitingForStartList
    else Start list available
        mnmgr-->>mnmgr: Return Completed
    end
    end
    
    mnmgr->>network: request_mnlistdiff_backfill()
    network-->>mnmgr: MnListDiff received
    mnmgr->>mnmgr: apply_start_list_backfill()
    mnmgr->>mnmgr: retry_pending_qrinfo()
    mnmgr-->>sync: Sync complete
    deactivate mnmgr
sequenceDiagram
    participant download as FilterDownload
    participant manager as FilterSyncManager
    participant storage as Storage
    participant verify as VerifyLogic

    download->>manager: Syncing at checkpoint boundary
    
    rect rgb(240, 255, 240)
    note over download,manager: Store Phase: At checkpoint base
    download->>manager: persist_checkpoint_prev_filter_header()
    activate manager
    manager->>storage: Store header to metadata
    manager->>manager: Cache in Mutex
    deactivate manager
    download->>storage: Store filter headers normally
    end

    download->>verify: verify_cfilter_against_headers()
    activate verify
    verify->>verify: Need previous header
    alt Previous header at height-1 exists
        verify->>storage: Load from height-1
    else At checkpoint base
        rect rgb(255, 240, 240)
        note over verify,manager: Checkpoint fallback
        verify->>manager: load_checkpoint_prev_filter_header()
        manager-->>verify: Return cached header
        end
    else Outside checkpoint path
        verify->>verify: Log warning, fail verification
    end
    deactivate verify

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Pay attention to: dash-spv/src/sync/filters/download.rs (multi-path header retrieval and error messaging), dash-spv/src/sync/masternodes.rs (QRInfo defer/backfill state machine and integrations), and dash-spv/src/sync/filters/manager.rs (serialization, storage calls, and Mutex cache coherency).

Poem

🐰 A checkpoint awaits, where headers align,
The filter before stands guard at the line,
Backfills bloom when start-lists grow thin,
State machines dance as we sync from within,
Caches and persistence—a rabbit's design! 🌿✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 61.54%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title accurately describes the main changes: fixing filter sync verification at checkpoints and adding masternode start list backfill functionality.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
dash-spv/src/sync/masternodes.rs (2)

52-61: Base-height propagation and QRInfo base-hash selection look consistent

The added sync_base_height field, set_sync_base_height, and determine_base_hash logic correctly prioritize:

  • last successful last_qrinfo_block_hash,
  • then the header at sync_base_height (checkpoint base),
  • and finally the network’s genesis hash on failure.

Using this helper both in start_sync and in the QRInfo retry path in check_sync_timeout keeps QRInfo base selection consistent across first-run and retries, and is safe when sync_base_height == 0 due to the genesis fallback.

Also applies to: 123-140, 143-147, 357-396, 544-589, 717-718
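
For reference, a condensed sketch of that priority order, with placeholder types standing in for the real storage and hash types:

#[derive(Clone, Copy)]
struct BlockHash([u8; 32]);

fn determine_base_hash(
    last_qrinfo_block_hash: Option<BlockHash>,
    header_at_base: impl Fn(u32) -> Option<BlockHash>, // lookup at sync_base_height
    sync_base_height: u32,
    genesis_hash: BlockHash,
) -> BlockHash {
    // 1. Prefer the hash of the last successfully processed QRInfo.
    if let Some(hash) = last_qrinfo_block_hash {
        return hash;
    }
    // 2. Otherwise anchor at the checkpoint base height, if one is configured.
    if sync_base_height > 0 {
        if let Some(hash) = header_at_base(sync_base_height) {
            return hash;
        }
    }
    // 3. Fall back to genesis, which also covers sync_base_height == 0.
    genesis_hash
}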


63-66: Replaying full QRInfo after backfill may double-feed diffs; confirm engine idempotence

On MissingStartMasternodeList, feed_qrinfo_and_get_additional_diffs now:

  • Schedules a start-list backfill and returns QrInfoProcessResult::WaitingForStartList,
  • handle_qrinfo_message stores the original QRInfo in pending_qrinfo_retry and returns,
  • retry_pending_qrinfo later calls handle_qrinfo_message again with the same QRInfo.

Because the first pass already:

  • Called feed_qrinfo_block_heights,
  • Inserted all MnListDiffs into mnlist_diffs,
  • And invoked engine.feed_qr_info(..) up to the error point,

the second pass will repeat those operations after the backfill. Unless MasternodeListEngine::feed_qr_info / apply_diff are explicitly idempotent for already-applied diffs and re-fed heights, this could risk double-application or invariants being violated.

It would be good to verify the engine’s behavior here and, if necessary, adjust to either:

  • Make the first pass a “dry run” that only discovers missing start lists without mutating engine state, or
  • Gate the second pass so it skips re-insertion / re-feeding for diffs that were already applied.

Also applies to: 238-243, 823-833, 897-903, 929-934, 979-980
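
If gating turns out to be necessary, one hedged option is to remember which diffs the first pass already applied and skip them on replay; the guard below is purely hypothetical (names like applied_diff_hashes do not exist in the codebase):

use std::collections::HashSet;

// Hypothetical guard: remember which diff block hashes were already fed to
// the engine on the first pass, and skip them when the deferred QRInfo is
// replayed after the backfill completes.
#[derive(Default)]
struct QrInfoReplayGuard {
    applied_diff_hashes: HashSet<[u8; 32]>,
}

impl QrInfoReplayGuard {
    /// Returns true if this diff should be applied (first time seen).
    fn should_apply(&mut self, diff_block_hash: [u8; 32]) -> bool {
        self.applied_diff_hashes.insert(diff_block_hash)
    }
}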

dash-spv/src/storage/disk/filters.rs (1)

8-9: Checkpoint predecessor metadata cleanup is wired correctly

Importing StorageError/StorageResult and CHECKPOINT_PREV_FILTER_HEADER_KEY, then removing state/{CHECKPOINT_PREV_FILTER_HEADER_KEY}.dat on clear_filters, with NotFound ignored and other I/O mapped to StorageError::Io, is consistent with the new checkpoint header persistence flow. This keeps storage/metadata in sync when filters are cleared.

Tiny cleanup: rely on remove_file’s NotFound handling instead of a pre‑exists()

You can avoid the extra synchronous metadata_path.exists() check (and the TOCTOU window it introduces) and just unconditionally call remove_file, since you already special‑case ErrorKind::NotFound:

-        let metadata_path =
-            self.base_path.join(format!("state/{}.dat", CHECKPOINT_PREV_FILTER_HEADER_KEY));
-        if metadata_path.exists() {
-            if let Err(e) = tokio::fs::remove_file(&metadata_path).await {
-                if e.kind() != std::io::ErrorKind::NotFound {
-                    return Err(StorageError::Io(e));
-                }
-            }
-        }
+        let metadata_path =
+            self.base_path.join(format!("state/{}.dat", CHECKPOINT_PREV_FILTER_HEADER_KEY));
+        if let Err(e) = tokio::fs::remove_file(&metadata_path).await {
+            if e.kind() != std::io::ErrorKind::NotFound {
+                return Err(StorageError::Io(e));
+            }
+        }

This keeps behavior identical while simplifying the code.

Also applies to: 220-229

dash-spv/src/sync/filters/manager.rs (1)

305-351: Checkpoint predecessor header persistence/load is correct; consider cache reset semantics

The persist_checkpoint_prev_filter_header and load_checkpoint_prev_filter_header helpers correctly:

  • Persist the 36‑byte encoding under CHECKPOINT_PREV_FILTER_HEADER_KEY with clear SyncError::Storage(...) mapping, and
  • Cache the decoded value behind an async Mutex to avoid repeated metadata I/O, logging a warning if the on‑disk format is unexpected.

One thing to double‑check: the checkpoint_prev_filter_header cache is never cleared or keyed by sync_base_height. If, in your lifecycle, FilterSyncManager is reused while:

  • The checkpoint base height changes (via set_sync_base_height), or
  • The underlying storage has its metadata cleared (e.g., clear_filters on the same StorageManager instance),

then this cache could serve a stale predecessor header that no longer matches the current checkpoint configuration or disk state.

If those scenarios are possible, consider resetting the cache when changing base height or when clearing filter state—for example, clearing it in set_sync_base_height (with a non‑blocking try_lock) or in clear_filter_state. If the manager is always recreated when the checkpoint or storage is reset, the current design is fine but might benefit from a brief doc comment stating that assumption.
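
If the manager can be reused across checkpoint changes, a minimal sketch of the suggested reset — assuming a tokio Mutex and a synchronous set_sync_base_height — could look like this:

use tokio::sync::Mutex;

// Hypothetical, trimmed-down stand-in for FilterSyncManager; only the fields
// relevant to the cache-reset suggestion are shown.
struct FilterSyncManagerSketch<T> {
    sync_base_height: u32,
    checkpoint_prev_filter_header: Mutex<Option<T>>,
}

impl<T> FilterSyncManagerSketch<T> {
    fn set_sync_base_height(&mut self, height: u32) {
        self.sync_base_height = height;
        // Drop any cached predecessor so a stale header from a previous
        // checkpoint configuration is never served. try_lock keeps this
        // method non-blocking; with &mut self it should always succeed.
        if let Ok(mut cached) = self.checkpoint_prev_filter_header.try_lock() {
            *cached = None;
        }
    }
}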

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e9ef04 and bb04076.

📒 Files selected for processing (8)
  • dash-spv/src/storage/disk/filters.rs (2 hunks)
  • dash-spv/src/storage/memory.rs (2 hunks)
  • dash-spv/src/storage/metadata_keys.rs (1 hunks)
  • dash-spv/src/storage/mod.rs (1 hunks)
  • dash-spv/src/sync/filters/download.rs (3 hunks)
  • dash-spv/src/sync/filters/manager.rs (4 hunks)
  • dash-spv/src/sync/masternodes.rs (11 hunks)
  • dash-spv/src/sync/sequential/phase_execution.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: DCG-Claude
Repo: dashpay/rust-dashcore PR: 0
File: :0-0
Timestamp: 2025-06-26T16:01:37.609Z
Learning: The mempool tracking infrastructure (UnconfirmedTransaction, MempoolState, configuration, and mempool_filter.rs) is fully implemented and integrated in the Dash SPV client as of this PR, including client logic, FFI APIs, and tests.
📚 Learning: 2025-06-26T16:01:37.609Z
Learnt from: DCG-Claude
Repo: dashpay/rust-dashcore PR: 0
File: :0-0
Timestamp: 2025-06-26T16:01:37.609Z
Learning: The mempool tracking infrastructure (UnconfirmedTransaction, MempoolState, configuration, and mempool_filter.rs) is fully implemented and integrated in the Dash SPV client as of this PR, including client logic, FFI APIs, and tests.

Applied to files:

  • dash-spv/src/storage/memory.rs
  • dash-spv/src/storage/disk/filters.rs
  • dash-spv/src/sync/filters/download.rs
  • dash-spv/src/sync/filters/manager.rs
  • dash-spv/src/sync/masternodes.rs
📚 Learning: 2025-06-26T15:54:02.509Z
Learnt from: DCG-Claude
Repo: dashpay/rust-dashcore PR: 0
File: :0-0
Timestamp: 2025-06-26T15:54:02.509Z
Learning: The `StorageManager` trait in `dash-spv/src/storage/mod.rs` uses `&mut self` methods but is also `Send + Sync`, and implementations often use interior mutability for concurrency. This can be confusing, so explicit documentation should clarify thread-safety expectations and the rationale for the API design.

Applied to files:

  • dash-spv/src/storage/memory.rs
  • dash-spv/src/storage/disk/filters.rs
  • dash-spv/src/storage/mod.rs
  • dash-spv/src/sync/filters/manager.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Key Wallet Components Tests
  • GitHub Check: fuzz (hashes_sha256)
  • GitHub Check: SPV Components Tests
  • GitHub Check: fuzz (dash_deser_net_msg)
  • GitHub Check: Core Components Tests
  • GitHub Check: RPC Tests (stable, true)
  • GitHub Check: fuzz (hashes_sha1)
  • GitHub Check: fuzz (hashes_ripemd160)
  • GitHub Check: Strict Warnings and Clippy Checks
  • GitHub Check: fuzz (dash_deserialize_script)
  • GitHub Check: fuzz (hashes_sha512)
  • GitHub Check: fuzz (hashes_sha512_256)
  • GitHub Check: fuzz (dash_deserialize_amount)
  • GitHub Check: fuzz (hashes_cbor)
  • GitHub Check: fuzz (hashes_json)
  • GitHub Check: fuzz (dash_deserialize_block)
  • GitHub Check: fuzz (dash_deserialize_witness)
  • GitHub Check: fuzz (dash_deserialize_address)
  • GitHub Check: fuzz (dash_script_bytes_to_asm_fmt)
  • GitHub Check: fuzz (dash_outpoint_string)
🔇 Additional comments (7)
dash-spv/src/sync/sequential/phase_execution.rs (1)

42-100: Propagating sync_base_height into masternode_sync before start looks correct

Wiring the header sync’s base height into masternode_sync before start_sync ensures QRInfo base selection is checkpoint-aware, while keeping the non-checkpoint path (height 0) effectively a no-op via the existing fallback to genesis.

dash-spv/src/storage/mod.rs (1)

3-8: Exposing metadata_keys from storage is appropriate

Adding pub mod metadata_keys; cleanly centralizes shared metadata key constants for all storage backends, without changing existing behavior of the StorageManager API.

dash-spv/src/storage/memory.rs (1)

9-12: Clearing checkpoint_prev_filter_header metadata alongside filters avoids stale state

Importing CHECKPOINT_PREV_FILTER_HEADER_KEY and removing it in clear_filters keeps the in-memory store’s checkpoint filter-header metadata consistent with the actual filter data, mirroring the disk backend and preventing stale checkpoint predecessors from affecting later syncs.

Also applies to: 311-316

dash-spv/src/storage/metadata_keys.rs (1)

1-4: Checkpoint prev-filter-header metadata key is clearly defined

The CHECKPOINT_PREV_FILTER_HEADER_KEY constant and its documentation precisely capture the intended usage (storing the filter header for the block before a trusted checkpoint), and the _v1 suffix leaves space for future schema evolution.

dash-spv/src/sync/filters/manager.rs (1)

21-52: TrustedCheckpointFilterHeader layout and caching field look sound

The TrustedCheckpointFilterHeader struct plus to_bytes/from_bytes implementation (4‑byte LE height followed by 32‑byte FilterHeader bytes, fixed 36‑byte payload) is straightforward and safe: the length check guards against OOB, and deriving Copy/Eq is appropriate for this data. Wiring it into FilterSyncManager via the checkpoint_prev_filter_header: Mutex<Option<...>> field and initializing it to None in new is consistent with the rest of the manager’s async state.

Also applies to: 140-142, 188-188
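
For reference, a standalone sketch of that 36-byte layout, using a raw [u8; 32] in place of the real FilterHeader type:

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct TrustedCheckpointFilterHeader {
    height: u32,
    header: [u8; 32], // stand-in for the real FilterHeader type
}

impl TrustedCheckpointFilterHeader {
    /// 4-byte little-endian height followed by the 32 header bytes.
    fn to_bytes(&self) -> [u8; 36] {
        let mut out = [0u8; 36];
        out[..4].copy_from_slice(&self.height.to_le_bytes());
        out[4..].copy_from_slice(&self.header);
        out
    }

    /// Rejects any payload that is not exactly 36 bytes.
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 36 {
            return None;
        }
        let mut height_bytes = [0u8; 4];
        height_bytes.copy_from_slice(&bytes[..4]);
        let mut header = [0u8; 32];
        header.copy_from_slice(&bytes[4..]);
        Some(Self {
            height: u32::from_le_bytes(height_bytes),
            header,
        })
    }
}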

dash-spv/src/sync/filters/download.rs (2)

19-20: CFilter verification now robustly handles checkpoint predecessors

The updated verify_cfilter_against_headers logic looks solid:

  • Genesis (height 0) is short‑circuited as before.
  • For height > 0, it first attempts to load the previous filter header from storage and surfaces genuine storage errors as SyncError::Storage with clear context.
  • When the previous header is missing exactly at the configured checkpoint base (sync_base_height > 0 && height == sync_base_height), it correctly falls back to load_checkpoint_prev_filter_header, enforcing that the trusted record’s height matches height‑1 and treating mismatches or missing metadata as verification failures (with good logging).
  • The expected header at height is now explicitly required; absence yields a warning and a clean Ok(false) result.

This matches the intended behavior: the first post‑checkpoint filter can be validated using the persisted predecessor header while still cleanly distinguishing storage errors from validation failures.

Also applies to: 42-78, 84-86


617-650: Checkpoint predecessor/header storage around the base height is consistent and covers both entry cases

The new logic in store_filter_headers for the non‑overlapping path correctly handles checkpoint boundaries:

  • When start_height == sync_base_height and sync_base_height > 0, it persists a TrustedCheckpointFilterHeader whose height is sync_base_height - 1 and whose header is cfheaders.previous_filter_header, ensuring the predecessor is available for cfilter verification at the checkpoint block.
  • When the first batch starts at sync_base_height + 1 and we haven’t yet stored the checkpoint header (current_filter_tip < sync_base_height), it stores cfheaders.previous_filter_header as the filter header for the checkpoint block itself, then proceeds to write the newly computed headers.

Combined with the earlier genesis case and overlapping handling, this neatly covers:

  • Genesis sync (sync_base_height == 0),
  • Checkpoint sync where the first CFHeaders batch starts exactly at the base, and
  • The off‑by‑one scenario where the first useful batch starts at base+1.

The error handling (SyncError::Storage on write failures, logging on success) is also consistent with the rest of the module.
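
Condensed, the two checkpoint entry cases reduce to a decision like the following sketch (placeholder names, not the actual store_filter_headers signature):

/// What to persist for the checkpoint block before writing the new headers.
enum CheckpointWrite {
    /// Batch starts at the base: previous_filter_header belongs to base - 1.
    PredecessorAt { height: u32, header: [u8; 32] },
    /// Batch starts at base + 1 and the tip is still below the base:
    /// previous_filter_header is the checkpoint block's own filter header.
    CheckpointHeaderAt { height: u32, header: [u8; 32] },
    Nothing,
}

fn checkpoint_write(
    start_height: u32,
    sync_base_height: u32,
    current_filter_tip: u32,
    previous_filter_header: [u8; 32],
) -> CheckpointWrite {
    if sync_base_height > 0 && start_height == sync_base_height {
        CheckpointWrite::PredecessorAt {
            height: sync_base_height - 1,
            header: previous_filter_header,
        }
    } else if sync_base_height > 0
        && start_height == sync_base_height + 1
        && current_filter_tip < sync_base_height
    {
        CheckpointWrite::CheckpointHeaderAt {
            height: sync_base_height,
            header: previous_filter_header,
        }
    } else {
        CheckpointWrite::Nothing
    }
}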

Comment on lines 148 to 216
async fn schedule_start_list_backfill(
    &mut self,
    missing_block_hash: BlockHash,
    storage: &mut S,
    network: &mut dyn NetworkManager,
) -> Result<(), String> {
    if self.pending_start_list_backfills.contains_key(&missing_block_hash) {
        tracing::info!(
            "Already waiting for start masternode list at block {}",
            missing_block_hash
        );
        return Ok(());
    }

    let engine =
        self.engine.as_mut().ok_or_else(|| "Masternode engine not initialized".to_string())?;
    let (latest_height, latest_block_hash) = engine
        .latest_masternode_list()
        .map(|list| (list.known_height, list.block_hash))
        .ok_or_else(|| "Masternode engine has no known masternode list".to_string())?;

    let target_height = storage
        .get_header_height_by_hash(&missing_block_hash)
        .await
        .map_err(|e| {
            format!(
                "Failed to look up height for missing start list block {}: {}",
                missing_block_hash, e
            )
        })?
        .ok_or_else(|| {
            format!(
                "Height not found in storage for missing start list block {}",
                missing_block_hash
            )
        })?;

    if latest_height >= target_height {
        tracing::debug!(
            "Engine already has masternode list at or beyond height {}",
            target_height
        );
        return Ok(());
    }

    engine.feed_block_height(target_height, missing_block_hash);

    let request = dashcore::network::message_sml::GetMnListDiff {
        base_block_hash: latest_block_hash,
        block_hash: missing_block_hash,
    };

    network
        .send_message(NetworkMessage::GetMnListD(request))
        .await
        .map_err(|e| format!("Failed to send MnListDiff backfill request: {}", e))?;

    tracing::info!(
        "Requested MnListDiff backfill from height {} to {} ({} -> {})",
        latest_height,
        target_height,
        latest_block_hash,
        missing_block_hash
    );

    self.pending_start_list_backfills.insert(missing_block_hash, target_height);

    Ok(())
}

⚠️ Potential issue | 🟠 Major

Start-list backfills are not covered by timeout/retry and can stall masternode sync

schedule_start_list_backfill issues a GetMnListDiff and tracks it in pending_start_list_backfills, and handle_mnlistdiff_message applies the backfill plus retry_pending_qrinfo. However, nothing in check_sync_timeout or elsewhere looks at pending_start_list_backfills, so if a backfill response never arrives:

  • sync_in_progress stays true,
  • pending_qrinfo_retry remains Some(..),
  • and no retry or forced completion is triggered.

This can leave the masternode phase stuck indefinitely on networks where a backfill GetMnListD is dropped or never answered.

Consider tying these backfill requests into the existing timeout machinery, e.g.:

  • Increment pending_mnlistdiff_requests and start mnlistdiff_wait_start when scheduling a backfill, so the existing 15s timeout / retry QRInfo logic also covers them; or
  • Track a dedicated timeout for entries in pending_start_list_backfills and, on expiry, either re-issue the backfill request or fail the masternode sync gracefully with a clear error.

Also applies to: 218-243, 592-598, 600-667

🤖 Prompt for AI Agents
In dash-spv/src/sync/masternodes.rs around lines 148-216,
schedule_start_list_backfill creates a GetMnListDiff and inserts into
pending_start_list_backfills but never wires that request into the existing
timeout/retry machinery, allowing masternode sync to stall if the response is
lost; fix by updating schedule_start_list_backfill to also increment the
pending_mnlistdiff_requests counter and set mnlistdiff_wait_start (or otherwise
start the same 15s timeout) when sending the backfill so check_sync_timeout will
handle retries/QRInfo logic, and ensure retry_pending_qrinfo or re-issuing the
GetMnListDiff clears/updates pending_start_list_backfills on timeout/failure;
alternatively implement a per-entry timeout timestamp stored with
pending_start_list_backfills and have check_sync_timeout reissue or fail
backfills when expired, making sure to update all affected call sites (including
the other ranges noted: ~218-243, 592-598, 600-667) so no backfill can hang
indefinitely.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
dash-spv/src/sync/masternodes.rs (1)

148-213: Start‑list backfill GetMnListDiff requests can still stall masternode sync indefinitely

schedule_start_list_backfill sends a GetMnListDiff (Lines 192‑200, 197‑200) and tracks it only in pending_start_list_backfills (Line 210), but never increments pending_mnlistdiff_requests or starts mnlistdiff_wait_start. check_sync_timeout (Lines 675‑762) only looks at pending_mnlistdiff_requests, so if a backfill response is dropped:

  • sync_in_progress remains true,
  • pending_qrinfo_retry stays Some(..) (set in handle_qrinfo_message), and
  • no timeout, retry, or forced completion is ever triggered.

This is the same structural issue flagged earlier and still allows the masternode phase to hang forever on a lost backfill response. Additionally, if schedule_start_list_backfill or apply_start_list_backfill fail, the error is recorded but sync_in_progress is left true and the pending backfill / deferred QRInfo state is never unwound, compounding the stall.

Consider tying backfills into the existing timeout machinery and cleaning up state on fatal failure. For example:

@@ async fn schedule_start_list_backfill(
-        network
-            .send_message(NetworkMessage::GetMnListD(request))
-            .await
-            .map_err(|e| format!("Failed to send MnListDiff backfill request: {}", e))?;
+        network
+            .send_message(NetworkMessage::GetMnListD(request))
+            .await
+            .map_err(|e| format!("Failed to send MnListDiff backfill request: {}", e))?;
+
+        // Treat backfill like a normal MnListDiff request so timeout/retry logic applies
+        self.pending_mnlistdiff_requests += 1;
+        if self.mnlistdiff_wait_start.is_none() {
+            self.mnlistdiff_wait_start = Some(Instant::now());
+        }
@@ pub async fn check_sync_timeout(
-                        // Force completion to unblock sync
-                        self.pending_mnlistdiff_requests = 0;
-                        self.mnlistdiff_wait_start = None;
-                        self.sync_in_progress = false;
-                        self.error = Some("MnListDiff requests timed out after retry".to_string());
+                        // Force completion to unblock sync
+                        self.pending_mnlistdiff_requests = 0;
+                        self.mnlistdiff_wait_start = None;
+                        self.sync_in_progress = false;
+                        self.error = Some("MnListDiff requests timed out after retry".to_string());
+                        // Also drop any backfill/deferred QRInfo state so we don't hold stale work
+                        self.pending_start_list_backfills.clear();
+                        self.pending_qrinfo_retry = None;

You may also want analogous cleanup (set sync_in_progress = false, clear pending_start_list_backfills / pending_qrinfo_retry) in clearly unrecoverable error paths from schedule_start_list_backfill / apply_start_list_backfill to avoid leaving the phase stuck on a hard failure.

Also applies to: 215-240, 593-593, 597-613, 661-663, 669-762

🧹 Nitpick comments (1)
dash-spv/src/sync/masternodes.rs (1)

820-835: Deferred QRInfo + MissingStartMasternodeList flow looks good; consider marking sync as failed on hard errors

The new QrInfoProcessResult flow cleanly handles MissingStartMasternodeList by:

  • Scheduling a backfill and returning WaitingForStartList(..) from feed_qrinfo_and_get_additional_diffs (Lines 926‑931, 976‑977).
  • Deferring QRInfo completion by storing pending_qrinfo_retry and early‑returning from handle_qrinfo_message (Lines 821‑828).

Once backfills apply, retry_pending_qrinfo drives the same QRInfo back through the unified path, which is a nice, minimal change to the control flow.

One improvement: in the Err(e) arm of the match in handle_qrinfo_message (Lines 830‑834), errors from feed_qrinfo_and_get_additional_diffs are logged and stored, but sync_in_progress and any pending backfill/deferred‑QRInfo state remain set. That can leave the phase appearing “stuck” and block subsequent start_sync calls even though we know this attempt has failed.

You could treat this as a hard failure of the current sync attempt and clear state accordingly, for example:

-            Err(e) => {
-                tracing::error!("❌ Failed to process QRInfo follow-up diffs: {}", e);
-                self.error = Some(e);
-                return;
-            }
+            Err(e) => {
+                tracing::error!("❌ Failed to process QRInfo follow-up diffs: {}", e);
+                self.error = Some(e);
+                self.sync_in_progress = false;
+                self.pending_start_list_backfills.clear();
+                self.pending_qrinfo_retry = None;
+                return;
+            }

This keeps the component in a well‑defined “failed” state and lets the caller decide when/how to retry.

Also applies to: 899-931, 976-977

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb04076 and c26abcc.

📒 Files selected for processing (1)
  • dash-spv/src/sync/masternodes.rs (11 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: DCG-Claude
Repo: dashpay/rust-dashcore PR: 0
File: :0-0
Timestamp: 2025-06-26T16:01:37.609Z
Learning: The mempool tracking infrastructure (UnconfirmedTransaction, MempoolState, configuration, and mempool_filter.rs) is fully implemented and integrated in the Dash SPV client as of this PR, including client logic, FFI APIs, and tests.
📚 Learning: 2025-06-26T16:01:37.609Z
Learnt from: DCG-Claude
Repo: dashpay/rust-dashcore PR: 0
File: :0-0
Timestamp: 2025-06-26T16:01:37.609Z
Learning: The mempool tracking infrastructure (UnconfirmedTransaction, MempoolState, configuration, and mempool_filter.rs) is fully implemented and integrated in the Dash SPV client as of this PR, including client logic, FFI APIs, and tests.

Applied to files:

  • dash-spv/src/sync/masternodes.rs
📚 Learning: 2025-06-26T16:02:42.390Z
Learnt from: DCG-Claude
Repo: dashpay/rust-dashcore PR: 0
File: :0-0
Timestamp: 2025-06-26T16:02:42.390Z
Learning: Passing exclusive mutable references to network and storage managers into the sync manager in async Rust can lead to borrow checker issues or runtime contention if concurrent access is needed elsewhere. It's advisable to document this architectural tradeoff and consider refactoring to interior mutability or message passing for shared access as the codebase evolves.

Applied to files:

  • dash-spv/src/sync/masternodes.rs
🧬 Code graph analysis (1)
dash-spv/src/sync/masternodes.rs (2)
dash-spv/src/sync/filters/manager.rs (1)
  • set_sync_base_height (195-197)
dash/src/sml/masternode_list/from_diff.rs (1)
  • diff (90-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: verify-execution
  • GitHub Check: Strict Warnings and Clippy Checks
  • GitHub Check: RPC Tests (stable, true)
  • GitHub Check: Core Components Tests
  • GitHub Check: SPV Components Tests
🔇 Additional comments (4)
dash-spv/src/sync/masternodes.rs (4)

52-61: Checkpoint-aware sync fields and QRInfoProcessResult modeling look good

The added state (sync_base_height, pending_start_list_backfills, pending_qrinfo_retry) plus QrInfoProcessResult cleanly model checkpoint-based and deferred QRInfo flows, and the constructor + set_sync_base_height keep defaults and configuration straightforward.

Also applies to: 63-66, 135-137, 143-146


215-233: Backfill application logic is consistent with engine usage

Feeding target_height and applying the diff against that height before logging (Lines 220‑231) integrates cleanly with the existing engine semantics, and using the diff’s block_hash as the fed height anchor matches how other paths feed block heights.


235-240: Deferred QRInfo retry helper is simple and safe

retry_pending_qrinfo’s take() semantics avoid multiple retries of the same QRInfo and re-use the normal handle_qrinfo_message path, which keeps the flow centralized and avoids duplicate logic.


354-393: Base‑hash selection via determine_base_hash correctly incorporates checkpoints

determine_base_hash’s priority order—last_qrinfo_block_hash if present, else checkpoint header at sync_base_height, else genesis (Lines 358‑393)—is clear and robust, and the logging on checkpoint fallback is helpful for debugging. Wiring this into both start_sync (Line 571) and the retry path in check_sync_timeout (Line 714) aligns QRInfo requests with the configured checkpoint base height while preserving the progressive “last successful QRInfo” behavior.

Also applies to: 571-571, 714-714

@github-actions

This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.

@github-actions github-actions bot added the merge-conflict The PR conflicts with the target branch. label Dec 11, 2025
@xdustinface xdustinface changed the base branch from v0.41-dev to v0.42-dev December 30, 2025 22:35