feat(db): Repair missing spent UTXOs and hashes CF gaps; deepen reorg undo buffer #5

Open
DeviaVir wants to merge 1 commit into master from repair

Conversation

DeviaVir commented on Feb 9, 2026

  • Backfill missing spent UTXO script hashes from the node/esplora during block indexing to avoid hard failures on partial DB corruption.
  • Repair small gaps in hashesv2 at startup by fetching missing block headers and persisting them; fail only when repair is not possible.
  • Expand reorg undo handling to a multi‑block buffer and align memory/db store behavior; add tests for gap detection and multi‑block reorgs.
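
The gap handling described in the second bullet can be illustrated with a small sketch. Here `MAX_HASH_GAP_REPAIR` mirrors the bound the PR uses, but `find_repairable_gap` is a hypothetical helper, not the PR's actual code: it scans stored heights, rejects out-of-order entries and oversized gaps, and returns the first repairable gap as a half-open range.

```rust
/// Hypothetical sketch of gap detection over stored block heights.
/// MAX_HASH_GAP_REPAIR mirrors the bound described above; the helper
/// name and signature are assumptions for illustration.
const MAX_HASH_GAP_REPAIR: u32 = 100;

fn find_repairable_gap(heights: &[u32]) -> Result<Option<(u32, u32)>, String> {
    let mut expected = 0u32;
    for &h in heights {
        if h != expected {
            let len = h.saturating_sub(expected);
            if len == 0 {
                // h < expected: duplicate or out-of-order entry.
                return Err(format!("out-of-order entry at height {h}"));
            }
            if len > MAX_HASH_GAP_REPAIR {
                return Err(format!("gap too large to repair ({len} blocks)"));
            }
            // Half-open range [start, end) of missing heights.
            return Ok(Some((expected, h)));
        }
        expected = h + 1;
    }
    Ok(None)
}

fn main() {
    assert_eq!(find_repairable_gap(&[0, 1, 2, 3]), Ok(None));
    assert_eq!(find_repairable_gap(&[0, 1, 4, 5]), Ok(Some((2, 4))));
    assert!(find_repairable_gap(&[0, 0, 1]).is_err());
}
```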

Copilot AI left a comment

Pull request overview

This PR improves resilience of the indexing/storage pipeline by repairing certain on-disk inconsistencies (missing spent UTXO script hashes and small header/hash CF gaps) and by expanding reorg undo handling to support multi-block reorgs.

Changes:

  • Backfill missing spent UTXO script hashes during block indexing by querying the node/esplora, rather than hard-failing on partial UTXO DB corruption.
  • Add startup detection + optional repair of small gaps in the stored block hash/timestamp column family, persisting repaired entries.
  • Expand reorg undo handling from single-block to a bounded multi-block in-memory buffer, with new tests for multi-block reorg behavior and gap detection.
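
The bounded multi-block undo buffer in the last bullet can be sketched with a `VecDeque` that evicts its oldest entry once the depth limit is reached. `UndoEntry` and `MAX_REORG_DEPTH` are assumed names for illustration, not the PR's actual identifiers:

```rust
use std::collections::VecDeque;

// Hypothetical sketch of a bounded multi-block reorg undo buffer;
// names and the depth value are assumptions, not the PR's code.
const MAX_REORG_DEPTH: usize = 8;

#[derive(Debug, PartialEq)]
struct UndoEntry {
    height: u32,
    // ...spent utxos, touched script hashes, etc.
}

struct UndoBuffer {
    entries: VecDeque<UndoEntry>,
}

impl UndoBuffer {
    fn new() -> Self {
        Self { entries: VecDeque::new() }
    }

    /// Push undo data for a newly indexed block, evicting the oldest
    /// entry once the buffer exceeds its bounded depth.
    fn push(&mut self, entry: UndoEntry) {
        self.entries.push_back(entry);
        if self.entries.len() > MAX_REORG_DEPTH {
            self.entries.pop_front();
        }
    }

    /// Pop the most recent block's undo data during a reorg.
    fn pop(&mut self) -> Option<UndoEntry> {
        self.entries.pop_back()
    }
}

fn main() {
    let mut buf = UndoBuffer::new();
    for h in 0..10 {
        buf.push(UndoEntry { height: h });
    }
    // Only the last MAX_REORG_DEPTH blocks remain undoable.
    assert_eq!(buf.entries.len(), MAX_REORG_DEPTH);
    assert_eq!(buf.pop().unwrap().height, 9);
}
```

The bound keeps memory constant while still covering reorgs deeper than one block; anything deeper than the buffer would still require a reindex.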

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Changed files:

  • src/threads/blocks.rs: Adds spent-UTXO resolution/backfill to produce SpentUtxo entries with script hashes before the store update.
  • src/store/mod.rs: Introduces SpentUtxo, adds put_hash_ts to the Store trait, and defines a maximum reorg undo buffer depth.
  • src/store/memory.rs: Implements multi-block reorg undo buffering and updates update()/reorg() to use SpentUtxo.
  • src/store/db.rs: Implements multi-block reorg undo buffering, updates update()/reorg() to use SpentUtxo, and adds hash/timestamp delete/write helpers plus tests.
  • src/server/preload.rs: Adds hashes CF gap detection and optional repair on startup (with a new test).
  • src/server/mod.rs: Updates server startup to provide a Client to header preloading/repair.


Comment on lines 18 to +84
  let mut blocks_hash_ts = state.blocks_hash_ts.lock().await;
  let mut i = 0usize;
- for meta in state.store.iter_hash_ts() {
-     assert_eq!(i as u32, meta.height());
+ let metas: Vec<BlockMeta> = state.store.iter_hash_ts().collect();
+ for meta in metas {
+     if i as u32 != meta.height() {
+         let gap_start = i as u32;
+         let gap_end = meta.height();
+         let gap_len = gap_end.saturating_sub(gap_start);
+
+         if gap_len == 0 {
+             return Err(Error::DBCorrupted(format!(
+                 "hashes CF out-of-order entry at height {}, reindex required",
+                 meta.height()
+             )));
+         }
+
+         let client = client.ok_or_else(|| {
+             Error::DBCorrupted(format!(
+                 "hashes CF gap detected: expected height {}, found {}. \
+                  DB is inconsistent; reindex required",
+                 i,
+                 meta.height()
+             ))
+         })?;
+
+         if gap_len > MAX_HASH_GAP_REPAIR {
+             return Err(Error::DBCorrupted(format!(
+                 "hashes CF gap too large to repair ({} blocks from {} to {}), reindex required",
+                 gap_len,
+                 gap_start,
+                 gap_end - 1
+             )));
+         }
+
+         log::warn!(
+             "hashes CF gap detected ({} blocks from {} to {}), attempting repair",
+             gap_len,
+             gap_start,
+             gap_end - 1
+         );
+
+         for height in gap_start..gap_end {
+             let hash = client
+                 .block_hash(height)
+                 .await
+                 .map_err(|e| Error::DBCorrupted(format!("failed to fetch block hash: {e}")))?
+                 .ok_or_else(|| {
+                     Error::DBCorrupted(format!(
+                         "missing block hash at height {height} while repairing hashes CF"
+                     ))
+                 })?;
+             let header = client
+                 .block_header(hash, family)
+                 .await
+                 .map_err(|e| {
+                     Error::DBCorrupted(format!(
+                         "failed to fetch block header for {hash}: {e}"
+                     ))
+                 })?;
+             let repaired = BlockMeta::new(height, hash, header.time());
+             state
+                 .store
+                 .put_hash_ts(&repaired)
+                 .map_err(|e| Error::DBCorrupted(format!("failed to write hash meta: {e}")))?;
+             blocks_hash_ts.push((repaired.hash(), repaired.timestamp()));
+         }
+         i = gap_end as usize;
Copilot AI commented on Feb 9, 2026

blocks_hash_ts (a tokio::Mutex) is locked for the entire duration of header gap repair, but the repair loop performs multiple .await network calls (block_hash/block_header). Holding an async mutex guard across awaits can block other tasks and can deadlock if any awaited path tries to read/update blocks_hash_ts indirectly. Refactor to avoid awaiting while the mutex is held (e.g., build a local list of metas to append + perform repairs first, then take the lock briefly to extend/replace the vector).
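
The refactor this comment suggests can be sketched with plain std primitives (`fetch_repaired_entries` is a hypothetical stand-in for the `block_hash`/`block_header` round-trips): do the slow work first with no guard held, then take the lock only for the cheap append.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the slow network round-trips; the real code
// would await block_hash/block_header here, outside any lock.
fn fetch_repaired_entries() -> Vec<(u32, u64)> {
    vec![(100, 1_700_000_000), (101, 1_700_000_600)]
}

fn main() {
    let blocks_hash_ts: Mutex<Vec<(u32, u64)>> =
        Mutex::new(vec![(99, 1_699_999_400)]);

    // 1. Perform the slow repair work first, with no guard held.
    let repaired = fetch_repaired_entries();

    // 2. Take the lock only for the cheap in-memory append.
    {
        let mut guard = blocks_hash_ts.lock().unwrap();
        guard.extend(repaired);
    } // guard dropped immediately; no await happens while it is held

    assert_eq!(blocks_hash_ts.lock().unwrap().len(), 3);
}
```

With a `tokio::sync::Mutex` the shape is the same: keep the `.await` points outside the scope of the guard so other tasks touching `blocks_hash_ts` are never blocked on network latency.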

Comment on lines +20 to +21
let metas: Vec<BlockMeta> = state.store.iter_hash_ts().collect();
for meta in metas {
Copilot AI commented on Feb 9, 2026

let metas: Vec<BlockMeta> = state.store.iter_hash_ts().collect(); loads the entire hashes CF into memory at startup. On mainnet-sized chains this can be tens of MB and increases startup latency/peak RSS. Consider iterating in a streaming fashion (or repairing gaps in a separate pass) to avoid collecting all metas at once.

Suggested change:
- let metas: Vec<BlockMeta> = state.store.iter_hash_ts().collect();
- for meta in metas {
+ for meta in state.store.iter_hash_ts() {
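
The streaming alternative can also be split into the "separate pass" the comment mentions: one O(1)-memory pass records only the gap ranges, and repair runs afterwards over that (typically tiny) list. A minimal sketch, modeling the store iterator as any `Iterator<Item = u32>` and omitting the out-of-order check from the real code:

```rust
// Hypothetical two-pass sketch: pass 1 streams over heights without
// collecting them, recording only half-open gap ranges [start, end);
// pass 2 (not shown) would repair those ranges.
fn collect_gaps<I: Iterator<Item = u32>>(heights: I) -> Vec<(u32, u32)> {
    let mut gaps = Vec::new();
    let mut expected = 0u32;
    for h in heights {
        if h > expected {
            gaps.push((expected, h));
        }
        expected = h + 1;
    }
    gaps
}

fn main() {
    // Only the gap list is materialized, not the whole column family.
    let stored = [0u32, 1, 4, 5, 9].into_iter();
    assert_eq!(collect_gaps(stored), vec![(2, 4), (6, 9)]);
}
```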

Comment on lines 670 to 674
.unwrap(); // TODO handle unwrap;

self.delete_hash_ts_batch(&mut batch, reorg_data.height);

self.write(batch).unwrap(); // TODO handle unwrap;
Copilot AI commented on Feb 9, 2026

reorg() still uses unwrap() on RocksDB operations (insert_utxos(...).unwrap() and self.write(batch).unwrap()). Since other reorg cleanup paths were updated to include context and panic with a message, these unwraps can obscure the root cause and make failures harder to diagnose. Prefer consistent error handling here as well (propagate a Result, or unwrap_or_else with an error_panic! message that includes the height).

Suggested change:
-         .unwrap(); // TODO handle unwrap;
+         .unwrap_or_else(|e| {
+             error_panic!(
+                 "failed to restore reorg-spent utxos at height {}: {e}",
+                 reorg_data.height
+             )
+         });
          self.delete_hash_ts_batch(&mut batch, reorg_data.height);
-         self.write(batch).unwrap(); // TODO handle unwrap;
+         self.write(batch).unwrap_or_else(|e| {
+             error_panic!(
+                 "failed to write reorg batch at height {}: {e}",
+                 reorg_data.height
+             )
+         });

RCasatta commented on Feb 9, 2026

This PR is big, and it tries to fix different things at the same time (handling long reorgs and repairing a corrupted database), which makes it more complicated.
I don't want to complicate things too much before understanding the root cause; for example, I don't think a multi-block reorg actually happened.

[image: Screenshot From 2026-02-09 14-40-24]
