Closed
Labels
bug: Something isn't working
Description
Problem
There is a potential synchronization issue between the Cut heights produced by DCL and the heights requested by the consensus layer. If they don't match, build_value() will time out waiting for a Cut that never arrives.
Location
crates/consensus/src/host.rs:882-890 (cut storage)
crates/consensus/src/host.rs:733-773 (build_value)
Code Analysis
Cut Storage (spawn_host)
// Spawn background task to process DCL cuts
tokio::spawn(async move {
    while let Some(cut) = cut_rx.recv().await {
        let height = ConsensusHeight::from(cut.height); // Uses Cut's internal height
        value_builder_for_cuts.store_cut(height, cut).await;
    }
});
Value Request (build_value)
async fn build_value(
    &self,
    height: ConsensusHeight, // Consensus's requested height
    _round: ConsensusRound,
) -> Result<LocallyProposedValue<CipherBftContext>, ConsensusError> {
    let timeout = Duration::from_secs(30);
    loop {
        {
            let mut pending = self.pending_cuts.write().await;
            if let Some(cut) = pending.remove(&height) { // Looks up by consensus height
                // ...
            }
        }
        if start.elapsed() > timeout {
            return Err(ConsensusError::Other(format!(
                "Timeout: No cut available for height {} after {:?}",
                height, timeout
            )));
        }
        // ...
    }
}
The Issue
- DCL Primary produces Cuts with cut.height set to some value
- Cuts are stored by their internal height: store_cut(cut.height, cut)
- Consensus requests values at its own height via build_value(consensus_height, _)
- If cut.height != consensus_height, the lookup fails
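The lookup miss described above can be reproduced in isolation. The following is a minimal sketch, not the code in host.rs: `Cut`, `PendingCuts`, and `take_for` are hypothetical stand-ins, heights are plain `u64` values, and the async locking is left out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the DCL `Cut`; only the height matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Cut {
    pub height: u64,
}

/// Hypothetical stand-in for the `pending_cuts` map.
#[derive(Default)]
pub struct PendingCuts {
    cuts: HashMap<u64, Cut>,
}

impl PendingCuts {
    /// Mirrors the storage path: the key is taken from the Cut's own height.
    pub fn store_cut(&mut self, cut: Cut) {
        self.cuts.insert(cut.height, cut);
    }

    /// Mirrors the lookup in `build_value`: the key is the consensus height.
    pub fn take_for(&mut self, consensus_height: u64) -> Option<Cut> {
        self.cuts.remove(&consensus_height)
    }
}
```

With a Cut stored at height 1, `take_for(5)` returns `None` no matter how long the caller waits, while `take_for(1)` succeeds.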
Potential Causes
- DCL starts at height 1, consensus starts at height 1 - OK if synchronized
- If consensus restarts at height N but DCL is still at height M, mismatch occurs
- No explicit synchronization mechanism between DCL and consensus heights
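To make the restart scenario concrete, the sketch below classifies the relationship between the height DCL would assign to its next Cut and the height consensus is about to request. `HeightAlignment` and `compare_heights` are hypothetical names introduced for illustration only; nothing like them is known to exist in the codebase.

```rust
use std::cmp::Ordering;

/// Hypothetical result of comparing DCL's next Cut height (M) with the
/// height consensus will request (N).
#[derive(Debug, PartialEq, Eq)]
pub enum HeightAlignment {
    /// M == N: the next Cut is stored under the key consensus looks up.
    InSync,
    /// M < N: the next Cut is stored under a key below the requested one.
    DclBehind { gap: u64 },
    /// M > N: the next Cut is stored under a key above the requested one.
    DclAhead { gap: u64 },
}

/// Compares the two heights and reports how far apart they are.
pub fn compare_heights(next_cut_height: u64, consensus_height: u64) -> HeightAlignment {
    match next_cut_height.cmp(&consensus_height) {
        Ordering::Equal => HeightAlignment::InSync,
        Ordering::Less => HeightAlignment::DclBehind {
            gap: consensus_height - next_cut_height,
        },
        Ordering::Greater => HeightAlignment::DclAhead {
            gap: next_cut_height - consensus_height,
        },
    }
}
```

A check of this kind at startup, or after a consensus restart, would be one place to surface a divergence before `build_value` starts waiting.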
Impact
- 30-second timeout on every round: If heights don't match, build_value spins for 30 seconds then fails
- Node appears stuck: Consensus cannot make progress
- Difficult to diagnose: Timeout error doesn't indicate the height mismatch
Suggested Fix
- Explicit height synchronization: DCL should be informed of the current consensus height
// In Primary or CutFormer
pub fn set_consensus_height(&mut self, height: u64) {
    self.next_cut_height = height;
}
- Better error messages: Include both requested and available heights
if start.elapsed() > timeout {
    // Clone the keys (assumes ConsensusHeight: Clone) so they outlive the
    // read guard, which is dropped at the end of this statement.
    let available: Vec<_> = self.pending_cuts.read().await.keys().cloned().collect();
    return Err(ConsensusError::Other(format!(
        "Timeout: No cut at height {}. Available heights: {:?}",
        height, available
    )));
}
- Height validation: Warn when storing a Cut if the height seems off
pub async fn store_cut(&self, height: ConsensusHeight, cut: Cut) {
    if cut.height != height.0 {
        warn!("Cut internal height {} differs from storage height {}", cut.height, height);
    }
    // ...
}
Severity
Medium - Could cause liveness issues if DCL and consensus heights diverge.
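The three suggested fixes can be sketched together in a synchronous, std-only form. This is an illustration under stated assumptions, not the actual implementation: `CutFormer`, `ValueBuilder`, `form_cut`, and `take_cut` are hypothetical, heights are plain `u64` values, `eprintln!` stands in for `warn!`, and a single lookup replaces the 30-second polling loop.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the DCL `Cut`.
#[derive(Debug, Clone, PartialEq)]
pub struct Cut {
    pub height: u64,
}

/// Hypothetical Cut producer that numbers Cuts sequentially.
pub struct CutFormer {
    next_cut_height: u64,
}

impl CutFormer {
    pub fn new() -> Self {
        Self { next_cut_height: 1 }
    }

    /// Fix 1: let consensus tell DCL which height the next Cut should carry.
    pub fn set_consensus_height(&mut self, height: u64) {
        self.next_cut_height = height;
    }

    /// Produces a Cut at the current height and advances the counter.
    pub fn form_cut(&mut self) -> Cut {
        let cut = Cut { height: self.next_cut_height };
        self.next_cut_height += 1;
        cut
    }
}

/// Hypothetical holder of pending Cuts; a BTreeMap keeps heights sorted
/// so the diagnostic message lists them in order.
pub struct ValueBuilder {
    pending_cuts: BTreeMap<u64, Cut>,
}

impl ValueBuilder {
    pub fn new() -> Self {
        Self { pending_cuts: BTreeMap::new() }
    }

    /// Fix 3: warn when the storage key and the Cut's own height disagree.
    /// Returns whether the two heights matched.
    pub fn store_cut(&mut self, height: u64, cut: Cut) -> bool {
        let consistent = cut.height == height;
        if !consistent {
            eprintln!(
                "Cut internal height {} differs from storage height {}",
                cut.height, height
            );
        }
        self.pending_cuts.insert(height, cut);
        consistent
    }

    /// Fix 2: on a miss, report the requested height and the available ones.
    pub fn take_cut(&mut self, height: u64) -> Result<Cut, String> {
        match self.pending_cuts.remove(&height) {
            Some(cut) => Ok(cut),
            None => {
                let available: Vec<u64> = self.pending_cuts.keys().copied().collect();
                Err(format!(
                    "No cut at height {}. Available heights: {:?}",
                    height, available
                ))
            }
        }
    }
}
```

In this sketch, a request for height 7 while only a Cut at height 1 is pending yields an error naming both heights, and calling `set_consensus_height(7)` before the next `form_cut()` makes the subsequent lookup succeed.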