[Bug] Cut height may not synchronize with consensus height #93

@qj0r9j0vc2

Problem

The heights of the Cuts produced by DCL may not stay synchronized with the heights requested by the consensus layer. If they don't match, build_value() times out waiting for a Cut that never arrives.

Location

  • crates/consensus/src/host.rs:882-890 (cut storage)
  • crates/consensus/src/host.rs:733-773 (build_value)

Code Analysis

Cut Storage (spawn_host)

// Spawn background task to process DCL cuts
tokio::spawn(async move {
    while let Some(cut) = cut_rx.recv().await {
        let height = ConsensusHeight::from(cut.height);  // Uses Cut's internal height
        value_builder_for_cuts.store_cut(height, cut).await;
    }
});

Value Request (build_value)

async fn build_value(
    &self,
    height: ConsensusHeight,  // Consensus's requested height
    _round: ConsensusRound,
) -> Result<LocallyProposedValue<CipherBftContext>, ConsensusError> {
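    let timeout = Duration::from_secs(30);
    let start = Instant::now();  // Assumed: `start` is used below but its definition is not shown in this excerpt

    loop {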
        {
            let mut pending = self.pending_cuts.write().await;
            if let Some(cut) = pending.remove(&height) {  // Looks up by consensus height
                // ...
            }
        }
        
        if start.elapsed() > timeout {
            return Err(ConsensusError::Other(format!(
                "Timeout: No cut available for height {} after {:?}",
                height, timeout
            )));
        }
        // ...
    }
}

The Issue

  1. DCL Primary produces Cuts with cut.height set to some value
  2. Cuts are stored under their internal height: store_cut(ConsensusHeight::from(cut.height), cut)
  3. Consensus requests values at its own height via build_value(consensus_height, _)
  4. If cut.height != consensus_height, the lookup in pending_cuts finds nothing and build_value keeps waiting until the timeout
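
The four steps above can be reproduced in isolation. The following is a minimal sketch, not the code in host.rs: it swaps the async RwLock-protected map for a plain HashMap, and the names PendingCuts, take_for, and lookup_after, as well as the reduced Cut struct, are hypothetical stand-ins introduced for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real `Cut` type; only the height matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Cut {
    pub height: u64,
}

/// Hypothetical stand-in for the pending-cut map used by `build_value`.
#[derive(Default)]
pub struct PendingCuts {
    cuts: HashMap<u64, Cut>,
}

impl PendingCuts {
    /// Mirrors the background task: the key is the cut's own internal height.
    pub fn store_cut(&mut self, cut: Cut) {
        self.cuts.insert(cut.height, cut);
    }

    /// Mirrors the lookup in `build_value`: the key is the consensus height.
    pub fn take_for(&mut self, consensus_height: u64) -> Option<Cut> {
        self.cuts.remove(&consensus_height)
    }
}

/// Simulates DCL producing cuts at `dcl_heights` while consensus asks for
/// a single height. Returns the cut consensus would receive, if any.
pub fn lookup_after(dcl_heights: &[u64], consensus_height: u64) -> Option<Cut> {
    let mut pending = PendingCuts::default();
    for &height in dcl_heights {
        pending.store_cut(Cut { height });
    }
    pending.take_for(consensus_height)
}
```

With DCL at heights 1 to 3, lookup_after(&[1, 2, 3], 2) finds a cut, while lookup_after(&[1, 2, 3], 10) returns None no matter how long the caller waits, which is the situation build_value ends up polling on.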

Potential Causes

  • On a fresh start, DCL and consensus both begin at height 1, which works as long as they advance in step
  • If consensus restarts at height N while DCL is still at height M (M != N), the heights no longer match
  • There is no explicit synchronization mechanism between DCL and consensus heights

Impact

  • 30-second timeout on every round: If heights don't match, build_value spins for 30 seconds then fails
  • Node appears stuck: Consensus cannot make progress
  • Difficult to diagnose: Timeout error doesn't indicate the height mismatch
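
To illustrate the diagnosability point, here is a rough sketch of an error type that separates "nothing is pending" from "cuts are pending, but at other heights". The CutLookupError enum and the diagnose function are hypothetical and do not exist in the codebase; the sketch also uses a BTreeMap keyed by u64 instead of the real pending_cuts map so that the lowest and highest pending heights are cheap to read.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical error type that makes a height mismatch visible.
#[derive(Debug, PartialEq)]
pub enum CutLookupError {
    /// Nothing has been stored yet, so DCL may simply be slow.
    NoCutsPending { requested: u64 },
    /// Cuts exist, but none at the requested height.
    HeightMismatch { requested: u64, lowest: u64, highest: u64 },
}

impl fmt::Display for CutLookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoCutsPending { requested } => {
                write!(f, "no cut at height {requested}: no cuts are pending")
            }
            Self::HeightMismatch { requested, lowest, highest } => write!(
                f,
                "no cut at height {requested}: pending cuts span heights {lowest}..={highest}"
            ),
        }
    }
}

/// Checks whether a cut is pending at `requested` and, if not, reports why.
pub fn diagnose<V>(pending: &BTreeMap<u64, V>, requested: u64) -> Result<(), CutLookupError> {
    if pending.contains_key(&requested) {
        return Ok(());
    }
    // BTreeMap iterates keys in ascending order, so the first and last keys
    // are the lowest and highest pending heights.
    let lowest = pending.keys().next().copied();
    let highest = pending.keys().next_back().copied();
    match (lowest, highest) {
        (Some(lowest), Some(highest)) => Err(CutLookupError::HeightMismatch {
            requested,
            lowest,
            highest,
        }),
        _ => Err(CutLookupError::NoCutsPending { requested }),
    }
}
```

A mismatch report such as "pending cuts span heights 1..=2" when height 10 was requested points directly at diverged heights, whereas an empty map suggests DCL has not produced anything yet.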

Suggested Fix

  1. Explicit height synchronization: DCL should be informed of the current consensus height
// In Primary or CutFormer
pub fn set_consensus_height(&mut self, height: u64) {
    self.next_cut_height = height;
}
  2. Better error messages: Include both requested and available heights
if start.elapsed() > timeout {
    // Clone the keys so the list does not borrow from the temporary read guard
    // (assumes ConsensusHeight implements Clone)
    let available: Vec<_> = self.pending_cuts.read().await.keys().cloned().collect();
    return Err(ConsensusError::Other(format!(
        "Timeout: No cut at height {}. Available heights: {:?}",
        height, available
    )));
}
  3. Height validation: Warn when storing a Cut if the height seems off
pub async fn store_cut(&self, height: ConsensusHeight, cut: Cut) {
    // Note: the current caller derives `height` from `cut.height`, so this only
    // fires once callers pass the height that consensus actually expects
    if cut.height != height.0 {
        warn!("Cut internal height {} differs from storage height {}", cut.height, height);
    }
    // ...
}
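
As a rough illustration of the first suggestion, the sketch below shows how re-seeding the next cut height would behave across a restart. The CutFormer struct, its form_cut method, the reduced Cut type, and the heights_after_resync helper are all hypothetical; only set_consensus_height and next_cut_height come from the suggestion above, and the real DCL types will differ.

```rust
/// Hypothetical stand-in for the real `Cut` type.
#[derive(Debug, PartialEq)]
pub struct Cut {
    pub height: u64,
}

/// Hypothetical stand-in for the DCL component that assigns heights to cuts.
#[derive(Debug)]
pub struct CutFormer {
    next_cut_height: u64,
}

impl CutFormer {
    pub fn new(start_height: u64) -> Self {
        Self { next_cut_height: start_height }
    }

    /// Re-seeds the height of the next cut from the consensus layer,
    /// as in the suggested fix. No validation is applied here.
    pub fn set_consensus_height(&mut self, height: u64) {
        self.next_cut_height = height;
    }

    /// Forms a cut at the current height and advances to the next one.
    pub fn form_cut(&mut self) -> Cut {
        let cut = Cut { height: self.next_cut_height };
        self.next_cut_height += 1;
        cut
    }
}

/// Simulates a restart: DCL has formed `already_formed` cuts starting at
/// height 1 when consensus resumes at `consensus_height`. Returns the height
/// of the last cut before the resync and of the first cut after it.
pub fn heights_after_resync(already_formed: u64, consensus_height: u64) -> (u64, u64) {
    let mut former = CutFormer::new(1);
    let mut last_before = 0;
    for _ in 0..already_formed {
        last_before = former.form_cut().height;
    }
    former.set_consensus_height(consensus_height);
    (last_before, former.form_cut().height)
}
```

In this sketch the setter overwrites the height unconditionally, matching the snippet in the suggestion. Whether the real implementation should reject a height lower than one already used, or discard cuts formed before the resync, is a design question this issue leaves open.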

Severity

Medium - Could cause liveness issues if DCL and consensus heights diverge.
