Skip to content

Latest commit

 

History

History
289 lines (226 loc) · 9.49 KB

File metadata and controls

289 lines (226 loc) · 9.49 KB

Consensus Engine

The ConsensusEngine is the heart of hotmint — an async event loop that drives the HotStuff-2 protocol.

Overview

// SharedBlockStore = Arc<parking_lot::RwLock<Box<dyn BlockStore>>>
pub struct ConsensusEngine {
    state: ConsensusState,
    store: SharedBlockStore,
    network: Box<dyn NetworkSink>,
    app: Box<dyn Application>,
    signer: Box<dyn Signer>,
    verifier: Box<dyn Verifier>,
    vote_collector: VoteCollector,
    pacemaker: Pacemaker,
    pacemaker_config: PacemakerConfig,
    msg_rx: Receiver<(Option<ValidatorId>, ConsensusMessage)>,
    status_senders: HashSet<ValidatorId>,
    current_view_qc: Option<QuorumCertificate>,
    pending_epoch: Option<Epoch>,
    persistence: Option<Box<dyn StatePersistence>>,
    evidence_store: Option<Box<dyn EvidenceStore>>,
    liveness_tracker: LivenessTracker,
    wal: Option<Box<dyn Wal>>,
    msg_rate_limiter: HashMap<ValidatorId, (Instant, u32)>,
}

The engine takes ownership of all its dependencies and runs as an infinite async loop. It is Send and designed to be spawned onto a tokio runtime.

Construction

use std::sync::Arc;
use parking_lot::RwLock;
use hotmint::consensus::engine::{ConsensusEngineBuilder, EngineConfig, SharedBlockStore};
use hotmint::crypto::Ed25519Verifier;
use hotmint::consensus::state::ConsensusState;

let store: SharedBlockStore = Arc::new(RwLock::new(Box::new(block_store)));

let engine = ConsensusEngineBuilder::new()
    .state(state)
    .store(store)
    .network(Box::new(network_sink))
    .app(Box::new(application))
    .signer(Box::new(signer))
    .messages(msg_rx)
    .verifier(Box::new(Ed25519Verifier))
    .build()
    .expect("all required fields must be set");

The msg_rx channel is the engine's sole input. All consensus messages — whether from the network or from loopback — arrive through this channel as (Option<sender_id>, message) tuples. The sender is Some(ValidatorId) for authenticated validators and None for unknown/unauthenticated peers.

Running

// engine.run() consumes self and never returns
tokio::spawn(async move { engine.run().await });

The event loop:

loop {
    tokio::select! {
        Some((sender, msg)) = self.msg_rx.recv() => {
            self.handle_message(sender, msg);
        }
        _ = self.pacemaker.sleep_until_deadline() => {
            self.handle_timeout();
        }
    }
}

ConsensusState

The mutable state tracked by the engine:

pub struct ConsensusState {
    pub validator_id: ValidatorId,
    pub validator_set: ValidatorSet,
    /// Blake3 hash of the chain identifier — included in all signing bytes
    /// to prevent cross-chain signature replay.
    pub chain_id_hash: [u8; 32],
    pub current_view: ViewNumber,
    pub current_epoch: Epoch,
    pub role: ViewRole,               // Leader or Replica
    pub step: ViewStep,               // progress within the current view
    pub locked_qc: Option<QuorumCertificate>,
    pub highest_double_cert: Option<DoubleCertificate>,
    pub highest_qc: Option<QuorumCertificate>,
    pub last_committed_height: Height,
    pub last_app_hash: BlockHash,     // state root after executing the most recently committed block
}

ConsensusState::new(validator_id, validator_set) creates state with an empty chain ID (no domain separation). For production use, prefer ConsensusState::with_chain_id(validator_id, validator_set, "my-chain") which hashes the chain ID with Blake3 and stores it in chain_id_hash. This hash is included in all signing bytes to prevent cross-chain signature replay.

Chain ID Domain Separator

The chain_id_hash field provides cross-chain replay prevention. When a chain ID is set, its Blake3 hash is included in the signing bytes of all consensus messages (votes, wishes, etc.). This means a signature produced for chain "alpha" is invalid on chain "beta", even if the same validator set is used on both chains.

// No chain ID (empty string, suitable for testing)
let state = ConsensusState::new(vid, validator_set.clone());

// With chain ID (recommended for production)
let state = ConsensusState::with_chain_id(vid, validator_set, "my-chain-id");

ViewRole

pub enum ViewRole {
    Leader,   // proposes blocks, collects votes
    Replica,  // votes on proposals
}

The role is determined at view entry: leader_for_view(v) returns Option<&ValidatorInfo> (returns None only if the validator set is empty). In practice, use .expect("non-empty validator set") or .unwrap():

if validator_set.leader_for_view(v).expect("non-empty validator set").id == self.validator_id {
    // this node is the leader
}

ViewStep

Tracks progress through the view protocol:

pub enum ViewStep {
    Entered,             // just entered the view
    WaitingForStatus,    // leader: waiting for replica status messages
    Proposed,            // leader: proposal sent
    WaitingForProposal,  // replica: waiting for leader's proposal
    Voted,               // replica: sent phase-1 vote
    CollectingVotes,     // leader: collecting phase-1 votes
    Prepared,            // leader: QC formed, Prepare sent
    SentVote2,           // replica: sent phase-2 vote
    Done,                // view protocol complete
}

Message Handling

Each ConsensusMessage variant is dispatched to a specific handler:

Propose

Propose ──> view_protocol::on_proposal()
         ──> safety check: justify.rank >= locked_qc.rank
         ──> if safe: send VoteMsg to leader

The replica validates the block via Application::validate_block() before voting.

VoteMsg (Phase 1)

VoteMsg ──> vote_collector::add_vote()
         ──> if quorum reached: on_qc_formed()
         ──> broadcast Prepare{QC}

The leader aggregates votes. When 2f+1 are collected, a QC is formed and broadcast in a Prepare message.

Prepare

Prepare ──> view_protocol::on_prepare()
         ──> update locked_qc to the received QC
         ──> send Vote2Msg to next view's leader

Vote2Msg (Phase 2)

Vote2Msg ──> vote_collector::add_vote()
          ──> if quorum reached: on_double_cert_formed()
          ──> commit block and ancestors
          ──> advance to next view

Wish

Wish ──> pacemaker::add_wish()
      ──> if quorum reached: form TimeoutCertificate
      ──> broadcast TC
      ──> advance view

TimeoutCert

TimeoutCert ──> advance to TC's target view
             ──> relay TC to other validators (if not seen before)

StatusCert

StatusCert ──> leader collects status from replicas
            ──> when enough received: try_propose()

Vote Collection

The VoteCollector manages vote aggregation for both phases:

pub struct VoteCollector {
    // phase-1 votes: view -> block_hash -> votes
    // phase-2 votes: view -> qc_block_hash -> votes
}

When a quorum (2f+1 weighted votes) is reached:

  • Phase 1: forms a QuorumCertificate with an AggregateSignature
  • Phase 2: forms a DoubleCertificate

The collector prunes stale votes for old views to prevent memory growth.

Commit Process

When a double certificate is formed:

  1. Identify the committed block from the double certificate
  2. Walk the chain from the committed block backward to last_committed_height + 1
  3. WAL: log commit intent (if WAL is configured)
  4. For each block in ascending height order:
    • Decode payload into transactions
    • app.execute_block(txs, ctx) (where txs is &[&[u8]] and ctx is a BlockContext with height, view, proposer, epoch, epoch_start_view, validator_set, vote_extensions; returns EndBlockResponse which may contain validator updates, events, and app_hash)
    • app.on_commit(block, ctx)
    • Store commit QC, tx index, and block results in the block store
    • Record QC signers in the LivenessTracker for offline detection
  5. Update last_committed_height and persist consensus state
  6. WAL: log commit done (triggers WAL truncation)
  7. At epoch boundaries: query LivenessTracker::offline_validators() and call app.on_offline_validators()

Pacemaker Integration

The pacemaker manages view timeouts independently of message processing:

  • Base timeout: 2 seconds
  • Backoff: 1.5× per consecutive timeout, capped at 30 seconds
  • Reset: on any successful view transition (QC formed, commit, etc.)

On timeout, the engine:

  1. Builds and broadcasts a Wish message
  2. Applies exponential backoff to the timer
  3. Continues listening for messages (the view is not abandoned until a TC forms)

See Protocol for the full pacemaker specification.

Signal Handling (Graceful Shutdown)

The hotmint node binary handles SIGINT (Ctrl+C) and SIGTERM for graceful shutdown. The main tokio::select! block races the engine/network tasks against signal handlers:

tokio::select! {
    // ... engine and network tasks ...
    _ = tokio::signal::ctrl_c() => {
        info!("received shutdown signal, exiting...");
    }
    _ = async {
        #[cfg(unix)]
        {
            let mut sigterm = tokio::signal::unix::signal(
                tokio::signal::unix::SignalKind::terminate()
            ).expect("failed to register SIGTERM handler");
            sigterm.recv().await;
        }
    } => {
        info!("received SIGTERM, shutting down...");
    }
}

When either signal is received, the select block completes and the process exits cleanly. This is handled at the binary level (crates/hotmint/src/bin/node.rs), not inside the ConsensusEngine itself.