Skip to content

fix(consensus): apply v2.7.0 hotfix to dev branch and revert optimization changes#2357

Open
wanwiset25 wants to merge 6 commits into
dev-upgradefrom
fix-signature-dedup-construction-4
Open

fix(consensus): apply v2.7.0 hotfix to dev branch and revert optimization changes#2357
wanwiset25 wants to merge 6 commits into
dev-upgradefrom
fix-signature-dedup-construction-4

Conversation

@wanwiset25
Copy link
Copy Markdown
Collaborator

@wanwiset25 wanwiset25 commented May 21, 2026

Proposed changes

- Reverted #1625

- Reverted #1236

- Applied #1643 (to align code with v2.7.0)

- fix(consensus): signature dedup construction check

XDPoS v2 had a liveness vulnerability in the QC/TC authoring path where a single byzantine masternode could amplify their own valid signature into multiple pool entries — ECDSA non-determinism (custom random nonce) lets the same signer produce arbitrarily many byte-distinct valid signatures for the same vote/timeout message. These would inflate the pool past threshold prematurely, and the strict authoring path (if count != len(validSignatures) → skip QC) would then bail out, costing the leader a round. Repeated across rotating attackers, this could degrade chain liveness without breaking safety.

This PR adds signature verification (verifyAllSignatures) used by every QC/TC verification site in the engine, and switches the authoring paths (onVotePoolThresholdReached, onTimeoutPoolThresholdReached) from strict-bail to lenient-filter — duplicate or invalid signatures are dropped, and a clean QC/TC is built from the unique verified subset if threshold is still met.

In addition, hasLowS checks are added to record then discard all mallead consensus signatures.
hasLowS reports whether the signature's S value is in the lower half of the secp256k1 curve order, the canonical low-S form that prevents (r, s, v) -> (r, N-s, v') malleability aka signature-flipping. (low-s is considered valid, high-s is considered mallead signature)

Why lenient at authoring, strict at verification

  • Verification side (received QC/TC from peer): a malformed QC means the producer is byzantine or buggy → reject. No liveness cost to us.
  • Authoring side (we are the leader building a QC/TC from our own pool): bailing on one bad vote hands the round to the attacker. Safer to drop the bad entry and build a valid QC from what's good.

The output QC/TC is identical from peers' perspective in both cases — they verify the same way regardless of how it was constructed. No protocol change.

Types of changes

What types of changes does your code introduce to XDC network?
Put an in the boxes that apply

  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Changes that don't change source code or tests
  • docs: Documentation only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that neither fixes a bug nor adds a feature
  • revert: Revert something
  • style: Changes that do not affect the meaning of the code
  • test: Adding missing tests or correcting existing tests

Impacted Components

Which parts of the codebase does this PR touch?
Put an in the boxes that apply

  • Consensus
  • Account
  • Network
  • Geth
  • Smart Contract
  • External components
  • Not sure (Please specify below)

Checklist

Put an in the boxes once you have confirmed below actions (or provide reasons on not doing so) that

  • This PR has sufficient test coverage (unit/integration test) OR I have provided reason in the PR description for not having test coverage
  • Tested on a private network from the genesis block and monitored the chain operating correctly for multiple epochs.
  • Provide an end-to-end test plan in the PR description on how to manually test it on the devnet/testnet.
  • Tested the backwards compatibility.
  • Tested with XDC nodes running this version co-exist with those running the previous version.
  • Relevant documentation has been updated as part of this PR
  • N/A

…res and optimize performance, close XFN-03 (#1625)"

This reverts commit 81416e0.
…N-03 (#1643)

* use signer pubkey to check for unique signatures

* add error handling

* add go routines to process Ecrecover concurrently
add input validation and early return

* optimize
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 34b55f31-06b0-460e-b47c-5cc8837e06c5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-signature-dedup-construction-4

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…check)

* add signature dedup construction check at QC and TC generation process

* add tests

* add checking of signature S value to prevent malleable signature (signature flipping)

* add signature flip tests

* fix test

* fix test

* refactor hasLowS check to aware of signer and improve logs

* fix inaccurate pubkeys > addresses

(fix): fix tests due to change in error logging (#2352)
@wanwiset25 wanwiset25 force-pushed the fix-signature-dedup-construction-4 branch from 6064cbd to a092aca Compare May 21, 2026 17:43
@wanwiset25 wanwiset25 changed the title WIP: apply v2.7.0 hotfix to dev branch WIP: apply v2.7.0 hotfix to dev branch and revert optimization changes May 21, 2026
@wanwiset25 wanwiset25 changed the title WIP: apply v2.7.0 hotfix to dev branch and revert optimization changes fix(consensus): apply v2.7.0 hotfix to dev branch and revert optimization changes May 21, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR applies the v2.7.0 consensus hotfix onto the dev branch, reverting earlier consensus optimizations, and hardening XDPoS v2 QC/TC handling against signature-amplification and signature malleability (high-S) attacks.

Changes:

  • Add signature verification/dedup + low-S enforcement, and make QC/TC authoring paths tolerant (filter bad/duplicate pool entries instead of bailing).
  • Refactor/move SyncInfo/Vote/Timeout verification/handler entry points into engine.go, and remove the SyncInfo pool plumbing.
  • Update/add tests covering duplicate-signing scenarios, TC/QC verification behavior, and low-S checks.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
eth/bft/bft_handler.go Adjusts BFT handler logging and SyncInfo handling paths.
core/types/consensus_v2.go Removes SyncInfo pool-related helpers no longer needed after syncInfo pool removal.
consensus/XDPoS/utils/pool.go Removes pool inspection logging and associated import.
consensus/XDPoS/utils/errors.go Adds a shared ErrInvalidSignature error for consensus signature validation failures.
consensus/XDPoS/engines/engine_v2/vote.go Makes QC authoring tolerant by verifying/deduping signatures before QC construction.
consensus/XDPoS/engines/engine_v2/utils.go Introduces signer recovery/dedup (RecoverUniqueSigners), pooled verification (verifyAllSignatures), and low-S enforcement (hasLowS).
consensus/XDPoS/engines/engine_v2/utils_test.go Adds unit tests for dedup, low-S checks, and signature verification behavior.
consensus/XDPoS/engines/engine_v2/timeout.go Makes TC authoring tolerant; refactors TC verification (dedup + signature checks) and TC processing.
consensus/XDPoS/engines/engine_v2/testing_utils.go Adds test-only fakers to invoke pool-threshold hooks under the engine lock.
consensus/XDPoS/engines/engine_v2/syncInfo.go Deletes old SyncInfo workflow implementation (moved into engine.go).
consensus/XDPoS/engines/engine_v2/engine.go Reintroduces/relocates SyncInfo/Vote/Timeout workflows; changes QC verification flow and round-transition pool behavior.
consensus/XDPoS/api.go Changes GetLatestPoolStatus response schema to remove SyncInfo pooling data.
consensus/tests/engine_v2_tests/vote_test.go Updates vote/timeout integration test to use real signatures + validates emitted SyncInfo contents.
consensus/tests/engine_v2_tests/verify_header_test.go Updates expected error for duplicated QC signatures.
consensus/tests/engine_v2_tests/timeout_test.go Updates timeout integration test to use real signatures + validates emitted SyncInfo contents.
consensus/tests/engine_v2_tests/sync_info_test.go Updates SyncInfo tests to match new verification/initialization behavior.
consensus/tests/engine_v2_tests/duplicate_signing_test.go Adds new integration tests for duplicate-signing amplification in vote/timeout pools.

Comment thread eth/bft/bft_handler.go
qcBlockNum := syncInfo.HighestQuorumCert.ProposedBlockInfo.Number.Int64()
if dist := qcBlockNum - int64(b.chainHeight()); dist < -maxBlockDist || dist > maxBlockDist {
log.Debug("[SyncInfo] Discarded propagated syncInfo, too far away", "peer", peer, "distance", dist, "hash", syncInfo.Hash().Hex())
log.Debug("[SyncInfo] Discarded propagated syncInfo, too far away", "peer", peer, "blockNum", qcBlockNum, "hash", syncInfo.Hash, "distance", dist)
Comment on lines +663 to +676
if (x.highestQuorumCert.ProposedBlockInfo.Round >= syncInfo.HighestQuorumCert.ProposedBlockInfo.Round) && (x.highestTimeoutCert.Round >= syncInfo.HighestTimeoutCert.Round) {
log.Debug("[VerifySyncInfoMessage] Round from incoming syncInfo message is no longer qualified", "Highest QC Round", x.highestQuorumCert.ProposedBlockInfo.Round, "Incoming SyncInfo QC Round", syncInfo.HighestQuorumCert.ProposedBlockInfo.Round, "highestTimeoutCert Round", x.highestTimeoutCert.Round, "Incoming syncInfo TC Round", syncInfo.HighestTimeoutCert.Round)
return false, nil
}

err := x.verifyQC(chain, syncInfo.HighestQuorumCert, nil)
if err != nil {
log.Warn("[VerifySyncInfoMessage] SyncInfo message verification failed due to QC", "blockNum", syncInfo.HighestQuorumCert.ProposedBlockInfo.Number, "round", syncInfo.HighestQuorumCert.ProposedBlockInfo.Round, "error", err)
return false, err
}
err = x.verifyTC(chain, syncInfo.HighestTimeoutCert)
if err != nil {
log.Warn("[VerifySyncInfoMessage] SyncInfo message verification failed due to TC", "gapNum", syncInfo.HighestTimeoutCert.GapNumber, "round", syncInfo.HighestTimeoutCert.Round, "error", err)
return false, err
if len(duplicates) != 0 {
for _, d := range duplicates {
log.Warn("[verifyQC] duplicated signature in QC", "duplicate", common.Bytes2Hex(d))
}
Comment on lines +83 to +107
func RecoverUniqueSigners(signedHash common.Hash, signatureList []types.Signature) ([]types.Signature, []types.Signature, error) {
if (signedHash == common.Hash{}) {
return nil, nil, errors.New("signedHash cannot be empty")
}
if len(signatureList) == 0 {
return []types.Signature{}, []types.Signature{}, nil
}

type Message struct {
pubkey common.Address
sig types.Signature
}

result := make(chan Message, len(signatureList))
errCh := make(chan error, len(signatureList))
var wg sync.WaitGroup
wg.Add(len(signatureList))
for _, signature := range signatureList {
go func(sig types.Signature) {
defer wg.Done()

pubkey, err := crypto.Ecrecover(signedHash.Bytes(), signature)
if err != nil {
log.Error("[UniqueSignatures] error while recovering public key", "error", err, "signature", common.Bytes2Hex(signature), "signedHash", signedHash.Hex())
errCh <- err
Comment on lines +163 to 197
snap, err := x.getSnapshot(chain, timeoutCert.GapNumber, true)
if err != nil {
return err
log.Error("[verifyTC] Fail to get snapshot when verifying TC!", "tcGapNumber", timeoutCert.GapNumber)
return fmt.Errorf("[verifyTC] Unable to get snapshot, %s", err)
}
if snap == nil || len(snap.NextEpochCandidates) == 0 {
log.Error("[verifyTC] Something wrong with the snapshot from gapNumber", "messageGapNumber", timeoutCert.GapNumber, "snapshot", snap)
return errors.New("empty master node lists from snapshot")
}

signedTimeoutObj := types.TimeoutSigHash(&types.TimeoutForSign{
Round: timeoutCert.Round,
GapNumber: timeoutCert.GapNumber,
})
signatures, duplicates, err := RecoverUniqueSigners(signedTimeoutObj, timeoutCert.Signatures)
if err != nil {
log.Error("[verifyTC] Error while getting unique signatures", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(timeoutCert.Signatures), "error", err)
return err
}

if len(duplicates) != 0 {
for _, d := range duplicates {
log.Warn("[verifyQC] duplicated signature in QC", "duplicate", common.Bytes2Hex(d))
}
}

numValidSignatures, err := x.countValidSignatures(signedTimeoutObj, timeoutCert.Signatures, epochInfo.Masternodes)
epochInfo, err := x.getTCEpochInfo(chain, timeoutCert.Round)
if err != nil {
log.Error("[verifyTC] Error while verifying TC message signatures", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(timeoutCert.Signatures), "Error", err)
return err
}

certThreshold := x.config.V2.Config(uint64(timeoutCert.Round)).CertThreshold
if float64(numValidSignatures) < float64(epochInfo.MasternodesLen)*certThreshold {
if float64(len(signatures)) < float64(epochInfo.MasternodesLen)*certThreshold {
log.Warn("[verifyTC] Invalid TC Signature is less or empty", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(timeoutCert.Signatures), "certThreshold", float64(epochInfo.MasternodesLen)*certThreshold)
return utils.ErrInvalidTCSignatures
Comment on lines +200 to +226
var wg sync.WaitGroup
wg.Add(len(signatures))

var mutex sync.Mutex
var haveError error

for _, signature := range signatures {
go func(sig types.Signature) {
defer wg.Done()
verified, _, err := x.verifyMsgSignature(signedTimeoutObj, sig, snap.NextEpochCandidates)
if err != nil || !verified {
log.Error("[verifyTC] Error or verification failure", "signature", sig, "error", err)
mutex.Lock() // Lock before accessing haveError
if haveError == nil {
if err != nil {
log.Error("[verifyTC] Error while verfying TC message signatures", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(signatures), "error", err)
haveError = fmt.Errorf("error while verifying TC message signatures, %s", err)
} else {
log.Warn("[verifyTC] Signature not verified doing TC verification", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(signatures))
haveError = errors.New("fail to verify TC due to signature mis-match")
}
}
mutex.Unlock() // Unlock after modifying haveError
}
}(signature)
}
wg.Wait()
Comment thread consensus/XDPoS/api.go
Comment on lines 232 to 246
// Get current vote pool and timeout pool content and missing messages
func (api *API) GetLatestPoolStatus() PoolStatus {
func (api *API) GetLatestPoolStatus() MessageStatus {
header := api.chain.CurrentHeader()
masternodes := api.XDPoS.EngineV2.GetMasternodes(api.chain, header)

receivedVotes := api.XDPoS.EngineV2.ReceivedVotes()
receivedTimeouts := api.XDPoS.EngineV2.ReceivedTimeouts()
receivedSyncInfo := api.XDPoS.EngineV2.ReceivedSyncInfo()

info := PoolStatus{}
info.Vote = make(map[string]SignerTypes)
info.Timeout = make(map[string]SignerTypes)
info.SyncInfo = make(map[string]SyncInfoTypes)

calculateSigners(info.Vote, receivedVotes, masternodes)
calculateSigners(info.Timeout, receivedTimeouts, masternodes)
info := make(MessageStatus)
info["vote"] = make(map[string]SignerTypes)
info["timeout"] = make(map[string]SignerTypes)

for name, objs := range receivedSyncInfo {
for _, obj := range objs {
syncInfo := obj.(*types.SyncInfo)
hash := syncInfo.Hash()
key := name + ":" + hash.Hex()

qcSigners := len(syncInfo.HighestQuorumCert.Signatures)
tcSigners := 0
if syncInfo.HighestTimeoutCert != nil {
tcSigners = len(syncInfo.HighestTimeoutCert.Signatures)
}
info.SyncInfo[key] = SyncInfoTypes{
Hash: hash,
QCSigners: qcSigners,
TCSigners: tcSigners,
}
}
}
calculateSigners(info["vote"], receivedVotes, masternodes)
calculateSigners(info["timeout"], receivedTimeouts, masternodes)

return info
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants