fix(consensus): apply v2.7.0 hotfix to dev branch and revert optimization changes#2357
fix(consensus): apply v2.7.0 hotfix to dev branch and revert optimization changes#2357wanwiset25 wants to merge 6 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…check) * add signature dedup construction check at QC and TC generation process * add tests * add checking of signature S value to prevent malleable signature (signature flipping) * add signature flip tests * fix test * fix test * refactor hasLowS check to aware of signer and improve logs * fix inaccurate pubkeys > addresses (fix): fix tests due to change in error logging (#2352)
6064cbd to
a092aca
Compare
There was a problem hiding this comment.
Pull request overview
This PR applies the v2.7.0 consensus hotfix onto the dev branch, reverting earlier consensus optimizations, and hardening XDPoS v2 QC/TC handling against signature-amplification and signature malleability (high-S) attacks.
Changes:
- Add signature verification/dedup + low-S enforcement, and make QC/TC authoring paths tolerant (filter bad/duplicate pool entries instead of bailing).
- Refactor/move SyncInfo/Vote/Timeout verification/handler entry points into
engine.go, and remove the SyncInfo pool plumbing. - Update/add tests covering duplicate-signing scenarios, TC/QC verification behavior, and low-S checks.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| eth/bft/bft_handler.go | Adjusts BFT handler logging and SyncInfo handling paths. |
| core/types/consensus_v2.go | Removes SyncInfo pool-related helpers no longer needed after syncInfo pool removal. |
| consensus/XDPoS/utils/pool.go | Removes pool inspection logging and associated import. |
| consensus/XDPoS/utils/errors.go | Adds a shared ErrInvalidSignature error for consensus signature validation failures. |
| consensus/XDPoS/engines/engine_v2/vote.go | Makes QC authoring tolerant by verifying/deduping signatures before QC construction. |
| consensus/XDPoS/engines/engine_v2/utils.go | Introduces signer recovery/dedup (RecoverUniqueSigners), pooled verification (verifyAllSignatures), and low-S enforcement (hasLowS). |
| consensus/XDPoS/engines/engine_v2/utils_test.go | Adds unit tests for dedup, low-S checks, and signature verification behavior. |
| consensus/XDPoS/engines/engine_v2/timeout.go | Makes TC authoring tolerant; refactors TC verification (dedup + signature checks) and TC processing. |
| consensus/XDPoS/engines/engine_v2/testing_utils.go | Adds test-only fakers to invoke pool-threshold hooks under the engine lock. |
| consensus/XDPoS/engines/engine_v2/syncInfo.go | Deletes old SyncInfo workflow implementation (moved into engine.go). |
| consensus/XDPoS/engines/engine_v2/engine.go | Reintroduces/relocates SyncInfo/Vote/Timeout workflows; changes QC verification flow and round-transition pool behavior. |
| consensus/XDPoS/api.go | Changes GetLatestPoolStatus response schema to remove SyncInfo pooling data. |
| consensus/tests/engine_v2_tests/vote_test.go | Updates vote/timeout integration test to use real signatures + validates emitted SyncInfo contents. |
| consensus/tests/engine_v2_tests/verify_header_test.go | Updates expected error for duplicated QC signatures. |
| consensus/tests/engine_v2_tests/timeout_test.go | Updates timeout integration test to use real signatures + validates emitted SyncInfo contents. |
| consensus/tests/engine_v2_tests/sync_info_test.go | Updates SyncInfo tests to match new verification/initialization behavior. |
| consensus/tests/engine_v2_tests/duplicate_signing_test.go | Adds new integration tests for duplicate-signing amplification in vote/timeout pools. |
| qcBlockNum := syncInfo.HighestQuorumCert.ProposedBlockInfo.Number.Int64() | ||
| if dist := qcBlockNum - int64(b.chainHeight()); dist < -maxBlockDist || dist > maxBlockDist { | ||
| log.Debug("[SyncInfo] Discarded propagated syncInfo, too far away", "peer", peer, "distance", dist, "hash", syncInfo.Hash().Hex()) | ||
| log.Debug("[SyncInfo] Discarded propagated syncInfo, too far away", "peer", peer, "blockNum", qcBlockNum, "hash", syncInfo.Hash, "distance", dist) |
| if (x.highestQuorumCert.ProposedBlockInfo.Round >= syncInfo.HighestQuorumCert.ProposedBlockInfo.Round) && (x.highestTimeoutCert.Round >= syncInfo.HighestTimeoutCert.Round) { | ||
| log.Debug("[VerifySyncInfoMessage] Round from incoming syncInfo message is no longer qualified", "Highest QC Round", x.highestQuorumCert.ProposedBlockInfo.Round, "Incoming SyncInfo QC Round", syncInfo.HighestQuorumCert.ProposedBlockInfo.Round, "highestTimeoutCert Round", x.highestTimeoutCert.Round, "Incoming syncInfo TC Round", syncInfo.HighestTimeoutCert.Round) | ||
| return false, nil | ||
| } | ||
|
|
||
| err := x.verifyQC(chain, syncInfo.HighestQuorumCert, nil) | ||
| if err != nil { | ||
| log.Warn("[VerifySyncInfoMessage] SyncInfo message verification failed due to QC", "blockNum", syncInfo.HighestQuorumCert.ProposedBlockInfo.Number, "round", syncInfo.HighestQuorumCert.ProposedBlockInfo.Round, "error", err) | ||
| return false, err | ||
| } | ||
| err = x.verifyTC(chain, syncInfo.HighestTimeoutCert) | ||
| if err != nil { | ||
| log.Warn("[VerifySyncInfoMessage] SyncInfo message verification failed due to TC", "gapNum", syncInfo.HighestTimeoutCert.GapNumber, "round", syncInfo.HighestTimeoutCert.Round, "error", err) | ||
| return false, err |
| if len(duplicates) != 0 { | ||
| for _, d := range duplicates { | ||
| log.Warn("[verifyQC] duplicated signature in QC", "duplicate", common.Bytes2Hex(d)) | ||
| } |
| func RecoverUniqueSigners(signedHash common.Hash, signatureList []types.Signature) ([]types.Signature, []types.Signature, error) { | ||
| if (signedHash == common.Hash{}) { | ||
| return nil, nil, errors.New("signedHash cannot be empty") | ||
| } | ||
| if len(signatureList) == 0 { | ||
| return []types.Signature{}, []types.Signature{}, nil | ||
| } | ||
|
|
||
| type Message struct { | ||
| pubkey common.Address | ||
| sig types.Signature | ||
| } | ||
|
|
||
| result := make(chan Message, len(signatureList)) | ||
| errCh := make(chan error, len(signatureList)) | ||
| var wg sync.WaitGroup | ||
| wg.Add(len(signatureList)) | ||
| for _, signature := range signatureList { | ||
| go func(sig types.Signature) { | ||
| defer wg.Done() | ||
|
|
||
| pubkey, err := crypto.Ecrecover(signedHash.Bytes(), signature) | ||
| if err != nil { | ||
| log.Error("[UniqueSignatures] error while recovering public key", "error", err, "signature", common.Bytes2Hex(signature), "signedHash", signedHash.Hex()) | ||
| errCh <- err |
| snap, err := x.getSnapshot(chain, timeoutCert.GapNumber, true) | ||
| if err != nil { | ||
| return err | ||
| log.Error("[verifyTC] Fail to get snapshot when verifying TC!", "tcGapNumber", timeoutCert.GapNumber) | ||
| return fmt.Errorf("[verifyTC] Unable to get snapshot, %s", err) | ||
| } | ||
| if snap == nil || len(snap.NextEpochCandidates) == 0 { | ||
| log.Error("[verifyTC] Something wrong with the snapshot from gapNumber", "messageGapNumber", timeoutCert.GapNumber, "snapshot", snap) | ||
| return errors.New("empty master node lists from snapshot") | ||
| } | ||
|
|
||
| signedTimeoutObj := types.TimeoutSigHash(&types.TimeoutForSign{ | ||
| Round: timeoutCert.Round, | ||
| GapNumber: timeoutCert.GapNumber, | ||
| }) | ||
| signatures, duplicates, err := RecoverUniqueSigners(signedTimeoutObj, timeoutCert.Signatures) | ||
| if err != nil { | ||
| log.Error("[verifyTC] Error while getting unique signatures", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(timeoutCert.Signatures), "error", err) | ||
| return err | ||
| } | ||
|
|
||
| if len(duplicates) != 0 { | ||
| for _, d := range duplicates { | ||
| log.Warn("[verifyQC] duplicated signature in QC", "duplicate", common.Bytes2Hex(d)) | ||
| } | ||
| } | ||
|
|
||
| numValidSignatures, err := x.countValidSignatures(signedTimeoutObj, timeoutCert.Signatures, epochInfo.Masternodes) | ||
| epochInfo, err := x.getTCEpochInfo(chain, timeoutCert.Round) | ||
| if err != nil { | ||
| log.Error("[verifyTC] Error while verifying TC message signatures", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(timeoutCert.Signatures), "Error", err) | ||
| return err | ||
| } | ||
|
|
||
| certThreshold := x.config.V2.Config(uint64(timeoutCert.Round)).CertThreshold | ||
| if float64(numValidSignatures) < float64(epochInfo.MasternodesLen)*certThreshold { | ||
| if float64(len(signatures)) < float64(epochInfo.MasternodesLen)*certThreshold { | ||
| log.Warn("[verifyTC] Invalid TC Signature is less or empty", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(timeoutCert.Signatures), "certThreshold", float64(epochInfo.MasternodesLen)*certThreshold) | ||
| return utils.ErrInvalidTCSignatures |
| var wg sync.WaitGroup | ||
| wg.Add(len(signatures)) | ||
|
|
||
| var mutex sync.Mutex | ||
| var haveError error | ||
|
|
||
| for _, signature := range signatures { | ||
| go func(sig types.Signature) { | ||
| defer wg.Done() | ||
| verified, _, err := x.verifyMsgSignature(signedTimeoutObj, sig, snap.NextEpochCandidates) | ||
| if err != nil || !verified { | ||
| log.Error("[verifyTC] Error or verification failure", "signature", sig, "error", err) | ||
| mutex.Lock() // Lock before accessing haveError | ||
| if haveError == nil { | ||
| if err != nil { | ||
| log.Error("[verifyTC] Error while verfying TC message signatures", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(signatures), "error", err) | ||
| haveError = fmt.Errorf("error while verifying TC message signatures, %s", err) | ||
| } else { | ||
| log.Warn("[verifyTC] Signature not verified doing TC verification", "tcRound", timeoutCert.Round, "tcGapNumber", timeoutCert.GapNumber, "tcSignLen", len(signatures)) | ||
| haveError = errors.New("fail to verify TC due to signature mis-match") | ||
| } | ||
| } | ||
| mutex.Unlock() // Unlock after modifying haveError | ||
| } | ||
| }(signature) | ||
| } | ||
| wg.Wait() |
| // Get current vote pool and timeout pool content and missing messages | ||
| func (api *API) GetLatestPoolStatus() PoolStatus { | ||
| func (api *API) GetLatestPoolStatus() MessageStatus { | ||
| header := api.chain.CurrentHeader() | ||
| masternodes := api.XDPoS.EngineV2.GetMasternodes(api.chain, header) | ||
|
|
||
| receivedVotes := api.XDPoS.EngineV2.ReceivedVotes() | ||
| receivedTimeouts := api.XDPoS.EngineV2.ReceivedTimeouts() | ||
| receivedSyncInfo := api.XDPoS.EngineV2.ReceivedSyncInfo() | ||
|
|
||
| info := PoolStatus{} | ||
| info.Vote = make(map[string]SignerTypes) | ||
| info.Timeout = make(map[string]SignerTypes) | ||
| info.SyncInfo = make(map[string]SyncInfoTypes) | ||
|
|
||
| calculateSigners(info.Vote, receivedVotes, masternodes) | ||
| calculateSigners(info.Timeout, receivedTimeouts, masternodes) | ||
| info := make(MessageStatus) | ||
| info["vote"] = make(map[string]SignerTypes) | ||
| info["timeout"] = make(map[string]SignerTypes) | ||
|
|
||
| for name, objs := range receivedSyncInfo { | ||
| for _, obj := range objs { | ||
| syncInfo := obj.(*types.SyncInfo) | ||
| hash := syncInfo.Hash() | ||
| key := name + ":" + hash.Hex() | ||
|
|
||
| qcSigners := len(syncInfo.HighestQuorumCert.Signatures) | ||
| tcSigners := 0 | ||
| if syncInfo.HighestTimeoutCert != nil { | ||
| tcSigners = len(syncInfo.HighestTimeoutCert.Signatures) | ||
| } | ||
| info.SyncInfo[key] = SyncInfoTypes{ | ||
| Hash: hash, | ||
| QCSigners: qcSigners, | ||
| TCSigners: tcSigners, | ||
| } | ||
| } | ||
| } | ||
| calculateSigners(info["vote"], receivedVotes, masternodes) | ||
| calculateSigners(info["timeout"], receivedTimeouts, masternodes) | ||
|
|
||
| return info |
Proposed changes
- Reverted #1625
- Reverted #1236
- Applied #1643 (to align code with v2.7.0)
- fix(consensus): signature dedup construction check
XDPoS v2 had a liveness vulnerability in the QC/TC authoring path where a single byzantine masternode could amplify their own valid signature into multiple pool entries — ECDSA non-determinism (custom random nonce) lets the same signer produce arbitrarily many byte-distinct valid signatures for the same vote/timeout message. These would inflate the pool past threshold prematurely, and the strict authoring path (
if count != len(validSignatures) → skip QC) would then bail out, costing the leader a round. Repeated across rotating attackers, this could degrade chain liveness without breaking safety.This PR adds signature verification (
verifyAllSignatures) used by every QC/TC verification site in the engine, and switches the authoring paths (onVotePoolThresholdReached,onTimeoutPoolThresholdReached) from strict-bail to lenient-filter — duplicate or invalid signatures are dropped, and a clean QC/TC is built from the unique verified subset if threshold is still met.In addition,
hasLowSchecks are added to record then discard all mallead consensus signatures.hasLowSreports whether the signature's S value is in the lower half of the secp256k1 curve order, the canonical low-S form that prevents (r, s, v) -> (r, N-s, v') malleability aka signature-flipping. (low-s is considered valid, high-s is considered mallead signature)Why lenient at authoring, strict at verification
The output QC/TC is identical from peers' perspective in both cases — they verify the same way regardless of how it was constructed. No protocol change.
Types of changes
What types of changes does your code introduce to XDC network?
Put an
✅in the boxes that applyImpacted Components
Which parts of the codebase does this PR touch?
Put an
✅in the boxes that applyChecklist
Put an
✅in the boxes once you have confirmed below actions (or provide reasons on not doing so) that