eth/fetcher: refactor witness_manager to reduce complexity by cffls · Pull Request #2188 · 0xPolygon/bor

cffls · 2026-04-15T20:12:15Z

Description

Structural refactor of witness_manager.go plus targeted tests, guided by diffguard (https://github.com/0xPolygon/diffguard). Max cognitive complexity 44 → 10; Tier 1 / Tier 2 mutation score 100%.

`eth/fetcher/witness_manager.go` — Refactoring Impact

Refactor + test additions guided by diffguard.

Aggregate metrics

Metric	Before	After	Change
File LOC	889	1,119	+230 (helpers + doc comments)
Function count	19	36	+17 extracted helpers
Max cognitive complexity	44	10	-77%
Mean complexity	7.6	3.9	-49%
Median complexity	4	3	-25%
Functions above threshold (10)	5	0	all cleared
Functions above 50 lines	5	3	-40%

The four worst offenders — before vs after split

`tick()` — was the worst

	Lines	Complexity
Before: `tick()`	130	44
After: `tick()`	~25	2
→ `logPendingStatesAtTickLocked`	14	8
→ `collectReadyHashesLocked`	38	10
→ `extractPrematureOpsLocked`	11	4
→ `buildPeerRequestsLocked`	20	6
→ `dispatchPeerRequests`	8	2
→ `dispatchPeerFetches`	20	6

`loop()`

	Lines	Complexity
Before: `loop()`	84	22
After: `loop()`	52	9
→ `armTimerChan`	20	2
→ `logTimerTick`	5	1

`fetchWitness()`

	Lines	Complexity
Before: `fetchWitness()`	83	15
After: `fetchWitness()`	~15	1
→ `initiateWitnessFetch`	33	7
→ `awaitWitnessResponse`	15	1
→ `processWitnessResponse`	23	3

`rescheduleWitness()`

	Lines	Complexity
Before: `rescheduleWitness()`	~40	14
After: `rescheduleWitness()`	~25	3
→ `earliestPendingAnnounceLocked`	12	7
→ `stopAndDrainTimer`	8	3

Tests

	Before	After	Delta
Test file LOC	1,962	3,447	+1,485
Test function count	39	61	+22

New tests target specific mutation-testing gaps (~15) plus regression tests for bugs found during review (the loop() end-to-end retry test and the fixed nil-deref test guards).

Mutation testing

	Before refactor	After refactor + tests
Raw score	~54.7% (baseline)	96.1%
Tier 1 (logic)	~43% (many survivors)	100.0% (98/98 killed)
Tier 2 (semantic)	~56%	100.0% (22/22)
Tier 3 (observability)	mostly noise	94.4% (remaining 15 are `log.Debug` removals, report-only)
Severity verdict	FAIL	PASS

diffguard section status

Check	Before	After
Cognitive complexity	FAIL (4 violations, max 44)	PASS
Code sizes	FAIL (6 violations)	FAIL (3 function size, 1 file size — structural, pre-existing)
Dependency structure	PASS	PASS
Churn-weighted complexity	WARN (top score 351)	PASS (top score 96)
Mutation testing	FAIL	PASS

…test gaps Structural refactor of witness_manager.go plus targeted tests, guided by diffguard (https://github.com/0xPolygon/diffguard). Max cognitive complexity 44 → 10; Tier 1 / Tier 2 mutation score 100%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

Runs diffguard (https://github.com/0xPolygon/diffguard) on every PR to report cognitive complexity, function/file sizes, dependency structure, churn-weighted complexity, and mutation testing at 10% sample rate against the base branch. Text report goes to the step summary (visible in the PR check UI) and a JSON artifact is retained for 14 days for offline triage. Uses the 16-core bor runner to keep mutation turnaround bounded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lucca30

lgtm

claude · 2026-04-15T20:28:44Z

Code Review

Bug: Inverted condition in `armTimerChan` -- witness timer is disabled when items are pending

File: eth/fetcher/witness_manager.go lines 223-241

The condition in armTimerChan is inverted relative to the original code it replaces.

Original logic (removed):

if pendingCount > 0 {
    timerChan = m.witnessTimer.C   // timer ACTIVE when pending
} else {
    // stop/drain timer
}

New logic (armTimerChan):

if pendingCount != 0 {
    m.stopAndDrainTimer()
    return nil, lastTick           // timer DISABLED when pending
}
return m.witnessTimer.C, lastTick  // timer ACTIVE when nothing pending

The function doc comment says "Returns nil channel (never fires) when nothing is pending" but the code returns nil when something is pending. This means the timer that drives tick() (which dispatches witness fetch requests) will never fire when there are pending witness requests, breaking the core witness fetch retry loop.

Suggested fix -- change line 228 from if pendingCount != 0 to if pendingCount == 0:

if pendingCount == 0 {
    m.stopAndDrainTimer()
    return nil, lastTick
}

Also checked for CLAUDE.md compliance -- no other issues found.

The refactor introduced an inverted condition where armTimerChan returned a nil timer channel when pending requests existed, preventing tick() from ever firing through the timer path. The retry loop only appeared to work because rescheduleWitness also pokes pokeCh. Reported in PR #2188 review. Adds TestWitnessLoopDrivesFetchesForPending to guard against regressions — it exercises the full loop→tick→ fetchWitness pipeline via real channels, which the existing TestLoop did not. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-15T20:32:02Z

test

claude · 2026-04-15T20:32:27Z

test2

claude · 2026-04-15T20:34:09Z

test3

claude · 2026-04-15T20:35:09Z

Code review -- Found 4 issues: 1 critical logic bug in the refactored timer logic, 1 boundary-condition inconsistency, and 2 nil dereference risks in tests. Detailed inline comments will follow.

claude · 2026-04-15T20:35:51Z

Code review

Found 4 issues: 1 critical logic bug in the refactored timer logic, 1 boundary-condition inconsistency, and 2 nil dereference risks in tests. See detailed comments below.

Issue 1 - Critical bug in armTimerChan

File: eth/fetcher/witness_manager.go, line 228

Bug (critical): The condition here is inverted compared to the original logic. The original code activates the timer when pendingCount > 0 and stops it otherwise. The new code does the opposite -- it calls stopAndDrainTimer and returns nil when pendingCount != 0, and activates the timer when nothing is pending.

The branches are swapped. The timer is disabled when items are pending and enabled when nothing is pending. This means tick() (which dispatches witness fetch requests) will never fire via the timer when blocks are waiting for witnesses, breaking witness fetching entirely.

The doc comment on line 220 describes the correct intended behavior, confirming this is an unintentional inversion. Fix: change the condition to pendingCount == 0.

Issue 2 - Boundary inconsistency

File: eth/fetcher/witness_manager.go, line 281

Bug (boundary inconsistency): This changes < to <=, but block_fetcher.go uses < for the same check (lines 571 and 1087). A block at exactly dist == -maxUncleDist (-7) will be accepted by the block fetcher but rejected by the witness manager, meaning it could be imported without its witness being fetched.

If this is an intentional tightening, it should be applied consistently to block_fetcher.go as well. Otherwise, revert to < to match.

Issue 3 - Test nil dereference A

File: eth/fetcher/witness_manager_test.go, lines 3230-3232

Bug (nil dereference): When m.pending[hash] is nil, the if body executes and m.pending[hash].op.origin in the Errorf call will panic. This test is specifically designed to catch mutations that remove pending entries, exactly the scenario where the nil dereference occurs.

Fix: add a nil guard -- check for nil first with t.Fatal, then check the origin field in an else-if branch.

Issue 4 - Test nil dereference B

File: eth/fetcher/witness_manager_test.go, lines 3270-3272

Bug (nil dereference): No nil guard before m.pending[hash].op.origin. If checkCompleting removes the entry (the exact failure scenario this test catches), this panics instead of reporting a clean test failure.

Fix: same pattern as Issue 3 -- add a nil guard with t.Fatal before accessing the origin field.

claude · 2026-04-15T20:36:22Z

Detailed code suggestions for the 4 issues found

Issue 1 fix (witness_manager.go:228): Change the condition from pendingCount != 0 to pendingCount == 0. The current code inverts the timer logic -- it disables the timer when items are pending and enables it when nothing is pending, which is backwards from the original loop() behavior and the doc comment on line 220.

Issue 2 fix (witness_manager.go:281): Change <= back to < to match block_fetcher.go lines 571 and 1087, or update block_fetcher.go consistently if the tightening is intentional.

Issues 3 and 4 fix (witness_manager_test.go:3230-3232 and 3270-3272): Add a nil guard before accessing m.pending[hash].op.origin. Check for nil first with t.Fatal, then check the origin field in an else-if branch. Both locations have the same pattern -- accessing a struct field on a potentially nil map value without checking nil first.

claude · 2026-04-15T20:38:36Z

test

Three issues from PR #2188 code review: - handleNeed's distance check was accidentally tightened from `dist < -maxUncleDist` to `<=` during the refactor, introducing a boundary mismatch with block_fetcher.go (which uses `<`). A block at exactly dist == -maxUncleDist would be accepted by block_fetcher but dropped by witness_manager, potentially importing blocks without their witnesses. Reverted to match block_fetcher. - TestWitnessHandleFilterResultSkipsAlreadyPending dereferenced `m.pending[hash].op.origin` inside the nil-check branch, panicking on the exact mutation it was meant to catch (pending entry removal). Now fails cleanly with t.Fatal when entry is nil. - Same nil deref pattern in TestWitnessCheckCompletingSkipsAlreadyPending. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-15T20:39:02Z

Inline Comment 1 -- eth/fetcher/witness_manager.go line 228

Bug: Logic inversion -- timer disabled when items ARE pending

The condition is inverted from the original code. The original loop() armed the timer (timerChan = m.witnessTimer.C) when pendingCount > 0 and stopped it when nothing was pending. This refactored version does the opposite:

pendingCount != 0 (items ARE pending) -> stops timer, returns nil channel
pendingCount == 0 (nothing pending) -> returns live timer channel

Since tick() is only called from case <-timerChan: in the select loop, a nil channel means tick() never fires when there are pending witness requests -- silently breaking all timer-driven witness fetching.

The function own doc comment confirms the intent: "Returns nil channel (never fires) when nothing is pending" -- but the code returns nil when pendingCount != 0.

Suggested fix:

if pendingCount == 0 {

claude · 2026-04-15T20:39:08Z

Inline Comment 2 -- eth/fetcher/witness_manager.go line 281

Inconsistent distance check -- block_fetcher.go uses strict <

This was changed from < to <=, but block_fetcher.go uses dist < -maxUncleDist in both its distance checks (line 571, line 1087). With <=, a block at exactly maxUncleDist (7) blocks behind the chain head is now rejected by the witness manager while being accepted by the block fetcher -- creating a gap where a block is queued for import but its witness is dropped.

If this change is intentional, consider aligning block_fetcher.go as well and adding a test for the boundary case.

Suggested fix:

if dist := int64(number) - int64(m.parentChainHeight()); dist < -maxUncleDist {

claude · 2026-04-15T20:39:22Z

Review Summary

This PR refactors witness_manager.go to reduce complexity by extracting helper methods. The structure is cleaner, but there are two issues that need to be addressed before merging:

1. CRITICAL: Logic inversion in timerChan() (line 228)
The condition pendingCount != 0 is inverted -- it disables the timer when items ARE pending and enables it when nothing is pending. This silently breaks all timer-driven witness fetching. The fix is to change != to ==.

2. Inconsistent distance check in tooOld() (line 281)
The operator was changed from < to <=, creating an off-by-one inconsistency with block_fetcher.go which still uses strict <. A block at exactly maxUncleDist behind the chain head would be accepted by the block fetcher but have its witness dropped by the witness manager.

Requesting changes -- the logic inversion in timerChan() is a correctness bug that would prevent witness fetching from working.

sonarqubecloud · 2026-04-15T20:54:21Z

Quality Gate failed

Failed conditions
11.3% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

cffls · 2026-04-15T21:13:09Z

@claude Please review again. The previously identified bugs have been addressed and fixed.

codecov · 2026-04-15T21:13:19Z

Codecov Report

❌ Patch coverage is 84.56790% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.96%. Comparing base (438596d) to head (d855a08).

Files with missing lines	Patch %	Lines
eth/fetcher/witness_manager.go	84.56%	18 Missing and 7 partials ⚠️

❌ Your patch check has failed because the patch coverage (84.56%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2188      +/-   ##
===========================================
+ Coverage    51.93%   51.96%   +0.02%     
===========================================
  Files          884      884              
  Lines       155446   155455       +9     
===========================================
+ Hits         80731    80778      +47     
+ Misses       69509    69476      -33     
+ Partials      5206     5201       -5

Files with missing lines	Coverage Δ
eth/fetcher/witness_manager.go	`88.62% <84.56%> (+5.09%)`	⬆️

... and 17 files with indirect coverage changes

Files with missing lines	Coverage Δ
eth/fetcher/witness_manager.go	`88.62% <84.56%> (+5.09%)`	⬆️

... and 17 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

claude · 2026-04-15T21:13:30Z

Claude finished @cffls's task in 2m 3s —— View job

Re-review: `witness_manager` refactor

Verify previously identified bugs are fixed
Read full diff against origin/develop
Review witness_manager.go changes
Review witness_manager_test.go changes
Review diffguard.yml workflow
Post final review

Previously identified issues — all resolved

Issue	Status
Critical: inverted condition in `armTimerChan` (`!= 0` → `== 0`)	Fixed (line 228)
Boundary inconsistency (`<=` → `<` to match `block_fetcher.go`)	Fixed (line 284), with a clear explanatory comment
Test nil deref A (`m.pending[hash].op.origin` without nil guard)	Fixed (line 3231–3234: nil check with `t.Fatal` before field access)
Test nil deref B (same pattern)	Fixed (line 3275–3278: same nil-guard pattern)

All four fixes are correct.

Fresh review of current state

Overall assessment: LGTM. The refactor is well-structured — the extracted helpers have clear ownership of lock scope, the method decomposition follows the original control flow faithfully, and the new tests are precisely targeted at mutation-testing gaps.

One nit (pre-existing, cosmetic)

witness_manager.go line 592 — when the request creation succeeds (err == nil) but the hash is no longer pending, the error message wraps a nil error:

m.handleWitnessFetchFailureExt(hash, "", fmt.Errorf("request initiation failed: %w", err), false)

This produces "request initiation failed: <nil>". Not a bug (the removePending=false path just bumps the announce time for backoff), but a slightly confusing error string. This was carried over from the original code, so not a regression — just noting it for a future cleanup pass.

What looks good

Lock discipline: collectReadyHashesLocked no longer drops and re-acquires the lock mid-iteration to call markWitnessUnavailable — it collects hashes and marks them after releasing the lock. This eliminates a subtle class of bugs where the map could change between unlock/relock.
stopAndDrainTimer: Properly extracted and reused in newWitnessManager, armTimerChan, and rescheduleWitness — eliminates three copies of the stop-drain idiom.
TestWitnessLoopDrivesFetchesForPending: End-to-end regression test that would have caught the armTimerChan inversion. Good addition.
Test helpers (newWitnessManagerForTest, addPendingEntry): Clean, reduce boilerplate across the 22 new tests.
diffguard workflow: Reasonable configuration — 10% mutation sample rate, 5m test timeout, tier-based gating.

No further issues found. The PR is ready to merge.

claude bot reviewed Apr 15, 2026

View reviewed changes

lucca30 previously approved these changes Apr 15, 2026

View reviewed changes

cffls dismissed lucca30’s stale review via f1d829b April 15, 2026 20:20

cffls force-pushed the witness_manager_refactor branch from 8653471 to f1d829b Compare April 15, 2026 20:20

Conversation

cffls commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

eth/fetcher/witness_manager.go — Refactoring Impact

Aggregate metrics

The four worst offenders — before vs after split

tick() — was the worst

loop()

fetchWitness()

rescheduleWitness()

Tests

Mutation testing

diffguard section status

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

lucca30 left a comment

Choose a reason for hiding this comment

Uh oh!

claude bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Bug: Inverted condition in armTimerChan -- witness timer is disabled when items are pending

Uh oh!

claude bot commented Apr 15, 2026

Uh oh!

claude bot commented Apr 15, 2026

Uh oh!

claude bot commented Apr 15, 2026

Uh oh!

claude bot commented Apr 15, 2026

Code review -- Found 4 issues: 1 critical logic bug in the refactored timer logic, 1 boundary-condition inconsistency, and 2 nil dereference risks in tests. Detailed inline comments will follow.

Uh oh!

claude bot commented Apr 15, 2026

Code review

Issue 1 - Critical bug in armTimerChan

Issue 2 - Boundary inconsistency

Issue 3 - Test nil dereference A

Issue 4 - Test nil dereference B

Uh oh!

claude bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Detailed code suggestions for the 4 issues found

Uh oh!

claude bot commented Apr 15, 2026

Uh oh!

claude bot commented Apr 15, 2026

Uh oh!

claude bot commented Apr 15, 2026

Uh oh!

claude bot commented Apr 15, 2026

Review Summary

Uh oh!

sonarqubecloud bot commented Apr 15, 2026

Quality Gate failed

Uh oh!

cffls commented Apr 15, 2026

Uh oh!

codecov bot commented Apr 15, 2026

Codecov Report

Uh oh!

claude bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: witness_manager refactor

Previously identified issues — all resolved

Fresh review of current state

One nit (pre-existing, cosmetic)

What looks good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cffls commented Apr 15, 2026 •

edited

Loading

`eth/fetcher/witness_manager.go` — Refactoring Impact

`tick()` — was the worst

`loop()`

`fetchWitness()`

`rescheduleWitness()`

claude bot commented Apr 15, 2026 •

edited

Loading

Bug: Inverted condition in `armTimerChan` -- witness timer is disabled when items are pending

claude bot commented Apr 15, 2026 •

edited

Loading

claude bot commented Apr 15, 2026 •

edited

Loading

Re-review: `witness_manager` refactor