eth/fetcher: refactor witness_manager to reduce complexity#2188
eth/fetcher: refactor witness_manager to reduce complexity#2188
Conversation
…test gaps Structural refactor of witness_manager.go plus targeted tests, guided by diffguard (https://github.com/0xPolygon/diffguard). Max cognitive complexity 44 → 10; Tier 1 / Tier 2 mutation score 100%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
Runs diffguard (https://github.com/0xPolygon/diffguard) on every PR to report cognitive complexity, function/file sizes, dependency structure, churn-weighted complexity, and mutation testing at 10% sample rate against the base branch. Text report goes to the step summary (visible in the PR check UI) and a JSON artifact is retained for 14 days for offline triage. Uses the 16-core bor runner to keep mutation turnaround bounded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8653471 to
f1d829b
Compare
Code ReviewBug: Inverted condition in
|
The refactor introduced an inverted condition where armTimerChan returned a nil timer channel when pending requests existed, preventing tick() from ever firing through the timer path. The retry loop only appeared to work because rescheduleWitness also pokes pokeCh. Reported in PR #2188 review. Adds TestWitnessLoopDrivesFetchesForPending to guard against regressions — it exercises the full loop→tick→ fetchWitness pipeline via real channels, which the existing TestLoop did not. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
test |
|
test2 |
|
test3 |
Code review -- Found 4 issues: 1 critical logic bug in the refactored timer logic, 1 boundary-condition inconsistency, and 2 nil dereference risks in tests. Detailed inline comments will follow. |
Code reviewFound 4 issues: 1 critical logic bug in the refactored timer logic, 1 boundary-condition inconsistency, and 2 nil dereference risks in tests. See detailed comments below. Issue 1 - Critical bug in armTimerChanFile: eth/fetcher/witness_manager.go, line 228 Bug (critical): The condition here is inverted compared to the original logic. The original code activates the timer when pendingCount > 0 and stops it otherwise. The new code does the opposite -- it calls stopAndDrainTimer and returns nil when pendingCount != 0, and activates the timer when nothing is pending. The branches are swapped. The timer is disabled when items are pending and enabled when nothing is pending. This means tick() (which dispatches witness fetch requests) will never fire via the timer when blocks are waiting for witnesses, breaking witness fetching entirely. The doc comment on line 220 describes the correct intended behavior, confirming this is an unintentional inversion. Fix: change the condition to pendingCount == 0. Issue 2 - Boundary inconsistencyFile: eth/fetcher/witness_manager.go, line 281 Bug (boundary inconsistency): This changes < to <=, but block_fetcher.go uses < for the same check (lines 571 and 1087). A block at exactly dist == -maxUncleDist (-7) will be accepted by the block fetcher but rejected by the witness manager, meaning it could be imported without its witness being fetched. If this is an intentional tightening, it should be applied consistently to block_fetcher.go as well. Otherwise, revert to < to match. Issue 3 - Test nil dereference AFile: eth/fetcher/witness_manager_test.go, lines 3230-3232 Bug (nil dereference): When m.pending[hash] is nil, the if body executes and m.pending[hash].op.origin in the Errorf call will panic. This test is specifically designed to catch mutations that remove pending entries, exactly the scenario where the nil dereference occurs. Fix: add a nil guard -- check for nil first with t.Fatal, then check the origin field in an else-if branch. Issue 4 - Test nil dereference BFile: eth/fetcher/witness_manager_test.go, lines 3270-3272 Bug (nil dereference): No nil guard before m.pending[hash].op.origin. If checkCompleting removes the entry (the exact failure scenario this test catches), this panics instead of reporting a clean test failure. Fix: same pattern as Issue 3 -- add a nil guard with t.Fatal before accessing the origin field. |
Detailed code suggestions for the 4 issues foundIssue 1 fix (witness_manager.go:228): Change the condition from pendingCount != 0 to pendingCount == 0. The current code inverts the timer logic -- it disables the timer when items are pending and enables it when nothing is pending, which is backwards from the original loop() behavior and the doc comment on line 220. Issue 2 fix (witness_manager.go:281): Change <= back to < to match block_fetcher.go lines 571 and 1087, or update block_fetcher.go consistently if the tightening is intentional. Issues 3 and 4 fix (witness_manager_test.go:3230-3232 and 3270-3272): Add a nil guard before accessing m.pending[hash].op.origin. Check for nil first with t.Fatal, then check the origin field in an else-if branch. Both locations have the same pattern -- accessing a struct field on a potentially nil map value without checking nil first. |
|
test |
Three issues from PR #2188 code review: - handleNeed's distance check was accidentally tightened from `dist < -maxUncleDist` to `<=` during the refactor, introducing a boundary mismatch with block_fetcher.go (which uses `<`). A block at exactly dist == -maxUncleDist would be accepted by block_fetcher but dropped by witness_manager, potentially importing blocks without their witnesses. Reverted to match block_fetcher. - TestWitnessHandleFilterResultSkipsAlreadyPending dereferenced `m.pending[hash].op.origin` inside the nil-check branch, panicking on the exact mutation it was meant to catch (pending entry removal). Now fails cleanly with t.Fatal when entry is nil. - Same nil deref pattern in TestWitnessCheckCompletingSkipsAlreadyPending. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Inline Comment 1 -- Bug: Logic inversion -- timer disabled when items ARE pending The condition is inverted from the original code. The original
Since The function own doc comment confirms the intent: "Returns nil channel (never fires) when nothing is pending" -- but the code returns nil when Suggested fix: if pendingCount == 0 { |
|
Inline Comment 2 -- Inconsistent distance check -- This was changed from If this change is intentional, consider aligning Suggested fix: if dist := int64(number) - int64(m.parentChainHeight()); dist < -maxUncleDist { |
Review SummaryThis PR refactors witness_manager.go to reduce complexity by extracting helper methods. The structure is cleaner, but there are two issues that need to be addressed before merging: 1. CRITICAL: Logic inversion in timerChan() (line 228) 2. Inconsistent distance check in tooOld() (line 281) Requesting changes -- the logic inversion in timerChan() is a correctness bug that would prevent witness fetching from working. |
|
|
@claude Please review again. The previously identified bugs have been addressed and fixed. |
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (84.56%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #2188 +/- ##
===========================================
+ Coverage 51.93% 51.96% +0.02%
===========================================
Files 884 884
Lines 155446 155455 +9
===========================================
+ Hits 80731 80778 +47
+ Misses 69509 69476 -33
+ Partials 5206 5201 -5
... and 17 files with indirect coverage changes
🚀 New features to boost your workflow:
|
|
Claude finished @cffls's task in 2m 3s —— View job Re-review:
|
| Issue | Status |
|---|---|
Critical: inverted condition in armTimerChan (!= 0 → == 0) |
Fixed (line 228) |
Boundary inconsistency (<= → < to match block_fetcher.go) |
Fixed (line 284), with a clear explanatory comment |
Test nil deref A (m.pending[hash].op.origin without nil guard) |
Fixed (line 3231–3234: nil check with t.Fatal before field access) |
| Test nil deref B (same pattern) | Fixed (line 3275–3278: same nil-guard pattern) |
All four fixes are correct.
Fresh review of current state
Overall assessment: LGTM. The refactor is well-structured — the extracted helpers have clear ownership of lock scope, the method decomposition follows the original control flow faithfully, and the new tests are precisely targeted at mutation-testing gaps.
One nit (pre-existing, cosmetic)
witness_manager.go line 592 — when the request creation succeeds (err == nil) but the hash is no longer pending, the error message wraps a nil error:
m.handleWitnessFetchFailureExt(hash, "", fmt.Errorf("request initiation failed: %w", err), false)This produces "request initiation failed: <nil>". Not a bug (the removePending=false path just bumps the announce time for backoff), but a slightly confusing error string. This was carried over from the original code, so not a regression — just noting it for a future cleanup pass.
What looks good
- Lock discipline:
collectReadyHashesLockedno longer drops and re-acquires the lock mid-iteration to callmarkWitnessUnavailable— it collects hashes and marks them after releasing the lock. This eliminates a subtle class of bugs where the map could change between unlock/relock. stopAndDrainTimer: Properly extracted and reused innewWitnessManager,armTimerChan, andrescheduleWitness— eliminates three copies of the stop-drain idiom.TestWitnessLoopDrivesFetchesForPending: End-to-end regression test that would have caught thearmTimerChaninversion. Good addition.- Test helpers (
newWitnessManagerForTest,addPendingEntry): Clean, reduce boilerplate across the 22 new tests. - diffguard workflow: Reasonable configuration — 10% mutation sample rate, 5m test timeout, tier-based gating.
No further issues found. The PR is ready to merge.


Description
Structural refactor of witness_manager.go plus targeted tests, guided by diffguard (https://github.com/0xPolygon/diffguard). Max cognitive complexity 44 → 10; Tier 1 / Tier 2 mutation score 100%.
eth/fetcher/witness_manager.go— Refactoring ImpactRefactor + test additions guided by diffguard.
Aggregate metrics
The four worst offenders — before vs after split
tick()— was the worsttick()tick()logPendingStatesAtTickLockedcollectReadyHashesLockedextractPrematureOpsLockedbuildPeerRequestsLockeddispatchPeerRequestsdispatchPeerFetchesloop()loop()loop()armTimerChanlogTimerTickfetchWitness()fetchWitness()fetchWitness()initiateWitnessFetchawaitWitnessResponseprocessWitnessResponserescheduleWitness()rescheduleWitness()rescheduleWitness()earliestPendingAnnounceLockedstopAndDrainTimerTests
New tests target specific mutation-testing gaps (~15) plus regression tests for bugs found during review (the
loop()end-to-end retry test and the fixed nil-deref test guards).Mutation testing
log.Debugremovals, report-only)diffguard section status