Skip to content

core, blockstm, state: add BlockSTM v2 parallel transaction execution#2210

Open
cffls wants to merge 32 commits into
developfrom
blockstm_redesign
Open

core, blockstm, state: add BlockSTM v2 parallel transaction execution#2210
cffls wants to merge 32 commits into
developfrom
blockstm_redesign

Conversation

@cffls
Copy link
Copy Markdown
Contributor

@cffls cffls commented May 5, 2026

BlockSTM v2

Introduces BlockSTM v2 — a from-scratch redesign of Bor's parallel
transaction execution engine. V2 speculatively executes block transactions
concurrently, validates each tx's reads against a multi-version store, and
re-executes any whose reads turned stale. On the 241-block mainnet witness
benchmark V2/4w delivers ~1.6× throughput over serial (570 mgas/s vs
350 mgas/s, AMD Ryzen 7 5800H, all-in-memory).

Architecture

V2 runs three coordinated goroutine groups around a per-tx PDB:

  V2StateProcessor.Process    (core/parallel_state_processor.go)
        │
  ExecuteV2BlockSTM           (core/blockstm/v2_executor.go)
        │
  ┌─────┼──────────────┬──────────────┐
  │     │              │              │
  Workers (N)     Validator (1)   Settlement (1)
  ParallelStateDB StoreReads      finalDB
                  BalReads        IntermediateRoot

Backed by:

  • SafeBase — thread-safe base reads (sync.Map caches over a bounded
    pool of StateDB.Copy() with concurrent-reads mode on the trieReader).
  • MVStore — sharded multi-version per-key store with a lock-free bloom
    filter for cold-key reads.
  • MVBalanceStore — sharded commutative balance delta store (per-tx
    Add/Sub; reads sum prior entries).

Execution flow

  1. Task building. Block transactions become V2Tasks. Same-sender
    chains get pre-computed nonces (SenderNonces) so nonce reads on a
    chain are skipped during validation.
  2. Parallel execution. N worker goroutines pull tasks from a buffered
    dispatcher (window numWorkers * InFlightTaskMultiplier). Each tx runs
    in its own ParallelStateDB; reads come from SafeBase + MVStore +
    MVBalanceStore and are recorded in StoreReads / BalReads. Writes
    accumulate locally (DeferMVWrites) and flush to MVStore at end-of-tx
    so concurrent readers only ever see FINAL values — never mid-tx
    reentrancy-guard writes.
  3. Sequential validation. A single goroutine validates txs in tx-index
    order. Each recorded read is re-checked against MVStore; match by
    writer/incarnation OR by value-equal fallback (handles idempotent writes
    such as reentrancy-guard SSTOREs that flip back). Mismatch →
    MarkEstimate the failed tx's writes and dispatch a re-execution
    goroutine. Per-key pipelining: readers that hit an ESTIMATE entry
    under Incarnation > 0 block on WaitForFinal until the upstream
    writer is finalized.
  4. Pipelined settlement. As txs finalize, a settlement goroutine drains
    chSettle in tx-index order and applies each tx's writes to finalDB
    (the real, single-threaded *state.StateDB) through a *Direct setter
    family that bypasses the journal, then asks finalDB for the
    IntermediateRoot.

Testing

V2 is gated on a layered test surface. From cheapest to most expensive,
and what each layer is meant to catch:

1. Compile-time conformance + drift detection

The PDB shadows StateDB's interface and behaviour, so any upstream
go-ethereum merge that adds or changes a StateDB method would silently
bypass V2. A handful of go test-time checks fail CI before any logic
runs:

Check What it pins
core/vm/statedb_impl_test.go PDB satisfies vm.StateDB via a static assertion
TestPDBMethodParity every StateDB method has a PDB mapping or is in pdbExemptMethods
TestV2DependencyCompileCheck every StateDB method V2 settle calls remains present
TestV2JournalEntryCoverage every journal entry kind has a parallelJournalEntry mapping
TestV2TracingHookParity every tracing.Hooks field is classified as fired-or-skipped
TestV2ForkParity every params.ChainConfig.IsX fork rule is classified V1/V2

2. Per-method unit tests (~210 tests across ~25 files)

Cover individual PDB getters/setters, MVStore / MVBalanceStore
primitives, V2 executor channel mesh, and SettleTo helpers. Highlights:

File Coverage
core/state/parallel_statedb_test.go 76 tests; PDB behaviour + the Tier-1 mutation kill suite (see layer 5)
core/state/parallel_statedb_coverage_test.go 42 tests; branch coverage
core/state/parallel_statedb_getter_table_test.go every PDB getter records its read with the right WriterIdx across Committed / ESTIMATE / NoEntry / AtTxZero
core/state/safe_base_test.go sync.Map cache + pool semantics
core/blockstm/mvstore_test.go, core/blockstm/mvbalance_store_test.go versioned store primitives
core/blockstm/v2_executor_wait_test.go waitForTx / waitForFinal + cancellation

3. Direct-setter parity tests

The *Direct setter family bypasses StateDB's journal at settle time.
core/state/v2_direct_setter_parity_test.go (7 tests) pins that
SetXDirect produces a byte-identical state root to journaled
SetX + Finalise. Catches divergence the moment a future change to either
path breaks the parity.

4. Differential tests vs serial StateDB

Hand-written + table-driven scenarios that exercise the PDB against a
parallel-mirror serial StateDB and assert byte-identical output. Catches
behaviour drift the parity-table tests can't express:

  • core/state/v2_differential_test.go — PDB-only diff
  • core/state/v2_executor_differential_test.go — synthetic-env executor diff
  • core/v1_differential_test.go — V1 vs serial parity for the legacy
    in-tree path

5. Mutation testing (Tier-1 kill tests)

diffguard runs mutation testing against V2's critical paths. Every
survivor flagged by a sample run has a corresponding targeted test inline
in core/state/parallel_statedb_test.go under the "Tier-1 mutation kill
tests" divider — boundary, negation, and return-value mutations on
storeReadMatches, journal revert, settleTo helpers, applyFeeData,
Reset, etc. Tier-1 logic kill-rate ≥99% on the latest run.

6. Fuzz targets

Randomized inputs against either a serial mirror or a hand-built reference:

  • core/state/v2_fuzz_test.go — random PDB op sequences vs StateDB
  • core/state/v2_executor_fuzz_test.go — executor-level fuzz on
    synthetic env
  • core/v2_serial_parity_fuzz_test.go (FuzzV2ExecutorVsSerial) — random
    tx batches through ExecuteV2BlockSTM vs an ApplyMessage loop

The race-detected fuzz under -race caught the shared-trie-reader race
that the non-race fuzz missed; worth keeping on the nightly.

7. End-to-end consistency + benchmark on real mainnet blocks

core/mainnet_witness_benchmark_test.go bundles 241 real Polygon mainnet
blocks (under core/blockstm/testdata/) with their pre-block witnesses.
Two harnesses share the corpus:

  • TestV2BlockSTMAllBlocks (gated on BOR_BLOCKSTM_TEST=1) replays each
    block through both serial and V2 and asserts byte-identical state roots
    and receipt roots.
  • BenchmarkV2AllBlocks runs serial + V2 across worker counts (4 / 8 / 16)
    and witness-on/off variants on the same corpus. Backs the throughput
    numbers referenced at the top of this commit.

8. Runtime invariants under -tags=invariants

Build-tag-gated runtime assertions inside the executor and the PDB. Off in
production builds (zero-cost), on in CI:

  • assertSettleOrder — validation walk's induction
  • assertReexecVisitedExactlyOnce — drain loop doesn't lose a tx
  • assertSettleNotPanicked — panicked PDBs never settle

A tiny set of "panic if invariant breaks" tests under
//go:build invariants verifies the assertions actually fire on crafted
violations (core/blockstm/v2_executor_invariants_panic_test.go,
core/state/parallel_statedb_invariants_panic_test.go).

9. Race detector

All of layers 2–8 are runnable under go test -race. CI runs the full
state + blockstm packages in race mode; the TestV2BlockSTMAllBlocks
gated test is also race-clean on the 241-block corpus.

10. Production soak — >1 million Polygon mainnet blocks

Beyond the unit / parity / fuzz layers above, this branch has been used to
sync more than 1,000,000 mainnet blocks end-to-end on a real node with V2
as the primary processor (with serial disabled). Zero state-root
divergences, zero panics requiring fallback, no consensus-affecting issues
observed. This is the most stringent layer: real on-chain workload, real
database backend, real prefetcher contention.

Bundled additions

  • intermediateRootTimer metric (chain/intermediateroot) — measures the
    post-execution trie computation in block_validator.go.

Major changes

The code surface is ~5.1k lines across 39 production .go files, plus
~11.7k lines across 37 test files. The remaining 484 file entries in the
diff are block + witness fixtures under core/blockstm/testdata used by
TestV2BlockSTMAllBlocks and the benchmark harness — read-only data, no
review needed.

Shapes of change a reviewer should expect:

  • New per-tx state. ParallelStateDB shadows *state.StateDB but
    reads from SafeBase + MVStore + MVBalanceStore and tracks reads for
    validation. Implements vm.StateDB. Has its own journal layer
    (parallelJournalEntry) parallel to StateDB's journal.go.
  • New concurrent stores. MVStore (sharded multi-version per-key store
    with bloom filter) and MVBalanceStore (sharded commutative balance
    deltas) — both new, both load-bearing.
  • New executor. ExecuteV2BlockSTM owns the worker pool + in-order
    validator + pipelined settle goroutine and the chSettle /
    completionCh / execDone channel mesh between them.
  • Concurrent-safe base reads. SafeBase is a thread-safe wrapper
    around a *state.StateDB with sync.Map caches + a bounded pool of
    db.Copy() instances; the pool copies share the underlying reader, so
    the V2 entry point flips trieReader into its concurrent-reads mode
    (sync.Map node-resolve cache instead of in-place mutation) — this
    required surgery in state/database.go, state/reader.go,
    state/trie_prefetcher.go, trie/trie.go, trie/secure_trie.go,
    triedb/pathdb/reader.go, and triedb/pathdb/biased_fastcache.go.
  • *Direct setter family on StateDB. Bypasses the journal at settle
    time so V2 can replay per-tx PDB writes onto finalDB
    deterministically. Pinned byte-equal to journaled SetX + Finalise by
    TestDirectSetterParity_*.
  • Production fallback. BlockChain wires V2 as the primary processor
    and falls back to serial on panics, ApplyMessage consensus errors,
    ctx cancellation, and witness requests.

Files that warrant the most reviewer attention

Tier 1 — load-bearing executor + per-tx state

File Notes
core/blockstm/v2_executor.go +631 new
core/parallel_state_processor.go +925; V2StateProcessor, settle-fn closure, env
core/state/parallel_statedb.go +1147 new
core/state/parallel_statedb_validate.go +223 new
core/state/parallel_statedb_settle.go +195 new
core/state/parallel_statedb_journal.go +127 new
core/state/safe_base.go +207 new

Tier 2 — concurrent stores

File Notes
core/blockstm/mvstore.go +186 new
core/blockstm/mvbalance_store.go +175 new

Tier 3 — modified upstream files (highest merge-conflict risk)

File Notes
core/state/statedb.go *Direct setters, skipTimers, concurrent-reads enabler
core/state/state_object.go concurrent-safe getters
core/state/database.go concurrent reader
core/state/reader.go cache attribution
core/state/trie_prefetcher.go concurrent prefetch
trie/trie.go, trie/secure_trie.go concurrent-reads mode
triedb/pathdb/reader.go sync.Map node-resolve cache for concurrent reads; small lock changes
triedb/pathdb/biased_fastcache.go lock semantics
core/vm/evm.go, jumpdests.go, instructions.go, interface.go, interpreter.go jumpdest cache sharing, precompile-cache, StateDB iface adds
core/blockchain.go V2 wiring + fallback
core/state_transition.go interrupt plumbing

Tier 4 — drift-detection tests

Read these to understand the parity contract V2 must hold against
StateDB:

File What it pins
core/state/v2_method_parity_test.go every StateDB method has a PDB mapping
core/state/v2_journal_entry_coverage_test.go every journal kind has a parallel mapping
core/state/v2_direct_setter_parity_test.go SetXDirect ↔ journaled
core/state/parallel_statedb_getter_table_test.go every PDB getter records its read
core/parallel_state_processor_hooks_parity_test.go tracing.Hooks fire-or-skip classification
core/parallel_state_processor_fork_parity_test.go params.IsX classification
core/v2_serial_parity_fuzz_test.go real-tx executor fuzz vs serial
core/mainnet_witness_benchmark_test.go gated 241-block end-to-end consistency + benchmark harness

See docs/blockstm-v2.md for full architectural
detail, the list of correctness bug classes V2 prevents, and
ongoing-improvement notes.

Introduces BlockSTM v2 — a from-scratch redesign of Bor's parallel
transaction execution engine. V2 speculatively executes block
transactions concurrently, validates each tx's reads against a
multi-version store, and re-executes any whose reads turned stale.
On the 241-block mainnet witness benchmark V2/4w delivers ~1.6×
throughput over serial (570 mgas/s vs 350 mgas/s, AMD Ryzen 7 5800H,
all-in-memory).

V2 runs three coordinated goroutine groups around a per-tx PDB:

  V2StateProcessor.Process    (core/parallel_state_processor.go)
        │
  ExecuteV2BlockSTM           (core/blockstm/v2_executor.go)
        │
  ┌─────┼──────────────┬──────────────┐
  │     │              │              │
  Workers (N)     Validator (1)   Settlement (1)
  ParallelStateDB StoreReads      finalDB
                  BalReads        IntermediateRoot

  Backed by:
    SafeBase       Thread-safe base reads (sync.Map caches over a
                   bounded pool of StateDB.Copy() with concurrent-
                   reads mode on the trieReader)
    MVStore        Sharded multi-version per-key store with a
                   lock-free bloom filter for cold-key reads
    MVBalanceStore Sharded commutative balance delta store
                   (per-tx Add/Sub; reads sum prior entries)

1. Task building. Block transactions become V2Tasks. Same-sender
   chains get pre-computed nonces (SenderNonces) so nonce reads
   on a chain are skipped during validation.

2. Parallel execution. N worker goroutines pull tasks from a
   buffered dispatcher (window numWorkers * InFlightTaskMultiplier).
   Each tx runs in its own ParallelStateDB; reads come from
   SafeBase + MVStore + MVBalanceStore and are recorded in
   StoreReads / BalReads. Writes accumulate locally (DeferMVWrites)
   and flush to MVStore at end-of-tx so concurrent readers only
   ever see FINAL values — never mid-tx reentrancy-guard writes.

3. Sequential validation. A single goroutine validates txs in
   tx-index order. Each recorded read is re-checked against MVStore;
   match by writer/incarnation OR by value-equal fallback (handles
   idempotent writes such as reentrancy-guard SSTOREs that flip
   back). Mismatch → MarkEstimate the failed tx's writes and
   dispatch a re-execution goroutine. Per-key pipelining: readers
   that hit an ESTIMATE entry under Incarnation > 0 block on
   WaitForFinal until the upstream writer is finalized.

4. Pipelined settlement. As txs finalize, a settlement goroutine
   drains chSettle in tx-index order and applies each tx's writes
   to finalDB (the real, single-threaded *state.StateDB) through a
   *Direct setter family that bypasses the journal, then asks
   finalDB for the IntermediateRoot.

V2 is gated on a layered test surface. From cheapest to most
expensive, and what each layer is meant to catch:

1. Compile-time conformance + drift detection
   The PDB shadows StateDB's interface and behaviour, so any
   upstream go-ethereum merge that adds or changes a StateDB
   method would silently bypass V2. A handful of `go test`-time
   checks fail CI before any logic runs:
     - core/vm/statedb_impl_test.go     (PDB satisfies vm.StateDB
                                         via a static assertion)
     - TestPDBMethodParity              (every StateDB method has
                                         a PDB mapping or is in
                                         pdbExemptMethods)
     - TestV2DependencyCompileCheck     (every StateDB method V2
                                         settle calls remains present)
     - TestV2JournalEntryCoverage       (every journal entry kind has
                                         a parallelJournalEntry mapping)
     - TestV2TracingHookParity          (every tracing.Hooks field is
                                         classified as fired-or-skipped)
     - TestV2ForkParity                 (every params.ChainConfig.IsX
                                         fork rule is classified V1/V2)

2. Per-method unit tests (~210 tests across ~25 files)
   Cover individual PDB getters/setters, MVStore / MVBalanceStore
   primitives, V2 executor channel mesh, and SettleTo helpers.
   Highlights:
     - core/state/parallel_statedb_test.go         (76 tests; PDB
                                                    behaviour + the
                                                    Tier-1 mutation
                                                    kill suite — see
                                                    layer 5 below)
     - core/state/parallel_statedb_coverage_test.go (42 tests;
                                                    branch coverage)
     - core/state/parallel_statedb_getter_table_test.go (every PDB
                                                    getter records
                                                    its read with
                                                    the right WriterIdx
                                                    across Committed /
                                                    ESTIMATE / NoEntry /
                                                    AtTxZero)
     - core/state/safe_base_test.go                (sync.Map cache +
                                                    pool semantics)
     - core/blockstm/mvstore_test.go,
       core/blockstm/mvbalance_store_test.go       (versioned store
                                                    primitives)
     - core/blockstm/v2_executor_wait_test.go      (waitForTx /
                                                    waitForFinal +
                                                    cancellation)

3. Direct-setter parity tests
   The *Direct setter family bypasses StateDB's journal at settle
   time. core/state/v2_direct_setter_parity_test.go (7 tests) pins
   that SetXDirect produces a byte-identical state root to journaled
   SetX + Finalise. Catches divergence the moment a future change
   to either path breaks the parity.

4. Differential tests vs serial StateDB
   Hand-written + table-driven scenarios that exercise the PDB
   against a parallel-mirror serial StateDB and assert byte-identical
   output. Catches behaviour drift the parity-table tests can't
   express:
     - core/state/v2_differential_test.go          (PDB-only diff)
     - core/state/v2_executor_differential_test.go (synthetic-env
                                                    executor diff)
     - core/v1_differential_test.go                (V1 vs serial
                                                    parity for the
                                                    legacy in-tree path)

5. Mutation testing (Tier-1 kill tests)
   diffguard runs mutation testing against V2's critical paths.
   Every survivor flagged by a sample run has a corresponding
   targeted test inline in core/state/parallel_statedb_test.go
   under the "Tier-1 mutation kill tests" divider — boundary,
   negation, and return-value mutations on storeReadMatches,
   journal revert, settleTo helpers, applyFeeData, Reset, etc.
   Tier-1 logic kill-rate ≥ 99% on the latest run.

6. Fuzz targets
   Randomized inputs against either a serial mirror or a hand-built
   reference:
     - core/state/v2_fuzz_test.go                (random PDB op
                                                  sequences vs StateDB)
     - core/state/v2_executor_fuzz_test.go       (executor-level fuzz
                                                  on synthetic env)
     - core/v2_serial_parity_fuzz_test.go        (FuzzV2ExecutorVsSerial:
                                                  random tx batches
                                                  through ExecuteV2BlockSTM
                                                  vs an ApplyMessage loop)
   The race-detected fuzz under `-race` caught the shared-trie-reader
   race that the non-race fuzz missed; worth keeping on the nightly.

7. End-to-end consistency + benchmark on real mainnet blocks
   core/mainnet_witness_benchmark_test.go bundles 241 real Polygon
   mainnet blocks (under core/blockstm/testdata/) with their pre-
   block witnesses. Two harnesses share the corpus:
     - TestV2BlockSTMAllBlocks (gated on BOR_BLOCKSTM_TEST=1)
       replays each block through both serial and V2 and asserts
       byte-identical state roots and receipt roots.
     - BenchmarkV2AllBlocks runs serial + V2 across worker counts
       (4 / 8 / 16) and witness-on/off variants on the same corpus.
       Backs the throughput numbers referenced at the top of this
       commit.

8. Runtime invariants under -tags=invariants
   Build-tag-gated runtime assertions inside the executor and the
   PDB. Off in production builds (zero-cost), on in CI:
     - assertSettleOrder              (validation walk's induction)
     - assertReexecVisitedExactlyOnce (drain loop doesn't lose a tx)
     - assertSettleNotPanicked        (panicked PDBs never settle)
   A tiny set of "panic if invariant breaks" tests under
   //go:build invariants verifies the assertions actually fire on
   crafted violations (core/blockstm/v2_executor_invariants_panic_test.go,
   core/state/parallel_statedb_invariants_panic_test.go).

9. Race detector
   All of layers 2-8 are runnable under `go test -race`. CI runs
   the full state + blockstm packages in race mode; the
   TestV2BlockSTMAllBlocks gated test is also race-clean on the
   241-block corpus.

10. Production soak — >1 million Polygon mainnet blocks
    Beyond the unit / parity / fuzz layers above, this branch has
    been used to sync more than 1,000,000 mainnet blocks end-to-end
    on a real node with V2 as the primary processor (with serial
    disabled). Zero state-root divergences, zero panics
    requiring fallback, no consensus-affecting issues observed.
    This is the most stringent layer: real on-chain workload,
    real database backend, real prefetcher contention.

  - intermediateRootTimer metric (chain/intermediateroot) — measures
    the post-execution trie computation in block_validator.go.

The code surface is ~5.1k lines across 39 production .go files,
plus ~11.7k lines across 37 test files. The remaining 484 file
entries in the diff are block + witness fixtures under
core/blockstm/testdata used by TestV2BlockSTMAllBlocks and the
benchmark harness — read-only data, no review needed.

Shapes of change a reviewer should expect:

  - New per-tx state. ParallelStateDB shadows *state.StateDB but
    reads from SafeBase + MVStore + MVBalanceStore and tracks reads
    for validation. Implements vm.StateDB. Has its own journal
    layer (parallelJournalEntry) parallel to StateDB's journal.go.

  - New concurrent stores. MVStore (sharded multi-version per-key
    store with bloom filter) and MVBalanceStore (sharded
    commutative balance deltas) — both new, both load-bearing.

  - New executor. ExecuteV2BlockSTM owns the worker pool +
    in-order validator + pipelined settle goroutine and the
    chSettle / completionCh / execDone channel mesh between them.

  - Concurrent-safe base reads. SafeBase is a thread-safe wrapper
    around a *state.StateDB with sync.Map caches + a bounded pool
    of db.Copy() instances; the pool copies share the underlying
    reader, so the V2 entry point flips trieReader into its
    concurrent-reads mode (sync.Map node-resolve cache instead of
    in-place mutation) — this required surgery in state/database.go,
    state/reader.go, state/trie_prefetcher.go, trie/trie.go,
    trie/secure_trie.go, triedb/pathdb/reader.go, and
    triedb/pathdb/biased_fastcache.go.

  - *Direct setter family on StateDB. Bypass the journal at
    settle time so V2 can replay per-tx PDB writes onto finalDB
    deterministically. Pinned byte-equal to journaled SetX +
    Finalise by TestDirectSetterParity_*.

  - Production fallback. BlockChain wires V2 as the primary
    processor and falls back to serial on panics, ApplyMessage
    consensus errors, ctx cancellation, and witness requests.

Tier 1 — load-bearing executor + per-tx state:

  core/blockstm/v2_executor.go              (+631 new)
  core/parallel_state_processor.go          (+925 V2StateProcessor,
                                             settle-fn closure, env)
  core/state/parallel_statedb.go            (+1147 new)
  core/state/parallel_statedb_validate.go   (+223 new)
  core/state/parallel_statedb_settle.go     (+195 new)
  core/state/parallel_statedb_journal.go    (+127 new)
  core/state/safe_base.go                   (+207 new)

Tier 2 — concurrent stores:

  core/blockstm/mvstore.go                  (+186 new)
  core/blockstm/mvbalance_store.go          (+175 new)

Tier 3 — modified upstream files (highest merge-conflict risk):

  core/state/statedb.go                     (Direct setters,
                                             skipTimers, concurrent
                                             reads enabler)
  core/state/state_object.go                (concurrent-safe getters)
  core/state/database.go                    (concurrent reader)
  core/state/reader.go                      (cache attribution)
  core/state/trie_prefetcher.go             (concurrent prefetch)
  trie/trie.go, trie/secure_trie.go         (concurrent-reads mode)
  triedb/pathdb/reader.go                   (sync.Map node-resolve
                                             cache for concurrent
                                             reads; small lock changes)
  triedb/pathdb/biased_fastcache.go         (lock semantics)
  core/vm/evm.go, jumpdests.go,             (jumpdest cache sharing,
       instructions.go, interface.go,        precompile-cache,
       interpreter.go                        StateDB iface adds)
  core/blockchain.go                        (V2 wiring + fallback)
  core/state_transition.go                  (interrupt plumbing)

Tier 4 — drift-detection tests (read these to understand the
parity contract V2 must hold against StateDB):

  core/state/v2_method_parity_test.go       (every StateDB method
                                             has a PDB mapping)
  core/state/v2_journal_entry_coverage_test.go
                                            (every journal kind has
                                             a parallel mapping)
  core/state/v2_direct_setter_parity_test.go
                                            (SetXDirect ↔ journaled)
  core/state/parallel_statedb_getter_table_test.go
                                            (every PDB getter
                                             records its read)
  core/parallel_state_processor_hooks_parity_test.go
                                            (tracing.Hooks fire-or-
                                             skip classification)
  core/parallel_state_processor_fork_parity_test.go
                                            (params.IsX classification)
  core/v2_serial_parity_fuzz_test.go        (real-tx executor fuzz
                                             vs serial)
  core/mainnet_witness_benchmark_test.go    (gated 241-block end-
                                             to-end consistency +
                                             benchmark harness)

See docs/blockstm-v2.md for full architectural detail, the list of
correctness bug classes V2 prevents, and ongoing-improvement notes.
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@marcello33 marcello33 requested a review from Copilot May 5, 2026 08:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

@marcello33
Copy link
Copy Markdown
Collaborator

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Claude finished @marcello33's task in 4m 15s —— View job


Code Review: BlockSTM v2 Parallel Transaction Execution

  • Read Tier 1 files — executor, parallel state processor, ParallelStateDB
  • Read Tier 2 files — MVStore, MVBalanceStore
  • Read Tier 3 files — modified upstream files (statedb, state_object, database, trie, blockchain)
  • Read supporting files — SafeBase, journal, validation, settlement
  • Read VM interface changes
  • Security review (concurrency, consensus, fund safety)
  • Post final review

This is a thorough, well-engineered redesign of Bor's parallel execution engine. The architecture — workers + single-goroutine validator + pipelined settlement — is sound and avoids most classic concurrency bugs by design. The test surface (10 layers including mutation testing, differential fuzz, and 241-block mainnet corpus) is exceptional for a change of this complexity.

Below are the findings, organized by severity.


Potential Issues

1. SubBalanceDirect has no underflow guard — uint256.Int.Sub wraps silently

Severity: MEDIUM | core/state/statedb.go:2191

func (s *StateDB) SubBalanceDirect(addr common.Address, amount *uint256.Int) {
    // ...
    obj.setBalance(new(uint256.Int).Sub(obj.Balance(), amount))
}

uint256.Int.Sub wraps on underflow (it performs modular arithmetic). The regular SubBalance path goes through stateObject.SubBalance which also wraps, so this is consistent with serial behavior. However, the regular path journals the old balance for revert — SubBalanceDirect skips the journal entirely. If a bug elsewhere produces a wrong amount, the balance silently wraps to a huge number with no revert possible during settlement. Since V2 settlement is irrevocable by design, this is worth noting. The EVM's own transfer checks prevent this in practice, but a defensive if amount.Gt(obj.Balance()) { panic(...) } guard in the Direct setter would be cheap insurance.

2. SenderInitBalance nil check uses inline function — fragile

Severity: LOW | core/state_transition.go:666

SenderInitBalance: func() *big.Int { if input1 != nil { return input1.ToBig() }; return nil }(),

input1 is assigned from st.state.GetBalance(st.msg.From) at line 468, which never returns nil (returns uint256.NewInt(0) for missing accounts). The nil guard is dead code and the inline IIFE obscures this. A straight input1.ToBig() is clearer and matches the idiom used everywhere else in this function.

3. ecrecoverCache key is a fixed [128]byte — input < 128 bytes is right-padded

Severity: LOW | core/vm/evm.go:79-81

var key [128]byte
copy(key[:], common.RightPadBytes(input, 128))

This allocates a 128-byte temporary via RightPadBytes on every cache miss AND every cache hit check. Since copy already zero-pads the destination (the key array is zero-initialized), the RightPadBytes call is redundant — copy(key[:], input) achieves the same result without the allocation.

4. keccak256Cache stores common.Hash (value type) but loads with type assertion

Severity: LOW | core/vm/instructions.go:270-275

if cached, ok := evm.Config.Keccak256Cache.Load(key); ok {
    size.SetBytes(cached.(common.Hash).Bytes())
    return nil, nil
}

The .Bytes() call allocates. size.SetBytes32(cached.(common.Hash)[:]) would avoid the allocation on the hot path, though the compiler may optimize this away.


Design Observations (Not Bugs)

5. ProcessBlock now waits synchronously for the losing processor

Observation | core/blockchain.go:936-938

The old code drained the losing processor asynchronously (go func() { ... }). The new code blocks:

if processorCount == 2 {
    second_result := <-resultChan
    second_result.statedb.StopPrefetcher()
}

This is actually correct and important — the old async drain was a latent bug because the caller could commit the block (advancing pathdb layers) while the loser's prefetcher still held trie references. The comment explains this well. Worth keeping.

6. Ablation flags are exported struct fields with no config plumbing

Observation | core/blockchain.go:417-420

AblationSkipFlush    bool
AblationSkipSettle   bool
AblationSkipFinalise bool
AblationSkipMVRead   bool

These appear to be performance-experiment knobs. They're exported but I don't see them wired to any CLI flag or config struct, which means they're only settable from test code. If they're not intended for production, consider keeping them unexported or behind a build tag to prevent accidental activation.

7. Bloom filter false-positive rate at high key counts

Observation | core/blockstm/mvhashmap.go:25-30

The bloom filter is 32Kbit (4KB). The PR description notes ~5% false-positive rate at 5000 unique keys. For large blocks this means ~5% of cold-key MVStore reads will unnecessarily acquire a shard read lock. The shard lock is an RWMutex with an empty map lookup, so the overhead is small — but worth monitoring with the Estimates counter if real-world blocks push past 5K unique keys.

8. Re-execution goroutine uses workerID = x.numWorkers (out of normal worker range)

Observation | core/blockstm/v2_executor.go:512

x.execute(idx, x.numWorkers)

Re-execution uses a synthetic worker ID equal to numWorkers (one past the last real worker). This means re-execution and initial execution can overlap on different worker IDs for the same tx. Since execute only writes x.states[idx] and per-idx MVStore entries, this is safe — but worth a comment noting the sentinel value.


What Looks Correct

Concurrency model: The separation of concerns — workers produce PDBs, single validator thread validates in order, single settler thread applies to finalDB — is the right architecture. The only shared mutable state is the MVStore/MVBalanceStore, which are properly sharded with per-shard mutexes.

Channel-based synchronization: execDone (closed once per tx by the initial worker), completionCh (closed once per tx when finalized), and chSettle (buffered, drained in order) form a correct coordination mesh. The cancellation via ctx.Done() is properly wired through waitForTx, waitForFinal, startTaskDispatcher, and runValidationLoop.

ESTIMATE handling: The spin-wait-on-ESTIMATE-then-retry pattern in readStoreWait / handleEstimate correctly implements per-key pipelining. The ReadVersionFull atomic read of (value, estimate flag) prevents the TOCTOU race mentioned in the comment.

Journal correctness: The parallel journal (parallelJournalEntry) correctly reverts all state kinds. The revertBalance path subtracts from localBalAdd/localBalSub (not from MVBalanceStore), which is correct because balance deltas are only flushed at end-of-tx.

Settlement ordering: The settleBalanceOpsAndLogs function replays balance operations in recording order, interleaving transfer logs at the correct positions. This is a subtle but correct approach to matching the serial path's log ordering.

Fallback path: V2 propagates panics, consensus errors (ExecErr), and context cancellation correctly to the caller, which falls back to serial execution. This defense-in-depth is important for a production deployment.

Trie concurrent reads: The getConcurrent path in trie/trie.go correctly uses a sync.Map resolve cache instead of mutating the tree in place. The prevalueTracer.Put call inside the concurrent path is noted as having its own lock — this is correct.

pathdb reader hash-mismatch retry: The cache-eviction-and-retry logic in triedb/pathdb/reader.go handles the race between the biased preloader's Has→Set and a flusher's Set(newer). This is a real bug fix, not just V2 work.


Test Coverage Assessment

The 10-layer test pyramid described in the PR is well-structured:

  • Compile-time conformance (method parity, journal coverage, fork parity) catches upstream drift automatically — this is the most valuable layer for long-term maintenance.
  • Differential tests against serial StateDB and the 241-block mainnet corpus provide strong confidence in correctness.
  • Mutation testing at ≥99% kill rate on Tier-1 logic is impressive.
  • Race-detected fuzz catching the shared-trie-reader race validates the -race investment.

The 1M+ mainnet block soak mentioned in the PR description provides the strongest signal — no state-root divergence or panics over real production workload.


Summary

This is a high-quality, well-tested PR. The architecture is sound, concurrency primitives are used correctly, and the fallback to serial execution provides a safety net. The issues found are minor (no CRITICAL or HIGH severity). The most actionable item is the defensive underflow guard in SubBalanceDirect (finding #1). The remaining items are cosmetic or informational.

Comment thread core/blockchain.go Outdated
Comment thread core/blockchain.go Outdated
Comment thread core/blockstm/mvhashmap.go Outdated
@adamdossa adamdossa requested a review from pratikspatil024 May 5, 2026 15:20
cffls and others added 4 commits May 5, 2026 11:14
Cosmetic / micro-perf fixes from the claude[bot] review of #2210.
No behaviour change.

* state/statedb.go SubBalanceDirect — add a comment noting the
  uint256.Sub wrap matches the journaled SubBalance path
  (statedb.go:922) and that TestDirectSetterParity_SubBalance pins
  byte-equality between the two. A defensive panic was suggested
  but would diverge from the journaled path and break the parity
  test, so we keep the documentation-only.

* state_transition.go SenderInitBalance — drop the inline IIFE nil
  check; input1 is GetBalance(...) which returns a value type,
  never nil. Straight input1.ToBig() matches the idiom used
  elsewhere in the function.

* vm/evm.go runEcrecoverWithCache — drop the redundant
  RightPadBytes(input, 128) allocation. The [128]byte key is
  zero-initialised, so copy(key[:], input) achieves the same
  result without the extra heap allocation. Caller already
  guarantees len(input) <= 128.

* vm/instructions.go opKeccak256 — replace size.SetBytes(cached.
  (common.Hash).Bytes()) with size.SetBytes32(h[:]) to skip the
  per-cache-hit Bytes() allocation on the SHA3 fast path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 241-block witness fixtures under core/blockstm/testdata/ are
managed via Git LFS (~1.6 GB total). On a fresh clone or a CI
runner that hasn't run `git lfs pull`, the .block / .witness.gz
files are LFS pointer text stubs rather than the real data, and
gzip.NewReader fails with "gzip: invalid header" — exactly what
the unit-tests CI workflow has been hitting.

Detect LFS pointers in readFileMaybeGz via the canonical
"version https://git-lfs.github.com/spec/" prefix and surface a
sentinel errLFSPointer error. loadEmbeddedBlocks and
loadBlocksFromDir then call t.Skipf instead of t.Fatalf when the
fixtures aren't materialized — the harness skips cleanly with a
helpful message ("run `git lfs pull` to materialize testdata")
instead of producing confusing gzip errors.

This is the same prerequisite called out in
docs/blockstm-v2.md → "Test data (Git LFS)".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lint workflow on #2210 flagged 15+ issues across V2 files. Fix
each so `make lint` is clean. No behaviour change in production
paths.

* goimports — formatting on ~12 files (idiomatic import grouping
  + blank-line alignment that the in-tree gofmt missed).
* unused — drop dead code:
    - executeWithParallelStateDBV2 + ValidatingParallelStateDB.checkBalance
      in core/mainnet_witness_benchmark_test.go (debug shims, never
      called).
    - timedMockV2State.execDelay + timedMockV2Env.fails in
      core/blockstm/v2_executor_test.go (vestigial fields).
    - ParallelStateDB.priorDestructed convenience wrapper (callers
      use priorDestructedAt).
    - opSubRefund / opWarmAddress diff ops in
      core/state/v2_differential_test.go (no scenario references them).
* copyloopvar — drop the redundant `x := x` loop-variable copies
  across 8 test files (Go 1.22+ no longer needs them).
* unconvert — drop the `time.Duration(result.Phase1)` cast (Phase1
  is already time.Duration) and the `JumpDestCache(newMapJumpDests())`
  cast (already satisfies the interface).
* durationcheck — fix `timeAfter(seconds time.Duration)` in
  core/blockstm/mvbalance_store_test.go: callers passed an int and
  the multiplication `seconds * time.Second` is a duration*duration
  bug. Make the parameter `int` and cast inside.
* copylocks — `*statedb = *backupStateDB` in V1's
  maybeRerunWithoutFeeDelay copies a struct holding atomic.Int64.
  This is single-threaded V1 rerun-from-snapshot; tag with
  `//nolint:govet` and a comment.
* whitespace — drop a leading blank line in v2Env.Execute that
  golangci-lint flagged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues flagged in the inline PR review of #2210.

* core/blockchain.go: V2-failure fallback recovery was broken.
  cancel() ran BEFORE the `if result.parallel && result.err != nil`
  block, so when V2 finished first with an error (panic, ApplyMessage
  consensus error) the still-running serial processor was interrupted
  at its next tx boundary and the fallback `result = <-resultChan`
  received context.Canceled instead of a usable serial result.
  Documented as the recovery contract in the PR description.

  Move cancel() and followupInterrupt.Store(true) to AFTER the
  fallback block. The fallback (when V2 errors with V1 also running)
  gets a real serial result. Once we have the result we plan to
  return, cancel the loser so it stops at its next tx boundary
  before commit advances the pathdb layer (the original intent).

  See review comment r3187036031.

* core/blockchain.go: drop unused AblationSkip* fields from
  BlockChain. Four exported boolean fields (AblationSkipFlush /
  AblationSkipSettle / AblationSkipFinalise / AblationSkipMVRead)
  were declared but never read or written anywhere — repo-wide grep
  confirms zero references outside the declaration site. The
  intended bridge from these BlockChain-level toggles to the per-
  block MVHashMap.Skip* fields (which ARE wired) was never threaded
  through, so flipping the BlockChain field was a silent no-op.
  Exported fields enter the API surface, so keeping them locks us
  into either a SemVer-breaking removal or maintainer confusion;
  drop them now and re-introduce as wired knobs in a separate
  change if/when the ablation experiments need a runtime entry
  point.

  See review comment r3187036037.

* core/blockstm/mvhashmap.go: bloom h2 dimension was constant zero
  for the hottest key class. h2 read bytes [20:24] of Key, which are
  populated only for state keys; NewAddressKey leaves [20:52] zero
  and NewSubpathKey leaves [20:51] zero. h3 also half-degraded for
  those classes ([28:32] zero). Result: address-only and subpath
  reads collapsed the bloom from 3-of-3 to ~2-of-3, which doubles
  the false-positive rate at typical block sizes (~0.07% → ~0.35%
  at 1k unique keys). No correctness impact, just hot-path
  selectivity.

  Re-derive h2 from address bytes [16:20] (always populated) and
  fold the subpath/type bytes [52][53] into h3 so all three hashes
  draw from non-constant ranges for every key class. Updated the
  comment to reflect the new layout.

  See review comment r3187036040.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 91.92825% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.34%. Comparing base (cc68ffd) to head (e6d7481).

Files with missing lines Patch % Lines
core/blockchain.go 82.94% 17 Missing and 5 partials ⚠️
core/blockstm/mvhashmap.go 91.33% 9 Missing and 4 partials ⚠️
core/block_validator.go 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2210      +/-   ##
===========================================
+ Coverage    52.87%   53.34%   +0.46%     
===========================================
  Files          886      896      +10     
  Lines       156811   159599    +2788     
===========================================
+ Hits         82910    85133    +2223     
- Misses       68641    69144     +503     
- Partials      5260     5322      +62     
Files with missing lines Coverage Δ
core/blockchain_reader.go 43.42% <100.00%> (+0.60%) ⬆️
core/blockstm/invariants_off.go 100.00% <100.00%> (ø)
core/blockstm/mvbalance_store.go 100.00% <100.00%> (ø)
core/blockstm/mvstore.go 100.00% <100.00%> (ø)
core/block_validator.go 49.55% <75.00%> (+1.37%) ⬆️
core/blockstm/mvhashmap.go 91.23% <91.33%> (+11.36%) ⬆️
core/blockchain.go 63.07% <82.94%> (+0.68%) ⬆️

... and 46 files with indirect coverage changes

Files with missing lines Coverage Δ
core/blockchain_reader.go 43.42% <100.00%> (+0.60%) ⬆️
core/blockstm/invariants_off.go 100.00% <100.00%> (ø)
core/blockstm/mvbalance_store.go 100.00% <100.00%> (ø)
core/blockstm/mvstore.go 100.00% <100.00%> (ø)
core/block_validator.go 49.55% <75.00%> (+1.37%) ⬆️
core/blockstm/mvhashmap.go 91.23% <91.33%> (+11.36%) ⬆️
core/blockchain.go 63.07% <82.94%> (+0.68%) ⬆️

... and 46 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread core/blockchain.go
Comment thread core/blockstm/mvstore.go
Comment thread core/blockchain.go Outdated
cffls and others added 2 commits May 5, 2026 13:49
Three more issues flagged in the inline PR review of #2210.

* core/parallel_state_processor.go: V2 was silently dropping the
  stateless-witness pointer. ProcessBlock wires the witness via
  parallelStatedb.StartPrefetcher("chain", witness, nil), but inside
  V2.Process the prefetcher is restarted with a hard-coded nil for
  the witness slot — and StateDB.StartPrefetcher unconditionally
  overwrites s.witness, so every s.witness != nil-gated collection
  point (CollectStateWitness, CollectCodeWitness, settle-phase trie
  walks) became a no-op for the rest of execution. On
  StatelessSelfValidation and single-block makeWitness paths the
  produced witness landed empty with no error.

  Fix: stash finalDB.Witness() before StopPrefetcher and pass it
  through to the v2-settle prefetcher restart, so the wired pointer
  survives the swap.

  See review comment r3191282978.

* core/state/parallel_statedb.go: SelfDestruct skipped recordWrite
  for the SuicidePath key. FlushToMVStore writes
  (SuicidePath_addr, txIdx, inc, true) for every entry in
  s.destructed, but the key was never appended to s.WriteKeys, so
  MVStore.MarkEstimate / CleanupEstimate could not reach it on
  re-execution. Cross-incarnation invalidation was broken: a stale
  SuicidePath entry from incarnation N survived into N+1's view, and
  a downstream tx that observed it via priorDestructedAt could pass
  validation against state that no longer exists — a state-root
  divergence path. Every other MVStore-targeting writer (SetNonce,
  SetCode, SetState, CreateAccount) already calls recordWrite for
  the same reason; only the destruct path was missed.

  Fix: call s.recordWrite(NewSubpathKey(addr, SuicidePath)) inside
  the !s.destructed[addr] guard, matching the journal-entry guard so
  repeated SelfDestruct in the same tx doesn't append a duplicate.
  Pinned by TestPDB_SelfDestruct_RecordsSuicidePathWrite.

  See review comment r3191282996.

* core/blockchain.go: V2 reader cache hit/miss stats were silently
  dropped. setupBlockReaders called ReadersWithCacheStatsTriple to
  create three independent ReaderWithStats wrappers (prefetch /
  process / parallel) and wired the parallel one into
  parallelStatedb, but reportReaderStats only consumed prefetch and
  process. V2's reads accumulated in the parallel wrapper's atomic
  counters and were discarded each block — and since V2 is the
  primary processor in production, the chain/state/account /
  storage/cache/{hit,miss} meters were essentially empty on the hot
  path.

  Fix: thread parallel through setupBlockReaders' return signature
  and into reportReaderStats. process and parallel both carry the
  roleProcess label and share the same underlying cache, so merge
  their counters into the same meter set rather than introducing a
  new "process_parallel" series.

  See review comment r3191283003.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The diffguard run on commit 4c688e4 flagged 35 surviving mutations
(score 77.7%, T1 logic 84.5%). Triage classified seven as
HIGH-severity — branch / boolean / conditional mutations on V2's
per-tx correctness paths. Add five targeted Tier-1 kill tests for
the five that are deterministically killable:

* TestPDB_EnableReadTracking_InitializesBalAddrs
  Pins the `s.BalAddrs == nil` guard at parallel_statedb.go:335.
  Flipping == to != silently skips the make() on a fresh PDB,
  leaving cap=0 instead of the documented 8 — every recordBalWrite
  reallocates. Test asserts cap >= 8 after EnableReadTracking.

* TestPDB_PriorDestructedAt_RecordsAbsenceRead
  Pins the else-if branch at parallel_statedb.go:531. Removing the
  body drops the absence read recordStoreRead(suicideKey, -1, 0,
  nil); without it, validation can't catch a concurrent prior tx
  destructing addr. Test asserts the absence read appears in
  StoreReads AND that subsequent MVStore writes flip validation
  to invalid.

* TestPDB_Exist_DestructedInBaseReturnsFalse
  Pins the `if suicideIdx >= 0 { ... }` branch at
  parallel_statedb.go:576. Removing the body lets a destructed
  addr fall through to base.Exist and incorrectly return true
  when the account was set up in base. The test seeds the base
  StateDB with code on addr (so the fallthrough path is
  observable) and asserts Exist returns false after a SuicidePath
  write.

* TestPDB_CreateAccount_WritesTrueValue
  Pins the literal `true` at parallel_statedb.go:1014
  (CreateAccount → store.WriteInc). Flipping it to false would
  publish (CreatePath_addr, txIdx, inc, false), defeating the
  value-based fallback in storeReadMatches. Test reads the MVStore
  entry and asserts the value is true.

* TestPDB_DiagnoseBalanceRead_MatchReturnsFalse
  Pins the `false` literal at parallel_statedb_validate.go:215.
  Flipping to true would have a matching balance read produce a
  phantom diagnostic with zero-valued fields; DiagnoseValidation
  would aggregate these as empty "" -category diags. Test asserts
  len(diags) == 0 on a matching read.

The other two HIGH survivors are documented as unkillable in
their current form:

* parallel_statedb.go:751 GetCodeHash `if len(code) == 0` —
  removing the early-return falls through to
  crypto.Keccak256Hash(empty), which equals types.EmptyCodeHash by
  spec. Behaviourally equivalent; can't be killed without locking
  in an internal performance signal.

* v2_executor.go:586 runValidationLoop `cancelled = true` after
  ctx-cancel — the mutation's observable effect (drain runs on
  cancel) is timing-dependent because reexec goroutines exit
  promptly via ctx.Done() in waitForTx/waitForFinal regardless,
  so the post-loop drain completes either way. A deterministic
  kill needs a redesign of the cancel handling.

Each new test was verified to:
  - PASS on unmutated code,
  - FAIL on the corresponding mutated code (sed/python
    in-place mutation, run, restore).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread core/blockstm/mvstore.go
Comment thread core/blockchain.go
Comment thread core/blockstm/mvhashmap.go Outdated
- core/blockstm/mvhashmap.go: drop unused MVHashMap.SkipFlush field. The
  three sibling ablation toggles are wired through the V2 path; SkipFlush
  alone has no consumer, and as an exported field on a public type it
  would be SemVer-impactful to remove later. Same dead-scaffolding pattern
  removed from BlockChain.AblationSkip* in 4c688e4.

- core/parallel_state_processor.go: stop discarding the prefetcher-warmed
  shared JUMPDEST cache. v2Env.Execute previously called
  evm.SetJumpDestCache(e.jumpDests) unconditionally, overriding the
  shared cache that vm.NewEVM had just wired from vmConfig. Now the
  override (and the per-v2Env allocation) are gated on the absence of
  vmConfig.SharedJumpDestCache: production paths via ProcessBlock keep
  the prefetcher's analysis; benchmarks and single-block witness paths
  that bypass that wiring still get the per-v2Env fallback.

- core/blockstm/v2_executor.go: wrap the two predecessor receives in
  v2ExecCtx.execute (completionCh[k-1] and execDone[prev]) with ctx
  selects, mirroring waitForTx/waitForFinal. Without these, a worker
  that entered execute past the vfailed[k-1] predicate could hang
  forever on cancellation — runValidationLoop deliberately skips the
  finishReexec drain on cancel, so completionCh[k-1] is never closed,
  wg.Wait blocks indefinitely, and the V2 driver, worker pool,
  validator/settlement, MVStore/MVBalanceStore/SafeBase, and parallel
  StateDB all leak. Early return from execute is safe — the worker
  loop unconditionally closes execDone[taskIdx], unblocking cascading
  downstream waiters.

Benchmarks (V2Embedded, n=10): -2.63% on mean (t=-3.90, statistically
significant); V2AllBlocks 241 blocks neutral within noise. Production
prefetcher-cache reuse not measurable in current benchmark harnesses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Code review

Unable to complete review — this PR exceeds GitHub's 300-file API diff limit (500+ files, mostly testdata fixtures under core/blockstm/testdata/), and the CI sandbox restricts the gh api / git fetch commands needed to retrieve the diff locally.

Previous review (2026-05-05): No issues found. Checked for bugs and CLAUDE.md compliance.

Since that review, 6 follow-up commits were pushed:

  • a7c0b00 — address PR review feedback on V2 follow-up
  • 7e99f1f — skip witness-corpus tests cleanly when LFS testdata is missing
  • df8a4bb — fix lint failures
  • 4c688e4 — address inline PR review comments
  • e8e367a — address second round of inline review comments
  • 0c26b08 — kill diffguard Tier-1 survivors on V2 PDB hot paths

These commits address reviewer feedback and fix lint/test issues. A fresh automated review of these changes was not possible due to the access limitations described above.

Comment thread core/blockchain.go
Comment thread core/blockstm/mvstore.go
…spec-tests sweep

Drove V2 BlockSTM through the full Ethereum execution-spec-tests v5.1.0
blockchain corpus (67,032 subtests) and chased every V2-only divergence.
Mixed in are the production nil-guards required for V2 (and serial) to
run at all on non-Bor chain configs — without them the spec-test fixtures
panic in state_transition.execute on the burnt-contract dereference.

Final result: V2 matches serial pass-for-pass on all 67,032 subtests, up
from 1,676 passes / 253 V2-only divergences at the start of the sweep.
Comment thread core/blockstm/mvhashmap.go Outdated
Comment thread core/blockstm/mvbalance_store.go Outdated
cffls and others added 3 commits May 12, 2026 21:17
…anic, split settleNonces/settleStorage, document tx lifecycle

- core/state/parallel_statedb_validate.go: replace `default: a == b` in
  valuesEqual with an explicit type switch over the MVStore value types
  in use (bool, uint64, common.Hash, []byte, nil). Unknown types now
  panic with a clear message instead of either silently degrading to
  pointer-identity (for pointer types) or crashing the runtime (for
  non-comparable types like slices, maps, or structs containing them) —
  both of which would be consensus-affecting failure modes the moment a
  new MVStore subpath is added.

- core/blockstm/v2_executor.go: wrap runValidationLoop's body in a
  defer-recover so a panic in Validate() is captured into
  V2ExecutionResult.ValidationPanic instead of crashing the bor process
  via an unrecovered goroutine panic. A second deferred close on
  chSettle (guarded by settleClosed) keeps the settle goroutine from
  hanging on wg.Wait when the recover path skips the normal cleanup.
  core/parallel_state_processor.go: surface a non-nil ValidationPanic
  as an error so BlockChain.ProcessBlock falls back to the serial
  processor instead of taking down the node.

- core/state/parallel_statedb_settle.go: split settleNoncesAndStorage
  into settleNonces and settleStorage. The two loops were independent
  with no shared state or ordering constraint; bundling them was
  inconsistent with the rest of the per-concern settle helpers. Test
  TestPDB_SettleNoncesAndStorage splits in lockstep.

- docs/blockstm-v2.md: add a Transaction Lifecycle section between
  Execution Flow and Key data structures. Covers (1) the state diagram
  a tx passes through, (2) the structural invariant that bounds each
  tx to at most two executions (validation order + the
  finishReexec(i-1) gate + no re-validation of the re-exec result),
  (3) a worked cascading-vfail example walking tx1/tx2/tx3 through
  initial+reexec to show why a 3-tx cascade still converges with one
  re-exec per failed tx, and (4) a concurrency timeline diagram
  illustrating the worker/validator/reexec/settle lanes. Also reorders
  the Storage/Nonces bullets in the Settlement section to match
  SettleTo's call order.

Tests added: TestValuesEqual_Bool, TestValuesEqual_Nil,
TestValuesEqual_UnsupportedTypePanics (lock in the loud-panic
contract), and TestV2ValidationPanicIsRecovered (end-to-end check that
a panicking Validate() surfaces via ValidationPanic without hanging
the executor).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- core/blockchain.go: stop V2 prefetcher on V2-error path. The
  guard `if ctx.Err() != nil` couldn't fire because cancel() was
  moved to after the fallback block in 4c688e4, leaving the
  "v2-settle" prefetcher's subfetcher goroutines orphaned with live
  trie references when V2 returns an error and the main goroutine
  decrements processorCount in the fallback path (skipping the
  final drain).

- core/blockstm/mvhashmap.go, mvbalance_store.go, mvstore.go: replace
  the 2-byte shard-index hash with FNV-1a over the address bytes.
  `addr[0]<<8 | addr[1] % N` collapses to `addr[1] % N` whenever N
  divides 256, routing Polygon's three hot system contracts
  (0x...001000 validator set, 0x...001001 state receiver, 0x...001010
  MATIC) all to shard 0. A new addrShardIndex helper in mvhashmap.go
  is shared by all three sharded stores.

- core/blockstm/mvhashmap.go: switch Write and MarkEstimate to
  `defer cells.rw.Unlock()`, matching the existing pattern in
  Delete. The "should not happen" panic inside the locked region
  in either function would otherwise strand the write lock and
  deadlock subsequent operations on the same key (V1 BlockSTM has
  a top-level recover, so the panic doesn't unwind the goroutine).
  V2's MVStore.MarkEstimate has no panic inside the locked region.

- core/blockstm/mvbalance_store.go: delete the dead `version` field
  and `Version()` method. Zero production callers — only exercised
  by mvbalance_store_test.go. Same dead-scaffolding pattern as
  BlockChain.AblationSkip* (4c688e4) and MVHashMap.SkipFlush (0ce45f0).
  The three sh.version[addr]++ writes under the shard lock and the
  64×16 map slot pre-allocation per block are now removed.

Test added: TestMVBalanceStore_HotKeysDistributeAcrossShards pins
the shard-distribution property for Polygon's three system contracts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- core/evm.go: guard nil chain in NewEVMBlockContext's Bor-vs-Ethereum
  transfer-function selection. TestProcessParentBlockHash passes a nil
  ChainContext and the unconditional chain.Config() call introduced in
  e7aaf2d panicked.

- eth/tracers/internal/tracetest/calltrace_test.go: inject a non-nil
  BorConfig on call_tracer_withLog fixtures and on TestInternals so
  AddFeeTransferLog fires. The fixtures' "want" output records Bor's
  synthetic fee transfer log at 0x...001010, recorded before
  AddFeeTransferLog was gated on Bor != nil in core/state_transition.go.
  Without the injection, 10 call_tracer_withLog subtests and
  TestInternals/Mem_expansion_in_LOG0 fail with missing log entries.

- goimports reformat in three test files that the spec-tests sweep
  modified but didn't reformat (Authorities() method added):
  core/blockstm/v2_executor_test.go, core/state/v2_executor_differential_test.go,
  core/v2_selfdestruct_self_beneficiary_test.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Code review

2 issues found. Checked for bugs and CLAUDE.md compliance.


Issue 1: Bug — nil pointer dereference on context cancellation

File: core/blockstm/v2_executor.go line 545 (link)

When the context is cancelled during V2 execution, a worker can return early from execute() (via ctx.Done() selects at lines 409/418) without populating x.states[taskIdx], which remains nil. The worker then unconditionally closes execDone[taskIdx] (line 442).

In validateOne, the non-deterministic select (lines 609-613) can pick execDone[i] over ctx.Done() when both are ready. The nil guard at line 618 (x.states[i] != nil && x.states[i].Validate()) correctly prevents calling Validate() on nil, but the false branch falls through to line 624 which calls dispatchReexec(i). Here at line 545, x.states[i].ValidateCategory() unconditionally dereferences x.states[i], panicking on the nil interface.

The panic is caught by recover() at line 644 so it does not crash the node, but it causes an unnecessary serial fallback and a misleading V2 validation panic log under normal cancellation conditions.

Note that line 552 already has a nil guard for oldState := x.states[i] — the same guard is needed before line 545.

Suggested fix: wrap the ValidateCategory call in a nil check, matching the existing pattern at line 552.


Issue 2: CLAUDE.md violation — discarded error from TransactionToMessage

File: core/parallel_state_processor.go line 890 (link)

The error from TransactionToMessage is silently discarded with underscore. This violates CLAUDE.md (Handle Errors: Never ignore errors with underscore) and security-common.md (Error values checked — never discard errors with underscore in security-sensitive paths).

Both other call sites in this file propagate the error: Process at line 457 and buildV2Tasks at line 1227. This site is inconsistent.

If TransactionToMessage fails, msg is nil and tasks[idx].Msg stays nil, which will cause a nil pointer panic downstream in applyMessage when accessing t.msg.From. While the panic is recovered, it produces a misleading error path instead of a clean error return.

Comment thread core/blockstm/mvstore.go
Comment thread core/blockchain.go
Three independent fixes triggered by the @claude bot review on PR #2210
and the V2 vs serial parity bisect.

* core/blockstm/v2_executor.go: nil-guard x.states[i].ValidateCategory()
  in dispatchReexec. A worker returning early on ctx cancellation
  (the ctx.Done() selects at lines 410/418) leaves x.states[i] nil,
  and validateOne's select can still pick execDone[i] over ctx.Done()
  and route here. The existing oldState guard at line 552 already
  handles the same case; this matches that pattern.

* core/parallel_state_processor.go: recoverTaskMessages now returns
  (int, error) and ExecuteV2BlockSTM surfaces the lowest-indexed
  failure via ExecErrIdx/ExecErr instead of swallowing it with
  underscore. V2StateProcessor.Process already aborts the block on
  non-negative ExecErrIdx, so we get a clean error path instead of a
  nil-Msg panic inside applyMessage.

* core/mainnet_witness_benchmark_test.go: every &BlockChain{hc: ...}
  test stub now also sets chainConfig: config. e7aaf2d started
  reading chain.Config() in NewEVMBlockContext to choose between Bor's
  Transfer (synthetic fee-transfer log) and EthereumTransfer (no log).
  (*BlockChain).Config() returns bc.chainConfig — not bc.hc.config —
  and the stub left chainConfig nil, so V2 silently picked
  EthereumTransfer on Bor mainnet blocks and receipt bloom diverged
  from serial. TestV2BlockSTMEmbedded went from 0/9 to 9/9 with this
  change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread core/block_validator.go
Comment thread core/blockstm/mvhashmap.go Outdated
Comment thread core/blockstm/mvhashmap.go
Comment thread core/blockstm/mvhashmap.go
Comment thread core/blockchain.go
Comment thread core/blockstm/mvhashmap.go Outdated
cffls and others added 2 commits May 13, 2026 19:41
… drop

statedb.Copy() at core/state/statedb.go:1294 unconditionally deep-copies
the witness via stateless.Witness.Copy() — a fresh *Witness with cloned
Headers/Codes/State. V2StateProcessor.Process did readBase :=
statedb.Copy() at line 1129 and proceeded with the new readBase as the
PDB's rawBase. PDB.Witness() returns rawBase.Witness(), so any V2
worker writing through evm.StateDB.Witness().AddBlockHash (BLOCKHASH
opcode at core/vm/instructions.go:499) landed the header in the orphan
copy. finalDB.Witness() never observed it: CollectStateWitness and
CollectCodeWitness merge State and Codes back via the shared reader,
but Headers has no equivalent merge path.

Consequence: StatelessSelfValidation (blockchain.go:3546) and
single-block makeWitness producer mode (blockchain.go:3268) silently
emit BLOCKHASH-incomplete witnesses for any V2 block exercising the
opcode. Not consensus-affecting — witnesses aren't in the state or
receipt root — but cross-validation against the stateless verifier
fails for any tx that touches BLOCKHASH. Production doesn't run
StatelessSelfValidation, so the bug hasn't surfaced in the 1M-block
soak.

Fix: readBase.SetWitness(prevWitness) immediately after the Copy() so
the share-by-reference invariant the doc-comment at
parallel_statedb.go:1175 has been asserting actually holds. Update the
comment to attribute the invariant to the (now-present) restore step
instead of falsely claiming Copy() shares the witness.

TestPDB_Witness_SharedAcrossCopyViaSetWitness pins both halves: that
Copy() does NOT share by itself (the bug premise), and that after the
SetWitness restore PDB.Witness() and finalDB.Witness() are the same
*Witness object so writes via the PDB-visible reference reach the
caller's witness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tropy

bloomMask is 1<<15 - 1 = 0x7FFF — 15 bits wide. bloomHashes shifts
terms by 16 and 24 to mix four bytes into each of h1, h2, h3, but the
final `& bloomMask` discarded everything above bit 14. Concretely:

* h1's k[2]<<16 and k[3]<<24 — entirely masked
* h2's k[18]<<16 and k[19]<<24 — entirely masked
* h3's (k[10]^k[30])<<16 — entirely masked
* h3's (k[11]^k[31]^k[53])<<24 — entirely masked, INCLUDING the type
  byte k[53] that was added in 4c688e4 specifically to distinguish
  addressType / stateType / subpathType keys in the third dimension

Result: address-only and subpath keys with the same address shared
(h1, h2, h3) triples — the third-dimension separation the doc-comment
advertises did not exist. No correctness impact (false positives fall
through to the shard-map lookup), but the dead XORs ran on every
write and the bloom's selectivity was lower than the code's surface
shape suggests.

Add bloomFold(h) = (h ^ h>>15 ^ h>>30) & bloomMask so the upper bits
of each 32-bit hash term mix back into the kept low 15. Cheap (two
shifts + two XORs per hash), preserves the existing term shape, and
makes the type byte and the high-half bytes actually influence the
masked output.

TestBloomHashes_TypeByteContributes pins the originally-intended
separation between addressType and subpathType keys with the same
address. TestBloomHashes_HighBytesContribute pins that each of the
four high-slot bytes (k[3], k[19], k[11], k[31]) influences at least
one of the three hashes — pre-fix, mutating any of these four left
every (h1, h2, h3) unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread core/blockchain.go
Comment thread core/blockchain.go Outdated
Comment thread core/blockstm/mvbalance_store.go Outdated
Comment thread core/blockchain.go
cffls and others added 3 commits May 14, 2026 10:18
Five nit-level findings from the @claude bot on da0e346.

* core/blockchain.go ProcessBlock vtime closure race: both V1 and V2
  goroutines wrote the closure-captured named-return vtime after
  ValidateState. The resultChan sends synchronize each goroutine's
  write with main, but there is no happens-before edge between the
  two writes. go test -race would flag WRITE/WRITE on the
  V2-validate-fail fallback path (deterministic trigger: V2 fails
  validation, fallback waits for V1 before cancelling, V1 also runs
  through ValidateState since it does not check ctx). Move vtime
  onto the local Result struct; the named return is now assigned
  once in main from the winning Result.

* core/blockchain.go V1 prefetcher stop asymmetric to V2: V2 (4c688e4)
  stops on err != nil || ctx.Err() != nil, V1 only on ctx.Err() != nil.
  V1-fail-without-cancel leaks subfetcher goroutines across the
  caller's pathdb commit. Mirror the V2 condition.

* core/blockchain.go blockMgaspsMeter records errored blocks: the gate
  was missing result.err == nil, so a V1 success-Process + fail-Validate
  Result still updated the histogram with partial gas / wall time.
  Comment two lines above explicitly says "winning processor" — add
  the err == nil check to match.

* core/blockstm/mvbalance_store.go ZeroDelta doc-comment cited the
  deleted LastWriter method (removed in c62d8b2). Rewrite to describe
  the actual contract: the entry stays so a future WriteDelta from
  the re-executed incarnation accumulates from zero (matching
  post-Finalise semantics), and CleanupEstimate / DeleteSingle
  removes the zeroed entry afterwards if the new incarnation didn't
  re-touch the address.

* core/blockstm/mvhashmap.go ValidateVersion address-key skip rationale
  was incomplete — it cited SELFDESTRUCT only, but address keys are
  also written on account creation (statedb.createObject and
  CreateContract). V1 is no longer the production processor
  (NewParallelBlockChain wires V2), so the skip is acceptable for the
  V1 differential / mainnet-witness baseline as a false-vfail mitigation
  — but the comment was misleading. Rewrite to describe the actual
  V1-only scope, name both write sources for address keys, and point
  to V2's independent validation path for the production correctness
  guarantee.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a re-exec goroutine returns early on ctx.Done() before reaching the
x.states[idx] = st assignment, the slot stays nil. finishReexec then
unconditionally sent idx into chSettle, and the settle goroutine called
settleFn(idx, nil). The newV2SettleFn at parallel_state_processor.go did
`pdb := st.(*state.ParallelStateDB)` on a nil interface — panic with
"interface conversion: interface is nil". The settle goroutine has no
defer-recover (the validation loop has one, the settle one doesn't),
so the panic crashes the node.

Trigger: the dispatcher pushes tx i's worker; ctx is cancelled mid-block
(e.g. serial wins the parallel-vs-serial race). Worker for tx i+k hits
the toPrev select and exits on ctx.Done() before writing x.states[i+k].
The validation select for execDone[i+k] races ctx.Done() and the
randomised Go select sometimes picks execDone[i+k]. validateOne then
fails its `x.states[i] != nil && Validate()` guard and dispatches
re-exec. The re-exec goroutine also hits ctx.Done() in execute() and
returns early. finishReexec for that idx fires from a later
validateOne(j) (j > i+k); the chSettle send carries a nil-state idx;
settleFn type-asserts on nil → panic in a goroutine with no recover.

Two-layer fix:
* finishReexec gates the chSettle send on x.states[idx] != nil so
  cancellation-orphaned states don't leave the validation goroutine.
* newV2SettleFn early-returns on st == nil as defense in depth — if a
  future caller routes a nil interface here (different settle wiring,
  test harness), the assertion is still safe.

TestV2SettleFn_NilStateIsNoOp pins the consumer guard by calling
settleFn(0, nil) under defer/recover and asserting no panic, no
panickedIdx bump, no output mutation.

Found by an internal multi-agent fresh-eyes review run after the
sixth round of bot feedback to pre-empt back-and-forth — verified by
tracing the cancellation-+-randomised-select path manually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim multi-paragraph commentary added alongside the bot-review fixes
down to the minimum non-obvious WHY. Drops change-narrative, commit-SHA
references, and ALL-CAPS emphasis per team-standards code-shape rules.

No functional change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude Bot commented May 14, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Reviewed the latest 5 commits (2026-05-14): witness sharing fix, bloom hash XOR-fold, sixth round review comments, nil PDB settle guard, and comment simplification. Verified cancel timing in ProcessBlock, nil guards in finishReexec, EthereumTransfer wiring, bloomFold math, and concurrent witness access — all correct.

Comment thread core/blockstm/mvhashmap.go Outdated
The "lazy write mode" / "lock-free index for O(1) latest-writer lookup"
comment describes a design MVHashMap doesn't implement — writes take a
per-shard RWMutex and reads walk a sorted slice. Only lock-free part is
the bloom miss-filter, which the field name already conveys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@lucca30 lucca30 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked the shutdown race on current head 890d5d63 and left one inline finding.

Comment thread core/blockstm/v2_executor.go Outdated
close(taskCh) raced with the dispatcher's `taskCh <- i` select on the
cancellation path — go test -race ./core -run TestExecuteV2BlockSTM_
MidFlightCancellation reliably triggers WRITE-on-closed-channel under
the race detector. Have startTaskDispatcher return a done channel and
wait on it before closing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Code review skipped — your organization has reached its monthly code review spending cap.

An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.

Once the cap resets or is raised, reopen this pull request to trigger a review.

@pratikspatil024
Copy link
Copy Markdown
Member

Reviewed the core BlockSTM v2 logic against docs/blockstm-v2.md.

High: V2 can return while detached re-exec goroutines are still running.

dispatchReexec starts an untracked goroutine, but ExecuteV2BlockSTM only waits for the initial workers and settlement: core/blockstm/v2_executor.go:123-135, core/blockstm/v2_executor.go:566-585. On cancellation it deliberately skips draining re-execs, and on validation panic the defer path exits before the drain: core/blockstm/v2_executor.go:637-681. execute checks ctx only before entering env.Execute; once inside EVM execution there is no cancellation path: core/blockstm/v2_executor.go:400-421.

That appears to break ProcessBlock's explicit invariant that the losing processor is fully drained before commit invalidates trie/pathdb references: core/blockchain.go:954-989. Suggested fix: track re-exec goroutines in a wait group, wait on all exit paths, and gate dispatch/execution on cancellation. Ideally, also wire an interrupt into EVM execution so long-running tx execution can stop promptly.

Medium: V2 runs tracer hooks through shared VmConfig across parallel workers while also omitting tx boundary hooks.

ProcessBlock passes bc.cfg.VmConfig into V2, and every worker EVM is created from that same config: core/blockchain.go:876-879, core/parallel_state_processor.go:661-677. The parity test documents that V2 inherits EVM hooks but skips OnTxStart/OnTxEnd, and only classifies hooks rather than exercising thread-safety or ordering: core/parallel_state_processor_hooks_parity_test.go:18-29, core/parallel_state_processor_hooks_parity_test.go:77-94.

Suggested fix: route tracer-enabled processing through serial, or buffer/merge per-tx trace events deterministically and add a race/integration test.

Open question: docs/blockstm-v2.md still describes witness production as not supported/serial-routed, but this PR now wires shared witness collection through V2. Either the docs are stale, or the code is enabling a path the design doc still treats as future work.

cffls and others added 3 commits May 20, 2026 15:14
dispatchReexec launched goroutines without tracking. On the validation
cancellation path runValidationLoop skips the reexec drain ("we skip
the reexec drain on cancellation" comment), so detached re-exec
goroutines could still be running after ExecuteV2BlockSTM returns. The
caller in ProcessBlock then commits the block; a goroutine still inside
env.Execute reads through SafeBase pool DBs whose trie references have
just been invalidated by the pathdb layer advancing. Same hazard on the
validation-panic path (defer exits before any drain).

Track each re-exec goroutine in a sync.WaitGroup on v2ExecCtx and wait
on it after wg.Wait() and before settleWg.Wait(). Cancellation now
waits for in-flight re-execs to finish (bounded by EVM tx execution
time), but no detached goroutine outlives the call.

EVM-level interrupt for prompter cancellation is out of scope for this
fix; the WaitGroup makes the lifecycle safe without it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tracer hooks aren't goroutine-safe. V2 spawns concurrent workers from a
shared VmConfig (parallel_state_processor.go's preparePDB → vm.NewEVM
per worker), so a non-nil Tracer would fire from multiple goroutines
without synchronisation.

Not exercised in current Polygon production (the standard block-
processing path leaves bc.cfg.VmConfig.Tracer nil; RPC tracing uses its
own EVM via core/state_processor.ApplyTransactionWithEVM), but the door
is open if anyone enables a node-level vm tracer with V2 wired.

Return errV2TracerUnsupported at V2StateProcessor.Process entry so
ProcessBlock's existing parallel-error fallback runs V1 instead. In
enforceParallelProcessor mode the block fails — appropriate, since the
operator explicitly opted out of V1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V2 now produces witnesses via shared *Witness across all V2 statedbs
(re-shared after statedb.Copy() in V2StateProcessor.Process) plus
CollectStateWitness and CollectCodeWitness merges. The future-work
item that called this out as unimplemented and claimed V2 returns nil
from ParallelStateDB.Witness() no longer matches the code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cffls
Copy link
Copy Markdown
Contributor Author

cffls commented May 20, 2026

Thanks @pratikspatil024 ! Address these issues as follows:

High: V2 can return while detached re-exec goroutines are still running.

Addressed by adding a wait group.

Medium: V2 runs tracer hooks through shared VmConfig across parallel workers while also omitting tx boundary hooks.

tracer is already disabled during normal import and only enabled during RPC calls. In case we enable tracer during normal import, I added an explicit error that will be returned if tracer is enabled in blockstm v2.

Open question: docs/blockstm-v2.md still describes witness production as not supported/serial-routed, but this PR now wires shared witness collection through V2. Either the docs are stale, or the code is enabling a path the design doc still treats as future work.

Nice catch! Fixed the stale section in the doc.

Comment on lines +86 to +88
// Subpath identifiers — must match core/state constants (BalancePath, NoncePath, etc.)
const SubpathBalance byte = 1
const SubpathNonce byte = 2
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 🟡 The exported SubpathBalance and SubpathNonce constants on core/blockstm/mvhashmap.go:87-88 have zero production callers — repo-wide grep finds them only in same-package tests (mvhashmap_delta_test.go, key_accessors_test.go). Production code in core/state/statedb.go:752-760 uses its own local constants (BalancePath, NoncePath, CodePath, SuicidePath, CreatePath) at all 23+ NewSubpathKey call sites in core/state/. The "must match core/state constants" comment is unenforced — no init() check, no compile-time assertion — and the exported set is asymmetric (only Balance and Nonce; no Code/Suicide/Create equivalents), making the invariant impossible to maintain as written. Cleanest fix: lowercase to package-internal subpathBalance/subpathNonce for the test-only use, matching the dead-scaffolding cleanup pattern already accepted in 4c688e4 (BlockChain.AblationSkip*), 0ce45f0 (MVHashMap.SkipFlush), and four other rounds in this PR. Once shipped on the public core/blockstm API, removing them later becomes a SemVer-breaking change.

Extended reasoning...

What the issue is

core/blockstm/mvhashmap.go:86-88 exports two byte constants on the public core/blockstm API:

// Subpath identifiers — must match core/state constants (BalancePath, NoncePath, etc.)
const SubpathBalance byte = 1
const SubpathNonce byte = 2

A repo-wide grep for SubpathBalance|SubpathNonce returns hits in exactly three files, all in the same package:

  • core/blockstm/mvhashmap.go (the declarations)
  • core/blockstm/key_accessors_test.go (test usage: NewSubpathKey(addr, SubpathNonce))
  • core/blockstm/mvhashmap_delta_test.go (test usage: NewSubpathKey(addr, 2) plus comment reference)

A cross-package grep for blockstm.SubpathBalance / blockstm.SubpathNonce returns zero hits. No external caller exists in the tree.

Production uses different constants

Production code in core/state/statedb.go:752-760 defines its own local set:

```go
const BalancePath = 1
const NoncePath = 2
const CodePath = 3
const SuicidePath = 4
const CreatePath = 5
```

These are the constants consumed by all production MVRead / MVWrite / NewSubpathKey call sites — 23+ usages in core/state/parallel_statedb.go alone, plus every MVRead/MVWrite site in statedb.go (lines 770, 781, 814, 828, 842, 917, 944, 961, 977, 986, ...). None route through blockstm.SubpathBalance or blockstm.SubpathNonce.

Why the invariant is broken

The comment "must match core/state constants" is unenforced:

  • No init() check comparing the two sets.
  • No compile-time assertion linking them.
  • Asymmetric set: only Balance and Nonce are exported in blockstm; the core/state set has five values (Balance, Nonce, Code, Suicide, Create). A future change adjusting BalancePath = 1 or NoncePath = 2 in core/state will silently diverge from the blockstm copies without breaking anything visible.

Step-by-step proof

  1. Grep repo for SubpathBalance|SubpathNonce: matches only mvhashmap.go (declaration), key_accessors_test.go (test), mvhashmap_delta_test.go (test). Three files, all in core/blockstm/.
  2. Grep for blockstm.SubpathBalance or blockstm.SubpathNonce: zero hits.
  3. Grep core/state/statedb.go:752-760: confirms BalancePath/NoncePath/CodePath/SuicidePath/CreatePath are the production constants.
  4. Production NewSubpathKey call sites in core/state/statedb.go (lines 770, 781, 814, 828, 842, 917, 944, 961, 977, 986, ...): all use the core/state constants, never the blockstm ones.

Impact

  • API-surface commitment. core/blockstm is exported as a public API. Removing SubpathBalance or SubpathNonce later becomes a SemVer-breaking change. The cost of removing/lowercasing now is two lines.
  • Maintainer confusion. A future engineer reading the constants sees a comment claiming a "must match" invariant that isn't enforced, may not notice the asymmetry (no SubpathCode/SubpathSuicide/SubpathCreate), and could write production code routing through the wrong constants.
  • No correctness, security, fund-safety, or consensus impact. Pure public-API hygiene — hence nit severity, consistent with the verifier consensus and the six prior dead-scaffolding cleanups already accepted in this PR.

Fix

Mechanical, one-step:

  1. Lowercase the constants to package-internal: subpathBalance / subpathNonce.
  2. Update the test files to use the lowercase names.
  3. Optionally remove the misleading "must match core/state constants" comment, or replace it with: "test-only fixtures; production uses core/state.BalancePath / NoncePath."

Matches option (a) from the bug claim and aligns with the established cleanup pattern (BlockChain.AblationSkip* in 4c688e4, MVHashMap.SkipFlush in 0ce45f0, MVHashMap.BloomMayContain, MVBalanceStore.Version, MVStore.Read, MVStore.Estimates).

Extended reasoning...

Detailed reasoning is in the collapsible section of the pr_comment field above.

Comment thread core/blockchain.go
Comment on lines +939 to +946
// Stop the failed V2 statedb's prefetcher before discarding the
// result. The V2 goroutine only stops it on ctx cancellation, so a
// V2-only error path (panic, ApplyMessage error, validate mismatch)
// would otherwise leave trie prefetch work running across the
// caller's commit — exactly the stale-layer scenario this code is
// trying to avoid. Applies in both fallback (processorCount==2) and
// enforce (processorCount==1) modes.
result.statedb.StopPrefetcher()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The comment justifying the explicit result.statedb.StopPrefetcher() at core/blockchain.go:939-946 claims "The V2 goroutine only stops it on ctx cancellation," but the V2 goroutine at lines 894-896 now stops on either condition: if err != nil || ctx.Err() != nil { parallelStatedb.StopPrefetcher() }. The explicit call here is now a redundant no-op (safe due to StopPrefetcher's idempotency at core/state/statedb.go:641-647), and the rationale in the comment is wrong. Either drop the redundant line and the now-stale comment, or rewrite the comment to describe it as defense-in-depth.

Extended reasoning...

What the issue is

The block comment at core/blockchain.go:939-945 reads:

Stop the failed V2 statedb's prefetcher before discarding the result. The V2 goroutine only stops it on ctx cancellation, so a V2-only error path (panic, ApplyMessage error, validate mismatch) would otherwise leave trie prefetch work running across the caller's commit — exactly the stale-layer scenario this code is trying to avoid. Applies in both fallback (processorCount==2) and enforce (processorCount==1) modes.

That rationale was true before this PR, but the V2 goroutine in the same file at lines 888–896 now reads:

// Stop prefetcher when either (a) ctx cancelled — we lost the race,
// or (b) V2 errored out. In case (b) the fallback in ProcessBlock
// overwrites this Result with V1's and decrements processorCount,
// so the final drain at "processorCount == 2" won't fire and the
// subfetcher goroutines would leak with live trie references that
// the about-to-be-committed pathdb layer would invalidate.
if err != nil || ctx.Err() != nil {
    parallelStatedb.StopPrefetcher()
}

So the V2 goroutine already stops the prefetcher on err != nil (in addition to ctx.Err() != nil). The V1 goroutine at lines 918–920 has the same shape. The comment at lines 939–945 contradicts what the V2 goroutine actually does.

Why result.statedb.StopPrefetcher() at line 946 is harmless but redundant

StateDB.StopPrefetcher at core/state/statedb.go:641–647 is gated by if s.prefetcher != nil { ... s.prefetcher = nil }, so calling it twice is a safe no-op on the second call. The fallback block at 936–950 is entered iff result.parallel && result.err != nil — exactly the case where V2's own err != nil branch at line 894 will already have fired. By the time control reaches line 946, the V2 prefetcher has already been stopped.

Step-by-step proof

  1. V2 goroutine: bc.parallelProcessor.Process returns err != nil (panic, ApplyMessage error, ValidateState mismatch).
  2. Line 894: if err != nil || ctx.Err() != nil evaluates true via the err != nil disjunct. parallelStatedb.StopPrefetcher() runs; sets s.prefetcher = nil.
  3. V2 sends Result{parallel: true, err: <non-nil>, statedb: parallelStatedb, ...} on resultChan.
  4. Main goroutine reads V2's result. result.parallel && result.err != nil is true; enters the fallback block at 936.
  5. Line 946: result.statedb.StopPrefetcher() is called again. s.prefetcher == nil guard at statedb.go:641 short-circuits — no-op.
  6. Fallback proceeds normally.

Impact

  • No correctness, security, fund-safety, or consensus impact. The double-call is harmless thanks to idempotency.
  • Maintenance hazard. A future engineer reading the comment to understand "why two StopPrefetcher calls" will conclude V2 only stops on ctx cancel and reason about the wrong invariant. This is exactly the documentation-stale class the author has already cleaned up multiple times in this PR (e.g., MVBalanceStore.ZeroDelta's reference to the deleted LastWriter, the MVHashMap struct comment claiming a non-existent lock-free index).
  • Matches the dead-scaffolding / stale-doc cleanup pattern accepted ~10 times across this PR review under nit severity.

How to fix

Two minimal options:

  1. Drop the redundant line and the now-stale comment at 939–946. Cheapest; relies on the V2 goroutine's own StopPrefetcher at line 895.
  2. Keep the line as defense-in-depth and rewrite the comment, e.g.:
// Defensive: the V2 goroutine already stopped its prefetcher on
// err != nil at line 895; this is a redundant no-op (StopPrefetcher
// is idempotent at core/state/statedb.go:641-647) guarding against
// future changes to that path.
result.statedb.StopPrefetcher()

Either is consistent with the cleanup pattern this PR has already applied for analogous stale documentation.

… Error

V2's applyMessage panic-recover ran every tx-execution panic through
log.Error. Incarnation 0 panics are expected — speculative SSTORE-refund
accounting can underflow against a stale GetCommittedState read while
the prior tx's writes haven't settled, V2 catches the panic and the
re-exec runs against the settled state. Logging that at ERROR trains
operators tailing journalctl to ignore the signal.

Gate by pdb.Incarnation: 0 → log.Debug, ≥ 1 → log.Error (re-exec also
panicked, which is a real-bug signal that should stay loud).

TestV2ApplyMessage_FirstIncarnationPanicLogsDebug pins the contract by
injecting an OnEnter tracer that panics every call; one tx flows through
incarnations 0 and 1 and produces exactly one Debug + one Error V2-panic
record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Comment thread core/blockchain.go
Comment on lines +729 to +730
// enabled before the prefetcher runs so the underlying trieReader uses
// muSubTries throughout — switching mid-flight would race.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The doc-comment for setupBlockReaders at core/blockchain.go:729-730 references a field muSubTries that does not exist anywhere in the codebase. The actual implementation in core/state/reader.go uses a subTries sync.Map (line 258) and a concurrentEnabled bool flag (line 262). Pure documentation hygiene — fix the comment to reference the real mechanism, e.g. "...the underlying trieReader's subTries sync.Map throughout — switching the concurrentEnabled flag mid-flight would race."

Extended reasoning...

What the issue is

The doc-comment immediately above setupBlockReaders at core/blockchain.go:726-730 reads:

// setupBlockReaders builds the three StateDBs needed for parallel block
// processing: throwaway (for prefetcher), statedb (for serial processor),
// and parallelStatedb (for V2). The V2 statedb has concurrent reads
// enabled before the prefetcher runs so the underlying trieReader uses
// muSubTries throughout — switching mid-flight would race.

A repo-wide grep for muSubTries matches exactly one location: this comment itself. The symbol does not exist on trieReader, StateDB, or anywhere else in the tree.

What the actual mechanism is

core/state/reader.go:247-263 defines trieReader:

subRoots          sync.Map   // common.Address → common.Hash (storage roots)
subTries          sync.Map   // common.Address → Trie (storage tries)
lock              sync.Mutex // Lock for protecting concurrent read
accountCache      sync.Map   // addr → *types.StateAccount
storageCache      sync.Map   // addrSlot → common.Hash
concurrentEnabled bool       // if true, skip r.lock (trie uses sync.Map resolve cache)

The race-safety mechanism the comment is trying to describe is the concurrentEnabled flag plus the subTries sync.Map — calling EnableConcurrentReads() mid-block-execution (rather than once before the prefetcher starts) would race because some reads would already be in flight expecting the locked-mode path.

Why this matters

The previous prefetcher fix established a load-bearing invariant: parallelStatedb.EnableConcurrentReads() must run before the prefetcher goroutine starts at blockchain.go:865. The doc-comment is the only place that captures the rationale. A future maintainer reading it would (a) search for muSubTries, (b) not find it, and (c) lose confidence in the surrounding reasoning — possibly concluding the invariant is mythical and moving the EnableConcurrentReads call.

Step-by-step proof

  1. Grep repo-wide for muSubTries: matches only the comment at core/blockchain.go:730. Zero declarations, zero other references.
  2. Read core/state/reader.go:247-263: the actual fields are subTries sync.Map (line 258) and concurrentEnabled bool (line 262). No mu prefix anywhere — lock is a generic sync.Mutex protecting the lock-mode path, not specific to sub-tries.
  3. EnableConcurrentReads() (the operation the comment is guarding) flips concurrentEnabled from false to true; reads then bypass r.lock and use the sync.Map resolve cache instead.

Why this fits the established cleanup pattern

The author has already accepted multiple stale-doc cleanups in this PR under nit severity:

  • MVBalanceStore.ZeroDelta comment cited the deleted LastWriter (commit c62d8b2)
  • MVHashMap struct comment claimed a non-existent lock-free index
  • MVHashMap.BloomMayContain test cited the non-existent StateDB.mvReadFromHashmap
  • Several AblationSkip* dead-scaffolding cleanups

This is the same defect class — a comment naming a symbol that does not exist — and the same fix shape.

Impact

No correctness, security, fund-safety, or consensus impact. Pure documentation hygiene. Hence nit severity.

Fix

One-line comment update:

// setupBlockReaders builds the three StateDBs needed for parallel block
// processing: throwaway (for prefetcher), statedb (for serial processor),
// and parallelStatedb (for V2). The V2 statedb has concurrent reads
// enabled before the prefetcher runs so the underlying trieReader's
// subTries sync.Map is used throughout — flipping concurrentEnabled
// mid-flight would race with in-flight locked-mode reads.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly nudge: the comment at core/blockchain.go:729-730 still references muSubTries, which does not exist in the codebase. A repo-wide grep matches only this comment line itself. The actual mechanism in core/state/reader.go is subTries sync.Map (line 258) and the concurrentEnabled bool flag (line 262). One-line comment fix is all that is needed.

Comment thread core/blockchain.go
if parallelStatedb, err = state.NewWithReader(parentRoot, bc.statedb, parallel); err != nil {
return nil, nil, nil, nil, nil, nil, err
}
parallelStatedb.EnableConcurrentReads()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 parallelStatedb.EnableConcurrentReads() at core/blockchain.go:748 is called a second time inside ExecuteV2BlockSTM at core/parallel_state_processor.go:820 on the same shared trieReader (StateDB.Copy shares the reader pointer). The comment at parallel_state_processor.go:814-820 claims the second call is "defensive and idempotent" but both underlying impls are non-idempotent: trie/trie.go:76-78 unconditionally does t.resolveCache = &sync.Map{}, and core/state/reader.go:346-351 unconditionally delegates without checking r.concurrentEnabled. Two effects: (1) the second pointer overwrite races with the prefetcher goroutine that reads t.resolveCache at trie.go:208/238 — go test -race flags it on the production ProcessBlock path; (2) every cache entry the prefetcher accumulated in sync.Map #1 is silently abandoned when sync.Map #2 replaces the pointer, so V2 workers see an empty cache and re-resolve hashNodes from DB. Pure perf/race regression — no correctness or consensus impact. Fix is one line: add if t.resolveCache != nil { return } at trie.go:76 (or if r.concurrentEnabled { return } at reader.go:346), then update the comment to match reality.

Extended reasoning...

What the bug is

setupBlockReaders at core/blockchain.go:748 calls parallelStatedb.EnableConcurrentReads() synchronously, before launching the prefetcher and V2 goroutines. This unwinds through enableConcurrentOnReadertrieReader.EnableConcurrentReads (core/state/reader.go:346-351) → StateTrie.EnableConcurrentReadstrie.Trie.EnableConcurrentReads (trie/trie.go:76-78), which unconditionally assigns t.resolveCache = &sync.Map{} (sync.Map #1).

Inside the V2 goroutine, ExecuteV2BlockSTM runs base.EnableConcurrentReads() again at core/parallel_state_processor.go:820 with a comment claiming the call is "defensive and idempotent". Neither underlying implementation honours that contract:

  • trie/trie.go:76-78func (t *Trie) EnableConcurrentReads() { t.resolveCache = &sync.Map{} } — unconditional pointer overwrite.
  • core/state/reader.go:346-351 — unconditionally calls st.EnableConcurrentReads() and sets r.concurrentEnabled = true; no early-return.

Why the trieReader is shared

ReadersWithCacheStatsTriple (core/state/database.go:273-283) calls db.Reader(stateRoot) once and wraps that single Reader in three readerWithCacheStats instances. So throwaway.reader, statedb.reader, and parallelStatedb.reader all share the same underlying *trieReader and the same *Trie for the main account trie. StateDB.Copy() at core/state/statedb.go:1259 shares the reader by pointer (reader: s.reader), so readBase (V2's base, created via statedb.Copy() inside V2StateProcessor.Process) and throwaway (the prefetcher's statedb) ultimately reach the same *Trie.

Step-by-step proof of the race

  1. blockchain.go:748parallelStatedb.EnableConcurrentReads() runs synchronously. mainTrie.resolveCache = sync.Map #1. No goroutines yet.
  2. blockchain.go:849startPrefetchGoroutine launches the prefetcher goroutine. It reads through throwaway.readertrieReadermainTrie.GetgetConcurrent (trie/trie.go:203-218 and :238). On the concurrent path, the prefetcher executes if t.resolveCache != nil (line 208) and t.resolveCache.Load(cacheKey) (line 238), plus t.resolveCache.Store(...) (line 257) when populating.
  3. blockchain.go:875 — V2 goroutine starts. V2StateProcessor.ProcessreadBase := statedb.Copy() (parallel_state_processor.go:1139) shares the reader → ExecuteV2BlockSTM(..., readBase, ...)base.EnableConcurrentReads() at parallel_state_processor.go:820.
  4. Inside that second call, trie/trie.go:77 executes t.resolveCache = &sync.Map{} again. mainTrie.resolveCache = sync.Map #2 — a brand-new empty map.

The write at step 4 has no happens-before edge with the prefetcher's reads at step 2 (the prefetcher and V2 goroutines are spawned by ProcessBlock without coordination on EnableConcurrentReads). go test -race flags this on the full BlockChain.ProcessBlock path.

Two observable effects

(1) Race detector trip on production path. Per Go's memory model, plain (non-atomic) writes and reads to the same pointer field without happens-before is a textbook race. The PR's CI runs -race on the state + blockstm packages and on TestV2BlockSTMAllBlocks, but those tests call ExecuteV2BlockSTM directly without the production prefetcher concurrency, so the race window doesn't manifest there. Any future race-mode CI run that exercises ProcessBlock end-to-end will report the read at trie.go:208 racing with the write at trie.go:77.

(2) Silent cache loss. Whatever entries the prefetcher accumulated in sync.Map #1 are abandoned the moment V2 installs sync.Map #2. The prefetcher's subsequent Load/Store calls re-read t.resolveCache on each invocation (the field is not cached locally), so it then writes into sync.Map #2 — but V2 workers that read between step 4 and any new prefetcher writes see an empty cache and re-resolve hashNodes from DB. PrevalueTracer (witness state) is unaffected because PrevalueTracer.Put has its own internal lock and isn't tied to resolveCache.

Why existing tests don't catch it

  • TestV2BlockSTMAllBlocks (core/mainnet_witness_benchmark_test.go) and the FuzzV2ExecutorVsSerial fuzz call ExecuteV2BlockSTM directly, bypassing BlockChain.ProcessBlock's concurrent prefetcher.
  • The 1M-block production soak runs without -race and is insensitive to perf-only regressions of this magnitude.
  • The PR description notes that the race-detected fuzz "caught the shared-trie-reader race that the non-race fuzz missed" — but that fuzz targets the executor in isolation, not the production wiring where this specific double-call lives.

Impact

No consensus, correctness, or fund-safety impact — resolveCache is purely a perf optimization, and a re-resolve from DB returns the same content. But:

  • The "defensive and idempotent" comment at parallel_state_processor.go:814-820 is materially wrong; the implementation is not idempotent.
  • The race itself is real per Go's memory model and trips -race on the production path.
  • The prefetcher's warming work is silently negated for every V2 block.

How to fix

One-line idempotency guard in either of the two non-idempotent sites:

// trie/trie.go:76 — preferred (lowest layer, smallest change)
func (t *Trie) EnableConcurrentReads() {
    if t.resolveCache != nil {
        return
    }
    t.resolveCache = &sync.Map{}
}

or

// core/state/reader.go:346
func (r *trieReader) EnableConcurrentReads() {
    if r.concurrentEnabled {
        return
    }
    ...
    r.concurrentEnabled = true
}

Either fix restores the documented contract. The comment at parallel_state_processor.go:814-820 should also be updated since "idempotent" is currently false.

Comment thread core/blockchain.go
Comment on lines +967 to +968
// Value is scaled by 1000 (stored as µgasps) to preserve 3 decimal places,
// e.g. 210.357 mgasps → 210357. Divide by 1000 when reading.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Doc comment at core/blockchain.go:967-968 and inline comment at line 976 label the stored unit as µgasps, but the scaling is ×1000 (kilo), not ×10⁻⁶ (micro). The SI prefix is in the wrong direction — given the convention that 'mgasps' here means megagas/sec (10⁶ gas/sec), the stored value is actually milli-mgasps (i.e. kilogas/sec). The inline comment 'µgasps (mgasps * 1000)' is self-contradictory (µ means ×10⁻⁶, not ×1000). Pure documentation nit with no correctness, consensus, or runtime impact — the computed value is correct and the example '210.357 mgasps → 210357' makes intent clear. Suggested fix: replace 'µgasps' with 'scaled mgasps × 1000' or 'milli-mgasps' in both comments.

Extended reasoning...

What the issue is

The doc comment block at core/blockchain.go:967-968 reads:

// Report per-block mgasps for the winning processor.
// Value is scaled by 1000 (stored as µgasps) to preserve 3 decimal places,
// e.g. 210.357 mgasps → 210357. Divide by 1000 when reading.

And the inline comment at line 976 reads:

mgasps := int64(float64(result.usedGas) * 1e6 / float64(elapsed)) // µgasps (mgasps * 1000)

The 'µ' SI prefix means micro (10⁻⁶), but the actual scaling is ×1000 (kilo, 10³) — opposite direction. The inline parenthetical 'µgasps (mgasps * 1000)' is self-contradictory: µ means ×10⁻⁶, while the parenthetical says ×1000.

Dimensional analysis

time.Duration is int64 nanoseconds. The formula usedGas × 1e6 / elapsed_ns:

  • gas/sec = usedGas × 10⁹ / elapsed_ns
  • mgasps (= megagas/sec = 10⁶ gas/sec) = usedGas × 10³ / elapsed_ns
  • The code computes usedGas × 10⁶ / elapsed_ns = 1000 × mgasps

So stored_value / 1000 = mgasps, confirming the example '210.357 mgasps → 210357'. The stored unit is therefore 1/1000 of a mgasps = milli-mgasps = kilogas/sec — NOT µgasps (which would be 10⁻⁶ × gas/sec, off by 12 orders of magnitude in the opposite direction).

Step-by-step proof

  1. Take usedGas = 12_000_000 (12M gas, typical block) and elapsed = 57_000_000 ns (57ms).
  2. Real throughput: 12_000_000 / 0.057 = ~210.5 million gas/sec = ~210.5 mgasps.
  3. Stored value: int64(12_000_000 × 1e6 / 57_000_000) = int64(210526.3) = 210526.
  4. 210526 / 1000 = 210.526 mgasps ✓ (the comment's example pattern).
  5. If a reader took 'µgasps' literally as 10⁻⁶ × gas/sec, they'd interpret 210526 as 0.210 gas/sec — off by 9 orders of magnitude.

Addressing the refutation

One verifier refuted on the grounds that 'mgasps' is itself non-SI (lowercase m used colloquially for mega, not milli) and that 'µ' here follows the same informal convention. While the convention argument has merit, the inline annotation µgasps (mgasps * 1000) is internally contradictory under any reading: µ as a notation always means "smaller than", but the parenthetical says "1000× larger than mgasps". The example resolves the ambiguity, but the unit name itself misleads.

This also matches the dead-doc / stale-comment cleanup pattern already accepted multiple times in this PR (LastWriter reference in MVBalanceStore.ZeroDelta, muSubTries reference in setupBlockReaders, the MVHashMap struct comment claiming a non-existent lock-free index, BloomMayContain test's reference to non-existent mvReadFromHashmap).

Impact

  • No correctness, consensus, fund-safety, or runtime impact. The metric value is computed correctly.
  • Documentation hygiene only. Operators reading the source to interpret the metric would be misled by the unit name, though the example clarifies intent. Dashboard labels are configured separately.
  • Severity nit, matching the established stale-doc cleanup pattern in this PR.

Fix

One-line edit in both locations:

// Value is scaled by 1000 (stored as milli-mgasps) to preserve 3 decimal places,
// e.g. 210.357 mgasps → 210357. Divide by 1000 when reading.

And the inline:

mgasps := int64(float64(result.usedGas) * 1e6 / float64(elapsed)) // milli-mgasps (mgasps * 1000)

Or alternatively drop the unit name entirely and keep just the scaling factor in the comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants