Skip to content

test(DON'T MERGE): develop-v2.1.0-rvr#2790

Draft
shuklaayush wants to merge 22 commits into
develop-v2.0.0-rc.1from
develop-v2.1.0-rvr
Draft

test(DON'T MERGE): develop-v2.1.0-rvr#2790
shuklaayush wants to merge 22 commits into
develop-v2.0.0-rc.1from
develop-v2.1.0-rvr

Conversation

@shuklaayush
Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions

This comment was marked as outdated.

@shuklaayush shuklaayush added run-benchmark triggers benchmark workflows on the pr run-benchmark-e2e labels May 16, 2026
@github-actions

This comment has been minimized.

@jpw-axiom jpw-axiom force-pushed the develop-v2.0.0-rc.1 branch from f08f0fe to 0110850 Compare May 18, 2026 18:51
shuklaayush and others added 14 commits May 18, 2026 21:32
makes the basic framework for using rvr extensions in openvm

- adds an rvr feature flag
- defines `RvrExtensionCtx` struct to provide mappings between
opcode/executor/air indices
- defines `VmRvrExtension` trait that extensions can implement to be
registered
- updated macro so that rvr `ExtensionRegistry` can be auto-generated in
`SdkVmConfig`

closes INT-7474, INT-7475, INT-7479
#2730)

Moves the rvr files related to compiling and execution into
openvm-circuit. Those rvr files previously depended on openvm-circuit
and in order to enable rvr execution through the openvm pipeline, they
had to be made a part of openvm-circuit to prevent circular
dependencies.

closes INT-7537
- Vm execution instance is made to use rvr execution, depending on the
feature. Helper functions to convert between the existing `VmState` and
the rvr state are also added.
- The `VmConfig` macro now has a `create_rvr_extensions` method
implementation, but instead of defining a new `VmRvrConfig` trait, the
`create_rvr_extensions` method piggybacks on the existing
`VmExecutionConfig`. This is to avoid complex feature-gated trait
bounds.

closes INT-6810, INT-7476
Enables running benchmarks through rvr extension. Benchmark tests do not check execution correctness and currently execution involving extensions other than RV32IM fail because `VmRvrExtension` trait implementation is not properly wired.

closes INT-7480
Previously rvr execution had to use `executor_idx_to_air_idx` information in order to construct `ExtensionRegistry`. This was a problem for pure execution which didn't need air indices so the interface diverged between rvr and aot/interpreted. Now for rvr pure execution, dummy index values of `NO_CHIP` are used instead to keep the interface consistent.

towards INT-7611
Removes the rvr tests and instead adds rvr comparison steps in existing openvm tests in a similar way to aot. Unlike aot, metered cost execution is also run and compared for rvr and interpreted modes.

closes INT-7627
- Introduces a new `Rv32IoExtension` in rvr that handles the rv32io
instructions (hint_storew, hint_buffer, reveal). This is mainly to have
a struct managing the hint_store chip index.
- Adds rvr tests to the CI file in the same way as aot.

closes INT-7466
Implements the rvr feature for the keccak256 extension and also adds rvr tests to CI. Now extensions don't take a `staticlib_path` argument manually and instead uses the auto-built staticlib made by a build.rs file.

closes INT-7468
Implements the rvr feature for the Algebra extension. The rvr side of the Algebra extension is now also split into `ModularRvrExtension` and `Fp2RvrExtension`. A notable change is to have the C code for the Algebra extension which uses `libsecp256k1` to also unconditionally contain the C code needed in the ECC extension, since they are closely related and doing so would avoid configuration dependencies.

closes INT-7470, INT-7704
Implements the rvr feature for all extensions that are left - BigInt, Sha2, ECC, Pairing, Deferral. Code for tests and CI are also updated. Changes for the Deferral extension includes additions to the VM state used in rvr execution.

closes INT-7465
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Changed rvr execution to use the existing openvm `VmState` instead of
defining a new state struct and copying data between the two forms. The
references and pointers to the fields in `VmState` are passed to the rvr
execution functions so they can be used in C code.
- The Deferral extension now uses a callback registration system to
expose the Deferral related data to C code instead of piggybacking on
the same mechanism of `OpenVmHostCallbacks`. This is enabled for each
extension so that they can have per-extension data and state that is
maintained separately.

closes INT-7572
@shuklaayush shuklaayush force-pushed the develop-v2.1.0-rvr branch from db2dea9 to c8a2092 Compare May 18, 2026 19:32
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 18, 2026

Claude finished @shuklaayush's task in 10m 14s —— View job


Code Review: PR #2790 — RVR Integration (Re-review on sync)

Reviewed ~15.7k additions across 175 files introducing RVR (RISC-V Runtime) compiled execution support.

  • Gather context — diff against base, understand new commits
  • Review core VM integration (crates/vm/src/arch/rvr/)
  • Review RVR crates (crates/rvr/)
  • Review extension RVR support (algebra, bigint, etc.)
  • Review CI/build changes
  • Post comprehensive review

Critical / High Severity

1. Out-of-bounds dispatch table access for dynamic jumps
crates/rvr/rvr-openvm/src/emit/codegen.rs:142-145, project.rs:330-336

rv_execute bounds-checks state->pc against [text_start, max_pc] before the first dispatch (project.rs:662-667). However, JumpDyn terminators (codegen.rs:144) and unresolved branch targets (codegen.rs:160) call dispatch_table[dispatch_index(next_pc)] without bounds checking. dispatch_index computes (pc - text_start) >> 2 — if a guest program computes a jump target below text_start, this unsigned subtraction wraps, producing a very large index. Targets above max_pc also read past the table end. The rv_trap fallback only exists in the entry point, not at internal dispatch sites.

Additionally, max_pc is computed from blocks.iter().map(|b| b.start_pc) only (project.rs:629-634). Branch targets or resolved jump destinations that exceed the highest block start PC would also be out-of-bounds.

Fix this →

2. Missing memory bounds checks in IO callbacks
crates/vm/src/arch/rvr/io.rs:66-74, 89-101, 105-119

Guest programs can trigger out-of-bounds host memory access through several callbacks:

  • host_print_str (line 70): io.memory_ptr.add(ptr as usize) with no check that ptr + len <= MEM_SIZE
  • host_hint_storew (line 99): io.memory_ptr.add(dest_addr as usize) — no check dest_addr + 4 <= MEM_SIZE
  • host_hint_buffer (line 115): io.memory_ptr.add(dest_addr as usize)dest_addr + nbytes could exceed allocation

The GuardedMemory guard page (16 KB) at the end catches small overflows with a segfault, but overflows beyond 16 KB bypass it entirely. An explicit bounds check before each pointer arithmetic would be safer.

Fix this →

3. assert! in extern "C" callback — undefined behavior
crates/vm/src/arch/rvr/io.rs:133-137

host_reveal uses assert!() inside an extern "C" function. Panicking in an extern "C" function is undefined behavior — the unwinding will cross the C FFI boundary. This should use std::process::abort() after logging, or be wrapped in catch_unwind. The .unwrap() calls in host_hint_storew (line 96) and host_hint_buffer (line 117) have the same risk, though they're protected by preceding length checks.

Fix this →

4. NULL function pointer dereference in C IO layer
crates/rvr/rvr-openvm/c/openvm_io.c:32-49

g_host is a zero-initialized global. If any openvm_* function is called before register_openvm_callbacks(), it dereferences a null function pointer (e.g., g_host.hint_input(g_host.ctx) on line 32). No null guards exist on any forwarding stub.

5. Global mutable state for host callbacks is not thread-safe
crates/rvr/rvr-openvm/c/openvm_io.c:24

static OpenVmHostCallbacks g_host; is process-global with no synchronization. The comment acknowledges this ("NOT thread-safe"), but if two RVR instances run concurrently (e.g., parallel proving, parallel tests), they stomp each other's callbacks. Consider _Thread_local / __thread storage, or passing callbacks through RvState.

The same pattern appears in extensions/deferral/rvr/c/rvr_ext_deferral.c:21 with static DeferralHostCallbacks g_deferral.

6. riscv.yml missing clang-22/lld installation
.github/workflows/riscv.yml

The new rvr matrix entry (lines 37-41) doesn't install clang-22 + lld, unlike extension-tests.yml and guest-lib-tests.yml which correctly do. The RVR codegen invokes clang-22 at runtime, so rvr tests will fail with "command not found".

Fix this →


Medium Severity

7. on_termination instret calculation can underflow
crates/vm/src/arch/rvr/metered.rs:608

let elapsed = self.config().segment_check_insns - remaining_counter as u64;

After metered_periodic_check fires (metered.rs:661), check_counter += segment_check_insns. Then trace_block subtracts block_insn_count. At termination, remaining_counter can be larger than segment_check_insns (it was bumped by the callback and only partially decremented). This subtraction wraps, producing a huge elapsed value and corrupting the instret accounting.

8. trace_block counter underflow in C tracer
crates/rvr/rvr-openvm/c/openvm_tracer_metered.h:215-222

When check_counter < block_insn_count, the callback adds segment_check_insns to check_counter, then line 221 unconditionally subtracts block_insn_count. If a single basic block contains more instructions than the post-callback counter value, this wraps to a huge u32, effectively disabling all future periodic checks. The max_block_insns cap (1000, matching the default segment_check_insns) makes this unlikely in practice, but the invariant isn't enforced with a static assert.

9. u64 to u32 truncation of segment_check_insns
crates/vm/src/arch/rvr/metered.rs:661, execute.rs:259

segment_check_insns is u64 but cast to u32 with as u32. The default (1000) is safe, but user-configurable values above u32::MAX would silently truncate. Use try_into().unwrap() or add a debug assertion.

10. u32s_as_u64s alignment check is debug-only
crates/rvr/rvr-openvm-ffi-common/src/lib.rs:167-176

The alignment assertion is debug_assert!, stripped in release builds. Misaligned input produces UB via from_raw_parts. Since the function is pub, consider promoting to assert! or using align_to which handles alignment safely.

11. TracerPtr<T> deref of null pointer only checked in debug
crates/vm/src/arch/rvr/state.rs:51-69

TracerPtr<T>::deref() and deref_mut() use debug_assert!(!self.0.is_null()). In release builds, dereferencing before assignment is instant UB with no diagnostic.

12. UnknownComplexField::div hardcodes norm as a²+b²
extensions/algebra/rvr/ffi/fp2/src/lib.rs:146-155

The norm computation b.0² + b.1² is correct only when the Fp2 non-residue is -1 (i.e., Fp2 = Fp[x]/(x²+1)). This is true for BN254 and BLS12-381, but the function is a generic fallback for unknown moduli. If the non-residue differs, division gives wrong results. At minimum, document this assumption.

13. Fragile chip index +1 offset assumptions
extensions/deferral/rvr/src/lib.rs:139-143, extensions/keccak256/rvr/src/lib.rs:110-114, extensions/sha2/rvr/src/lib.rs:131-135

Multiple extensions derive a secondary chip index as primary_chip_idx + 1, relying on the circuit extension's AIR allocation order. If extend_circuit changes ordering, these relationships break silently. Consider explicit lookups or construction-time assertions.

14. __builtin_assume_aligned UB for misaligned guest memory
crates/rvr/rvr-openvm/c/openvm_state.h:95-113

rd_mem_u16, rd_mem_u32, wr_mem_u16, wr_mem_u32 use __builtin_assume_aligned to promise alignment. If a guest does a misaligned load, this is UB in C. If OpenVM enforces alignment, this should be documented; otherwise use memcpy.


Low Severity / Code Quality

15. Redundant HashMap<u32, u32> maps PC to itself
crates/rvr/rvr-openvm/src/emit/project.rs:639-640

let block_starts: HashMap<u32, u32> = blocks.iter().map(|b| (b.start_pc, b.start_pc)).collect();

This is just a HashSet. Use HashSet::contains() instead.

16. Duplicate decode_reg function
crates/rvr/rvr-openvm-lift/src/helpers.rs:5-7 and crates/rvr/rvr-openvm-lift/src/opcode.rs:223-225 define identical functions. Also duplicated in extensions/bigint/rvr/src/lib.rs:319-321 despite depending on rvr-openvm-lift.

17. execution_status() maps unknown values to Running
crates/rvr/rvr-state/src/state.rs:123-131

Any has_exited value outside {0,1,2,3} silently maps to ExecutionStatus::Running, hiding memory corruption.

18. Recompilation on every metered execution (known TODO)
crates/vm/src/arch/rvr/metered.rs:748-755, metered_cost.rs:221-228

Both execute_metered and execute_metered_cost compile the shared library on every invocation. The TODO comments acknowledge this.

19. Near-identical execute_* functions
crates/vm/src/arch/rvr/execute.rs

Multiple execute_* functions share nearly identical setup code. A builder pattern or parameterized function would reduce maintenance burden.

20. Feature-gated fields on public Streams struct
crates/vm/src/arch/vm.rs:117-119

deferral_fns and deferral_hash are #[cfg(feature = "rvr")] on a public struct, changing layout by feature flag. Fragile for downstream crates.

21. RvrCompiled field drop order sensitivity
crates/vm/src/arch/rvr/compile.rs:21-26

lib (the loaded .so handle) must be dropped before temp_dir (which deletes the .so). Current declaration order is correct, but reordering fields would cause use-after-free. Consider a comment or ManuallyDrop.

22. C23 features limit portability
extensions/keccak256/rvr/c/rvr_ext_keccak.c:11-13

Uses constexpr and static_assert (C23). Fine with clang-22, but limits portability.


Positive Patterns

  • ABI compile-time checks (abi_consts.rs): const assertions ensuring Rust/C layout consistency
  • NQR seed sharing via openvm-algebra-utils: Circuit and RVR use identical NQR_RNG_SEED with same iteration order
  • Feature flag chain correctly wired: cargo-openvm/rvropenvm-sdk/rvr → per-extension circuit crates
  • C endianness guard (rvr_ext_modular.c): #error for non-little-endian hosts
  • Write materialization pattern (context.rs:137-149): Prevents double-evaluation of C expressions
  • Guard page protection (GuardedMemory): Maps PROT_NONE page at end of memory for basic overflow detection
  • Consistent extension pattern: All RVR extensions follow the same trait structure with new_pure()/new() split
  • Cargo.lock is clean: All new packages are workspace-internal path dependencies
  • Submodules properly registered: .gitmodules correctly has secp256k1, xkcp, and openssl entries; CI workflows correctly init them conditionally
  • Source attribution: Generated C code includes #line directives mapping back to guest source files

Summary

The architecture is well-designed — the adapter/core split, extension trait pattern, and C codegen pipeline are solid. The main concerns are:

  1. FFI safety: Panics across boundary (UB), missing bounds checks in IO callbacks, null function pointers
  2. Dispatch table safety: Out-of-bounds access from internal JumpDyn targets (no bounds check)
  3. Instret accounting: Potential underflow in on_termination when remaining_counter > segment_check_insns
  4. CI gap: riscv.yml missing clang-22 install for rvr variant
  5. Thread safety: Global mutable callback state prevents concurrent RVR instances

Priority fixes:

  1. Add bounds checking at internal dispatch sites (or size table to cover all possible targets)
  2. Add memory bounds checks in IO callbacks before pointer arithmetic
  3. Replace assert! with abort-safe error handling in extern "C" callbacks
  4. Fix on_termination instret calculation to handle remaining_counter > segment_check_insns
  5. Add clang-22 install step in riscv.yml

@github-actions

This comment has been minimized.

- Makes rvr metered execution use the existing `SegmentationCtx` of openvm instead of its own new structs and code. This fixes the discrepancy between rvr and interpreted/aot segmentation logic and resolves the issue of rvr making too many segments. (https://github.com/axiom-crypto/openvm-eth/actions/runs/26112543464)
- Fixes the calculation of `num_insns` in segments that are used as segment boundaries. The instruction counts were recorded as multiples of `segment_check_insns` (1000) that didn't map to the actual basic block boundaries. Addresses the problem of overflowing GPU memory. (https://github.com/axiom-crypto/openvm-eth/actions/runs/26127167098)

closes INT-7835
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

Claude finished @gdmlcjs's task in 9m 59s —— View job


Code Review: PR #2790 — RVR Integration (Re-review on sync)

Reviewed ~15.5k additions across 176 files. The latest commit (0ea17b6) is a significant improvement — it replaces the duplicated MeteredConfig/RvrSegment/BitSet types with reuse of SegmentationCtx from the existing OpenVM codebase, eliminating ~300 lines of duplicated segmentation logic. The previous review's items #7 (instret underflow) and #18 (code deduplication) are partially addressed.

  • Gather context — diff against base, understand new commits
  • Review core VM integration (crates/vm/src/arch/rvr/)
  • Review RVR codegen and lifting pipeline (crates/rvr/)
  • Review extension RVR support and C FFI code
  • Review CI/build and config changes
  • Verify instret accounting correctness after refactor
  • Post comprehensive review

Changes in Latest Commit (0ea17b6)

The refactoring is clean and correct:

  • MeteredConfig → eliminated; config values now come from SegmentationCtx and SystemConfig
  • RvrSegment → eliminated; uses Segment from segment_ctx.rs
  • BitSet → moved to memory_ctx.rs (shared with non-RVR path)
  • should_segment / create_segment → delegated to SegmentationCtx::check_and_segment
  • New shared methods on SegmentationCtx: warn_if_exceeds_limits, create_final_segment, initialize_segment, update_checkpoint — all correctly factored

The instret accounting math is correct in release mode (verified algebraically: total instret = Σ(block costs) via wrapping arithmetic), but there's a remaining debug-mode issue (see #1 below).


Critical / High Severity

1. create_final_segment can underflow in debug mode (previously #7, partially addressed)
crates/vm/src/arch/execution_mode/metered/segment_ctx.rs:532

self.instret += self.segment_check_insns - self.instrets_until_check;

When execution terminates shortly after a periodic check callback, check_counter (which becomes instrets_until_check) can exceed segment_check_insns. Example: callback fires with remaining=1, check_counter += 1000 → 1001, program terminates immediately → instrets_until_check = 1001 > segment_check_insns = 1000. The subtraction underflows.

In release mode, wrapping arithmetic gives the correct total (verified), but in debug mode this panics. Since create_final_segment is now shared infrastructure used by both RVR and non-RVR paths, this should use wrapping_sub or wrapping_add explicitly.

Fix this →

2. Out-of-bounds dispatch table access for dynamic jumps (unchanged from previous review)
crates/rvr/rvr-openvm/src/emit/codegen.rs:142-145, project.rs:628-636

rv_execute's entry point bounds-checks state->pc against [text_start, max_pc], but JumpDyn terminators call dispatch_table[dispatch_index(next_pc)] without bounds checking. dispatch_index computes (pc - text_start) >> 2 — if a guest program jumps below text_start, the unsigned subtraction wraps to a huge index. Additionally, max_pc only considers block start PCs, not end PCs or branch targets.

Fix this →

3. Missing memory bounds checks in IO callbacks (unchanged from previous review)
crates/vm/src/arch/rvr/io.rs:66-74, 89-101, 105-119

Guest programs can trigger out-of-bounds host memory access:

  • host_print_str (line 70): io.memory_ptr.add(ptr as usize) — no ptr + len <= MEM_SIZE check
  • host_hint_storew (line 99): no dest_addr + 4 <= MEM_SIZE check
  • host_hint_buffer (line 115): no dest_addr + nbytes <= MEM_SIZE check

The 16 KB guard page catches small overflows, but larger overflows bypass it entirely.

Fix this →

4. Panics in extern "C" callbacks (unchanged from previous review, scope widened)
crates/vm/src/arch/rvr/io.rs:133-137, extensions/algebra/rvr/ffi/*/src/lib.rs

Multiple extern "C" functions contain .unwrap() / assert!():

  • host_reveal (io.rs:133): assert!()
  • rvr_ext_fp2_div_* (fp2/src/lib.rs:66, 105): b.invert().unwrap() — panics on zero divisor
  • rvr_ext_mod_div_* (modular/src/lib.rs:72, 106): same pattern
  • rvr_ext_algebra_hint_sqrt (modular/src/lib.rs:346): .expect()

On Rust 1.91.1 these abort rather than cause UB, but aborting the entire host process on bad guest input is a harsh failure mode. Consider catch_unwind or explicit error handling.

5. NULL function pointer dereference in C IO layer (unchanged from previous review)
crates/rvr/rvr-openvm/c/openvm_io.c:32-49

All forwarding stubs (openvm_hint_input, openvm_print_str, etc.) dereference g_host.<callback> without null guards. If any stub is called before register_openvm_callbacks(), it dereferences a null function pointer.

6. riscv.yml missing clang-22/lld installation (unchanged from previous review)
.github/workflows/riscv.yml

The rvr matrix entry doesn't install clang-22 + lld, unlike extension-tests.yml (lines 95-101) and guest-lib-tests.yml (lines 87-93) which correctly do. RVR tests will fail with "command not found".

Fix this →


Medium Severity

7. Global mutable state prevents concurrent RVR instances (unchanged)
crates/rvr/rvr-openvm/c/openvm_io.c:24, extensions/deferral/rvr/c/rvr_ext_deferral.c:21

g_host is process-global with no synchronization. Parallel tests or parallel proving will stomp callbacks. Consider _Thread_local / __thread storage.

8. u32s_as_u64s alignment check is debug-only
crates/rvr/rvr-openvm-ffi-common/src/lib.rs:167-176

The alignment assertion is debug_assert!, stripped in release. Misaligned input produces UB via from_raw_parts. Since the function is pub, promote to assert! or use align_to.

9. TracerPtr<T> null check is debug-only
crates/vm/src/arch/rvr/state.rs:51-69

deref() / deref_mut() use debug_assert!(!self.0.is_null()). In release, dereferencing before assignment is instant UB.

10. UnknownComplexField arithmetic hardcodes non-residue = -1
extensions/algebra/rvr/ffi/fp2/src/lib.rs:141-155

Both mul and div assume Fp2 = Fp[i]/(i²+1). The norm b₀²+b₁² is only correct when the non-residue is -1. This holds for BN254 and BLS12-381 but will silently produce wrong results for extensions with different non-residues. At minimum, document this constraint.

11. __builtin_assume_aligned on potentially unaligned guest memory
crates/rvr/rvr-openvm/c/openvm_state.h:95-113

rd_mem_u32, wr_mem_u32 etc. use __builtin_assume_aligned. If a guest does misaligned access, this is UB. On x86-64 this works in practice, but on aarch64 with strict alignment it could SIGBUS.

12. C tracer trace_block counter underflow (unchanged)
crates/rvr/rvr-openvm/c/openvm_tracer_metered.h:215-222

After on_check resets check_counter, the unconditional check_counter -= block_insn_count can wrap to a huge u32 if a single block exceeds the post-callback counter value. In practice, max_block_insns (1000) matches segment_check_insns (1000), but the invariant isn't enforced.

13. CFG worklist iteration limit reached silently
crates/rvr/rvr-openvm-lift/src/cfg.rs:572-580

When the worklist exceeds instructions.len() * 20 iterations, analysis terminates silently. Incomplete CFG analysis could lead to missing blocks and runtime traps for valid jump targets. Log a warning when the limit is hit.

14. decode_imm_cg(insn) as i16 truncation for load/store offsets
crates/rvr/rvr-openvm-lift/src/opcode.rs:323, 345

The as i16 cast silently truncates values outside [-32768, 32767]. Compare with lift_jalr (line 393) which correctly uses as i32. Use i32 consistently or add a debug assertion.


Low Severity / Code Quality

15. execution_status() maps unknown values to Running (unchanged)
crates/rvr/rvr-state/src/state.rs:123-131 — silently hides corruption.

16. Redundant HashMap<u32, u32> maps PC to itself (unchanged)
crates/rvr/rvr-openvm/src/emit/project.rs:639-640 — should be a HashSet.

17. Duplicate decode_reg function (unchanged)
crates/rvr/rvr-openvm-lift/src/helpers.rs:5-7 and opcode.rs:223-225, also extensions/bigint/rvr/src/lib.rs:319-321.

18. RvrCompiled field drop order sensitivity (unchanged)
crates/vm/src/arch/rvr/compile.rs:21-26lib must drop before temp_dir. Current order is correct but fragile.

19. No tco + rvr mutual exclusion guard
crates/vm/src/lib.rstco+aot and rvr+aot have compile_error guards, but tco+rvr does not. Verify these features are compatible.


What Improved Since Last Review

  1. Segmentation logic deduplication (issue Feat/xor chip #18): MeteredConfig, RvrSegment, BitSet, should_segment, create_segment all replaced by shared SegmentationCtx methods. ~300 lines removed.
  2. instret accounting (issue chore: adding (pk, vk) to triples_fib integration test #7): Now delegates to SegmentationCtx::create_final_segment, which is shared with the interpreter path. The math is correct in release mode but still has the debug-underflow risk.
  3. Code organization: SegmentationState::new() now takes MeteredCtx + SystemConfig directly, avoiding the intermediate config struct.
  4. Shared warning logic: warn_if_exceeds_limits and format_nonzero_trace_heights extracted as shared methods.

Positive Patterns

  • ABI compile-time checks (abi_consts.rs): const assertions ensuring Rust/C layout consistency
  • Feature flag chain correctly wired: cargo-openvm/rvropenvm-sdk/rvr → per-extension circuit crates
  • VmConfig derive macro correctly generates #[cfg(feature = "rvr")]-gated code
  • SegmentationCtx refactoring is well-structured with clear method boundaries
  • Guard page protection (GuardedMemory): PROT_NONE page at end of memory
  • Source attribution: Generated C code includes #line directives
  • Cargo.lock is clean: All new packages are workspace-internal path dependencies

Summary

The latest refactoring significantly improves the codebase by eliminating duplicated segmentation logic. The architecture remains solid.

Priority fixes:

  1. Fix create_final_segment underflow — use wrapping arithmetic or adjust the accounting so instrets_until_check never exceeds segment_check_insns
  2. Add bounds checking at JumpDyn dispatch sites
  3. Add memory bounds checks in IO callbacks before pointer arithmetic
  4. Replace panicking code in extern "C" callbacks with abort-safe error handling
  5. Add clang-22 install step in riscv.yml

@github-actions

This comment has been minimized.

moves the rvr compilation stage for metered and metered cost out of execution and into instance construction

closes INT-7626
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

Claude finished @gdmlcjs's task in 4m 23s —— View job


Code Review: PR #2790 — RVR Integration (Re-review after compile-stage refactor)

Reviewed ~15.4k additions across 176 files. The latest commit (43e0f72) moves RVR compilation into instance construction, removing the Executor type parameter from RvrMeteredInstance and RvrMeteredCostInstance. This is a clean simplification — instances now own their RvrCompiled directly instead of compiling on each execute_* call.

  • Gather context — diff against base, understand new commits since last review
  • Review core VM integration (crates/vm/src/arch/rvr/)
  • Review RVR crates (crates/rvr/)
  • Review extension RVR support and C FFI code
  • Review CI/build changes
  • Check status of previously identified issues
  • Post comprehensive review

What Improved Since Last Review

  1. Compilation moved to instance construction (43e0f72): RvrMeteredInstance and RvrMeteredCostInstance no longer carry inventory and executor_idx_to_air_idx; they own a pre-compiled RvrCompiled directly. This eliminates the per-execution recompilation noted in prior reviews and removes the Executor type parameter, simplifying the type aliases (RvrMeteredInstance2, RvrMeteredCostInstance2 are gone).

  2. Segmentation logic deduplication (0ea17b6): MeteredConfig, RvrSegment, BitSet replaced by shared SegmentationCtx — ~300 lines removed.

  3. metered_cost_instance now takes widths (43e0f72): get_metered_cost_rvr_instance properly extracts widths from the proving key and passes them through, so cost accounting uses the correct AIR widths.


Critical / High Severity

1. Out-of-bounds dispatch table access for dynamic jumps (unchanged)
crates/rvr/rvr-openvm/src/emit/codegen.rs:142-145, project.rs:628-636

rv_execute entry bounds-checks state->pc against [text_start, max_pc] (project.rs:662), but JumpDyn terminators (codegen.rs:144) call dispatch_table[dispatch_index(next_pc)] without any bounds check. The next_pc is computed from guest-controlled register rs1, so a malicious or buggy guest program that sets rs1 to a value outside [text_start, max_pc] will access the dispatch table out-of-bounds. Similarly, unresolved branch targets (codegen.rs:160) go through the dispatch table without validation.

Additionally, max_pc only considers block start PCs (project.rs:629-634), not the last instruction within blocks or branch targets beyond the highest block.

Fix this →

2. Missing memory bounds checks in IO callbacks (unchanged)
crates/vm/src/arch/rvr/io.rs:66-74, 89-101, 105-119

Guest programs can trigger out-of-bounds host memory access through several callbacks:

  • host_print_str (line 70): io.memory_ptr.add(ptr as usize) — no ptr + len <= MEM_SIZE check
  • host_hint_storew (line 99): io.memory_ptr.add(dest_addr as usize) — no dest_addr + 4 <= MEM_SIZE check
  • host_hint_buffer (line 115): io.memory_ptr.add(dest_addr as usize) — no dest_addr + nbytes <= MEM_SIZE check

The 16 KB guard page catches small overflows, but overflows beyond 16 KB bypass it entirely. Consider storing MEM_SIZE in OpenVmIoState and validating before each unsafe access.

Fix this →

3. assert! in extern "C" callback — unwinding across FFI (unchanged, scope widened)
crates/vm/src/arch/rvr/io.rs:133-137

host_reveal uses assert!() inside an extern "C" function (line 133). On Rust 1.91+ with panic = "unwind" this will abort, but it's still poor practice and the abort message may be unclear. Additionally, several extension FFI callbacks panic:

  • rvr_ext_fp2_div_* (extensions/algebra/rvr/ffi/fp2/src/lib.rs): b.invert().unwrap() panics on zero divisor
  • rvr_ext_mod_div_* (extensions/algebra/rvr/ffi/modular/src/lib.rs): same pattern
  • rvr_ext_algebra_hint_sqrt (extensions/algebra/rvr/ffi/modular/src/lib.rs): .expect()

4. NULL function pointer dereference in C IO layer (unchanged)
crates/rvr/rvr-openvm/c/openvm_io.c:32-49

All forwarding stubs (openvm_hint_input, openvm_print_str, etc.) dereference g_host.<callback> without null guards. If called before register_openvm_callbacks(), this is a null function pointer dereference.

5. riscv.yml missing clang-22/lld installation (unchanged)
.github/workflows/riscv.yml

The rvr matrix entry (line 37-41) doesn't install clang-22 + lld. Compare with extension-tests.yml and guest-lib-tests.yml which correctly install both via LLVM's llvm.sh 22 script. Since the RVR backend invokes clang-22 during compilation (rvr-openvm/src/toolchain.rs), the rvr test in riscv.yml will fail at runtime.

Fix this →


Medium Severity

6. Global mutable state prevents concurrent RVR instances (unchanged)
crates/rvr/rvr-openvm/c/openvm_io.c:24, extensions/deferral/rvr/c/rvr_ext_deferral.c:21

g_host is process-global with no synchronization. Parallel tests or parallel proving will stomp callbacks. Consider _Thread_local / __thread storage, or passing callbacks through RvState.

7. create_final_segment subtraction relies on uninstrumented invariant
crates/vm/src/arch/execution_mode/metered/segment_ctx.rs:532

self.instret += self.segment_check_insns - self.instrets_until_check;

The subtraction is safe only if instrets_until_check <= segment_check_insns, which holds because the C tracer's trace_block (openvm_tracer_metered.h:216-221) only fires the callback when check_counter < block_insn_count, and then adds segment_check_insns before subtracting block_insn_count. Since old_counter < block_insn_count, the result is always < segment_check_insns. However, this invariant is not asserted anywhere and depends on the C code's exact sequencing. A debug_assert!(self.instrets_until_check <= self.segment_check_insns) before this line would catch violations if the C tracer logic changes.

8. u32s_as_u64s alignment check is debug-only (unchanged)
crates/rvr/rvr-openvm-ffi-common/src/lib.rs:167-176

debug_assert! on alignment — stripped in release builds. Misaligned input produces UB via from_raw_parts.

9. TracerPtr<T> null check is debug-only (unchanged)
crates/vm/src/arch/rvr/state.rs:51-69

deref() / deref_mut() use debug_assert!(!self.0.is_null()). In release, dereferencing null is instant UB. The Rust code in execute.rs always sets the pointer before use, but the safety contract is fragile.

10. UnknownComplexField arithmetic hardcodes non-residue = -1 (unchanged)
extensions/algebra/rvr/ffi/fp2/src/lib.rs:141-155

Both mul and div assume Fp2 = Fp[i]/(i²+1). The norm b₀²+b₁² is only correct when the non-residue is -1. This holds for BN254/BLS12-381 but will silently produce wrong results for other curves.

11. C tracer trace_block counter underflow risk
crates/rvr/rvr-openvm/c/openvm_tracer_metered.h:215-222

After on_check adds segment_check_insns to check_counter, the unconditional check_counter -= block_insn_count could wrap if a single block has more instructions than segment_check_insns + old_remaining. The max_block_insns cap (1000) and matching default segment_check_insns (1000) make this unlikely, but the invariant isn't enforced with a static assert.

12. __builtin_assume_aligned on potentially unaligned guest memory (unchanged)
crates/rvr/rvr-openvm/c/openvm_state.h:95-113

rd_mem_u32, wr_mem_u32 etc. use __builtin_assume_aligned which is a compiler promise, not a runtime check. If a guest does misaligned access, this is UB.

13. Fragile chip index +1 offset assumptions (unchanged)
extensions/deferral/rvr/src/lib.rs:139-143, extensions/keccak256/rvr/src/lib.rs:110-114, extensions/sha2/rvr/src/lib.rs:131-135

Multiple extensions derive secondary chip indices as primary_chip_idx + 1, relying on circuit extension AIR allocation order.


Low Severity / Code Quality

14. Redundant HashMap<u32, u32> maps PC to itself (unchanged)
crates/rvr/rvr-openvm/src/emit/project.rs:639-640 — should be a HashSet.

15. execution_status() maps unknown values to Running (unchanged)
crates/rvr/rvr-state/src/state.rs:123-131 — silently hides memory corruption.

16. Duplicate decode_reg function (unchanged)
crates/rvr/rvr-openvm-lift/src/helpers.rs:5-7 and opcode.rs:223-225.

17. RvrCompiled field drop order sensitivity (unchanged)
crates/vm/src/arch/rvr/compile.rs:21-26lib must drop before temp_dir. Current declaration order is correct but fragile. Consider a comment.

18. No tco + rvr mutual exclusion guard
crates/vm/src/lib.rstco+aot and rvr+aot have compile_error! guards, but tco+rvr does not.


Changes in Latest Commit (43e0f72) — Detailed Review

The refactor is clean and correct:

  • RvrMeteredInstance no longer stores inventory or executor_idx_to_air_idx; stores compiled: RvrCompiled instead. The Executor type parameter is dropped — now simply RvrMeteredInstance<F>.
  • RvrMeteredCostInstance same pattern, plus now stores widths: Vec<u64> so the cost computation uses the correct AIR widths baked at construction time.
  • metered_cost_instance signature correctly changed to require widths: &[usize], and get_metered_cost_rvr_instance properly extracts widths from self.pk.per_air.
  • The type aliases RvrMeteredInstance2 and RvrMeteredCostInstance2 are removed (no longer needed without the VC type parameter).
  • execute_metered and execute_metered_cost no longer compile — they use the pre-compiled self.compiled. This directly addresses the "recompilation on every invocation" issue from prior reviews.
  • One minor note: rvr_instance (pure) already compiled at construction in the prior commit, so this commit makes the metered paths consistent.

Positive Patterns

  • ABI compile-time checks (abi_consts.rs): const assertions ensuring Rust/C layout consistency
  • Feature flag chain correctly wired: cargo-openvm/rvropenvm-sdk/rvr → per-extension circuit crates
  • Clean instance ownership: After this refactor, all three RVR instance types (Pure, Metered, MeteredCost) own their RvrCompiled — no lazy compilation, no type parameter leaking
  • Guard page protection (GuardedMemory): PROT_NONE page at end of memory for basic overflow detection
  • Source attribution: Generated C code includes #line directives mapping back to guest source files
  • SegmentationCtx reuse: Well-factored shared infrastructure between interpreter and RVR paths

Summary

The latest refactor (43e0f72) is a solid improvement — moving compilation to instance construction eliminates per-execution overhead and simplifies types. The overall RVR architecture (adapter/core split, extension trait pattern, C codegen pipeline) is well-designed.

Priority fixes (unchanged from prior reviews):

  1. Add bounds checking at internal JumpDyn/Branch dispatch sites — guest-controlled register values can index out of bounds
  2. Add memory bounds checks in IO callbacks before pointer arithmetic
  3. Fix or replace panicking code in extern "C" callbacks
  4. Add clang-22 install step in riscv.yml
  5. Add debug_assert! guard on the create_final_segment subtraction invariant

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

Claude finished @gdmlcjs's task in 6m 55s —— View job


Code Review: PR #2790 — RVR Integration (Re-review on sync)

Reviewed ~15.4k additions across 176 files. The latest commit (36c26d5) simplifies the benchmark code by introducing a BenchExecutor trait that eliminates duplicate setup across Pure/Metered/MeteredCost execution modes — a clean improvement. Combined with 43e0f72 (compile-at-construction) and 0ea17b6 (SegmentationCtx reuse), the recent commits significantly improve code organization.

  • Gather context — diff against base, understand new commits since last review
  • Review core VM integration (crates/vm/src/arch/rvr/)
  • Review RVR crates (crates/rvr/)
  • Review extension RVR support and C FFI code
  • Review CI/build changes
  • Check status of previously identified issues
  • Post comprehensive review

Changes in Latest Commits

36c26d5 — Simplify benchmarks: Introduces BenchExecutor trait with PureExecution, MeteredExecution, MeteredCostExecution implementors. OnceLock-backed caching and cfg-gated Instance type aliases replace the previous per-mode copy-paste. Well-structured — each mode cleanly implements build_instance + run_execution. One minor note: MeteredCostExecution::build_instance discards the MeteredCostCtx from metered_cost_setup() (line 423: let (_ctx, executor_idx_to_air_idx) = ...) then re-accesses it via metered_cost_setup().0.clone() in run_execution (line 436). This is correct (OnceLock ensures same value) but slightly surprising.

43e0f72 — Compile at construction: RvrMeteredInstance and RvrMeteredCostInstance now own pre-compiled RvrCompiled directly, eliminating per-execution recompilation and dropping the Executor type parameter. Clean simplification.

0ea17b6 — SegmentationCtx reuse: Replaces duplicated MeteredConfig, RvrSegment, BitSet with shared SegmentationCtx methods. ~300 lines removed. Well-factored.


Critical / High Severity

1. Out-of-bounds dispatch table access for dynamic jumps (unchanged from prior reviews)
crates/rvr/rvr-openvm/src/emit/codegen.rs:142-145, project.rs:628-636

rv_execute's entry bounds-checks state->pc against [text_start, max_pc] (project.rs:662), but JumpDyn terminators (codegen.rs:144) and unresolved branch targets (codegen.rs:160) call dispatch_table[dispatch_index(next_pc)] without bounds checking. dispatch_index computes (pc - text_start) >> 2 — if next_pc is below text_start, the unsigned subtraction wraps to a huge index. Targets above max_pc also read past the table end.

Additionally, max_pc is computed from blocks.iter().map(|b| b.start_pc) only (project.rs:629-634). Branch targets, fall-through PCs (terminator_pc + 4), and resolved jump destinations beyond the highest block start PC are not included in the calculation, which could make the table undersized even for statically-known targets.

Impact: Guest-controlled register values (JumpDyn uses rs1) can index a garbage function pointer and call it — process-level UB.

Fix this →

2. Missing memory bounds checks in IO callbacks (unchanged)
crates/vm/src/arch/rvr/io.rs:66-74, 89-101, 105-119

Guest programs can trigger out-of-bounds host memory access:

  • host_print_str (line 70): io.memory_ptr.add(ptr as usize) — no ptr + len <= MEM_SIZE check
  • host_hint_storew (line 99): io.memory_ptr.add(dest_addr as usize) — no dest_addr + 4 <= MEM_SIZE check
  • host_hint_buffer (line 115): io.memory_ptr.add(dest_addr as usize) — no dest_addr + nbytes <= MEM_SIZE check

The 16 KB guard page catches small overflows, but overflows beyond 16 KB bypass it entirely. Consider storing MEM_SIZE in OpenVmIoState and validating before each unsafe access.

Fix this →

3. Panics in extern "C" callbacks — UB or harsh abort (unchanged, scope widened)
crates/vm/src/arch/rvr/io.rs:133-137

host_reveal uses assert!() inside an extern "C" function. Additionally, host_reveal line 131 computes ptr as usize + offset as usize — if both are near u32::MAX, this overflows in debug (panic across FFI = UB) or wraps in release (passing the bounds check with a small start).

Other extern "C" callbacks that panic:

  • rvr_ext_fp2_div_* (extensions/algebra/rvr/ffi/fp2/src/lib.rs): b.invert().unwrap() on zero divisor
  • rvr_ext_mod_div_* (extensions/algebra/rvr/ffi/modular/src/lib.rs): same
  • rvr_ext_algebra_hint_sqrt (extensions/algebra/rvr/ffi/modular/src/lib.rs): .expect()

On Rust 1.91+ these abort rather than cause UB, but aborting the entire host on bad guest input is a harsh failure mode.

Fix this →

4. NULL function pointer dereference in C IO layer (unchanged)
crates/rvr/rvr-openvm/c/openvm_io.c:32-49

All forwarding stubs (openvm_hint_input, openvm_print_str, etc.) dereference g_host.<callback> without null guards. g_host is zero-initialized — if any stub is called before register_openvm_callbacks(), it dereferences a null function pointer.

5. riscv.yml missing clang-22/lld installation (unchanged)
.github/workflows/riscv.yml

The rvr matrix entry (lines 37-41) doesn't install clang-22 + lld. Compare with extension-tests.yml and guest-lib-tests.yml which both install via llvm.sh 22. The RVR codegen invokes clang-22 at runtime, so rvr tests will fail with "command not found".

Fix this →


Medium Severity

6. create_final_segment subtraction invariant is unguarded
crates/vm/src/arch/execution_mode/metered/segment_ctx.rs:532

self.instret += self.segment_check_insns - self.instrets_until_check;

This subtraction relies on instrets_until_check <= segment_check_insns, which holds due to the C tracer's trace_block sequencing — but the invariant is not asserted anywhere and depends on the C code's exact logic. Since create_final_segment is now shared infrastructure (used by both RVR and interpreter paths), a debug_assert!(self.instrets_until_check <= self.segment_check_insns) before this line would catch violations if the C tracer logic ever changes.

7. Global mutable state prevents concurrent RVR instances (unchanged)
crates/rvr/rvr-openvm/c/openvm_io.c:24, extensions/deferral/rvr/c/rvr_ext_deferral.c:21

g_host is process-global with no synchronization. Parallel tests or parallel proving will stomp callbacks. Consider _Thread_local / __thread storage.

8. TracerPtr<T> null check is debug-only
crates/vm/src/arch/rvr/state.rs:51-69

deref() / deref_mut() use debug_assert!(!self.0.is_null()). In release, dereferencing null is instant UB. The Rust code always sets the pointer before use, but the safety contract is fragile.

9. u32s_as_u64s alignment check is debug-only
crates/rvr/rvr-openvm-ffi-common/src/lib.rs:167-176

The alignment assertion is debug_assert!, stripped in release. Misaligned input produces UB via from_raw_parts. Since the function is pub, promote to assert! or use align_to.

10. UnknownComplexField arithmetic hardcodes non-residue = -1 (unchanged)
extensions/algebra/rvr/ffi/fp2/src/lib.rs:141-155

Both mul and div assume Fp2 = Fp[i]/(i²+1). The norm b₀²+b₁² is only correct when the non-residue is -1. Correct for BN254/BLS12-381 but will silently produce wrong results for other extensions. At minimum, document this assumption.

11. Missing tco + rvr mutual exclusion guard
crates/vm/src/lib.rs

tco+aot and rvr+aot have compile_error! guards (lines 7-12), but tco+rvr does not. These features are likely incompatible — tco changes the executor dispatch loop fundamentally.

12. __builtin_assume_aligned on potentially unaligned guest memory (unchanged)
crates/rvr/rvr-openvm/c/openvm_state.h:95-113

rd_mem_u32, wr_mem_u32 etc. use __builtin_assume_aligned. This is a compiler promise, not a check — if a guest does misaligned access, this is UB. On x86-64 this works in practice; on aarch64 with strict alignment it could SIGBUS. Document the alignment invariant or use memcpy without the hint.

13. C tracer trace_block counter underflow risk (unchanged)
crates/rvr/rvr-openvm/c/openvm_tracer_metered.h:215-222

After on_check fires (line 218), check_counter is reset. The unconditional check_counter -= block_insn_count (line 221) could wrap to a huge u32 if the block exceeds the post-callback value. The max_block_insns cap makes this unlikely, but the invariant on_check != NULL in metered mode is not asserted — if on_check is NULL, the subtraction wraps and effectively disables all future periodic checks.

14. u64 to u32 truncation of segment_check_insns
crates/vm/src/arch/rvr/metered.rs (multiple sites)

segment_check_insns is u64 but cast to u32 with as u32. Default (1000) is safe, but user-configurable values above u32::MAX silently truncate. Use try_into().unwrap() or add a debug assertion.


Low Severity / Code Quality

15. Redundant HashMap<u32, u32> maps PC to itself (unchanged)
crates/rvr/rvr-openvm/src/emit/project.rs:639-640 — should be a HashSet.

16. execution_status() maps unknown values to Running (unchanged)
crates/rvr/rvr-state/src/state.rs:123-131 — silently hides corruption. Return Trapped for unknown values as a fail-safe.

17. Duplicate decode_reg function (unchanged)
crates/rvr/rvr-openvm-lift/src/helpers.rs:5-7 and opcode.rs:223-225.

18. RvrCompiled field drop order sensitivity (unchanged)
crates/vm/src/arch/rvr/compile.rs:21-26lib must drop before temp_dir. Current order is correct but fragile. Add a comment.

19. poseidon2_idx computed as trace_heights.len() - 2
crates/vm/src/arch/rvr/metered.rs:257 — positional assumption that Poseidon2 AIR is second-to-last. Fragile if AIR ordering changes. Consider passing the index explicitly.

20. mod_inverse returns 0 for zero inputs in release
extensions/algebra/rvr/ffi/common/src/lib.rs:170-183debug_assert! on a == 0. In release, dividing by zero silently produces 0. The circuit-side would catch this, but pure RVR execution would not.


Positive Patterns

  • Benchmark refactoring (36c26d5): BenchExecutor trait cleanly abstracts over execution modes with cfg-gated type aliases — good use of Rust's type system
  • ABI compile-time checks (abi_consts.rs): const assertions ensuring Rust/C layout consistency
  • Instance ownership model: All three RVR instance types own their RvrCompiled — no lazy compilation, no type parameter leaking
  • SegmentationCtx reuse: Well-factored shared infrastructure between interpreter and RVR paths
  • Feature flag chain correctly wired: cargo-openvm/rvropenvm-sdk/rvr → per-extension circuit crates
  • Guard page protection (GuardedMemory): PROT_NONE page at end of memory for basic overflow detection
  • Source attribution: Generated C code includes #line directives mapping back to guest source files
  • Cargo.lock is clean: All new packages are workspace-internal path dependencies

Summary

The latest three commits (SegmentationCtx reuse → compile-at-construction → benchmark simplification) are good improvements that eliminate duplication and simplify types. The BenchExecutor trait pattern is particularly clean.

Priority fixes (unchanged from prior reviews):

  1. Add bounds checking at internal JumpDyn/Branch dispatch sites — guest-controlled register values can index out of bounds in the dispatch table
  2. Add memory bounds checks in IO callbacks before pointer arithmetic
  3. Fix panicking code in extern "C" callbacks — use abort-safe error handling
  4. Add clang-22 install step in riscv.yml — CI will fail without it
  5. Add debug_assert! guard on the create_final_segment subtraction invariant
  6. Add tco + rvr mutual exclusion guard in crates/vm/src/lib.rs

@github-actions

This comment has been minimized.

Air indices are now represented as an enum in rvr code. The `AirIndex`
enum has `Uninitialized` and `NoChip` variants that replace the previous
`NO_CHIP = u32::MAX`. `AirIndex::Uninitialized` is only used in pure
execution where air indices don't matter and causes a panic in rvr
metered and metered cost execution.

closes INT-7611
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Copy Markdown

Code review

Checked for bugs and CLAUDE.md compliance against AGENTS.md. Two memory-safety issues stand out in the new RVR backend.

1. Out-of-bounds dispatch-table read on dynamic jumps (memory safety / UB)

JumpDyn (jalr) tail-calls into dispatch_table using a guest-controlled index without any bounds check, in:

};
ctx.write_line(&format!("state->pc = {next_pc};"));
ctx.write_line(&format!(
"[[clang::musttail]] return dispatch_table[dispatch_index({next_pc})]({args});"
));
}

dispatch_index(pc) is (pc - text_start) >> 2:

writeln!(
h,
"static __attribute__((always_inline)) inline uint32_t dispatch_index(uint32_t pc) {{"
)
.unwrap();
writeln!(h, " return (pc - 0x{text_start:08x}u) >> 2;").unwrap();
writeln!(h, "}}").unwrap();
writeln!(h).unwrap();

…and the table is sized to (max_pc - text_start)/4 + 1:

// Dense dispatch table: one entry per 4-byte instruction slot.
let max_pc = blocks
.iter()
.map(|b| b.start_pc)
.chain(std::iter::once(entry_point))
.max()
.unwrap_or(text_start);
debug_assert!(max_pc >= text_start);
let table_size = ((max_pc - text_start) / 4 + 1) as usize;
// Build block-start lookup.
let block_starts: std::collections::HashMap<u32, u32> =
blocks.iter().map(|b| (b.start_pc, b.start_pc)).collect();
writeln!(src, "BlockFn dispatch_table[{table_size}] = {{").unwrap();
for i in 0..table_size {
let pc = text_start + (i as u32) * 4;
if block_starts.contains_key(&pc) {
writeln!(src, " block_0x{pc:08x},").unwrap();
} else {
writeln!(src, " rv_trap,").unwrap();
}
}
writeln!(src, "}};").unwrap();
writeln!(src).unwrap();

The only PC range guard runs once in the `rv_execute` prologue:

.unwrap();
writeln!(
src,
" if (state->pc < 0x{text_start:08x}u || state->pc > 0x{max_pc:08x}u) {{"
)
.unwrap();
writeln!(src, " rv_trap({args_from_state});").unwrap();
writeln!(src, " return;").unwrap();
writeln!(src, " }}").unwrap();
writeln!(src, " uint32_t idx = dispatch_index(state->pc);").unwrap();
writeln!(src, " dispatch_table[idx]({args_from_state});").unwrap();
writeln!(src, " return;").unwrap();
writeln!(src, "}}").unwrap();

All subsequent inter-block transitions are `[[clang::musttail]] return dispatch_tabledispatch_index(next_pc);` and bypass that guard. For `JumpDyn`, `next_pc = (rs1 + imm) & ~1` is fully guest-controlled; if the guest computes a target `pc < text_start` the unsigned subtraction in `dispatch_index` wraps to a huge `uint32_t`, and if `pc > max_pc` the index exceeds `table_size`. The result is a load of an arbitrary 8-byte value adjacent to `dispatch_table` and a `musttail` call through it — UB at minimum, and a potential RCE vector when the input ELF is adversarial (the stated use case for OpenVM).

In-range misaligned/non-block-start jumps land on `rv_trap` slots, so only the out-of-range case is unsafe — the `Branch`/`FallThrough`/`emit_tail_call` sites use compile-time constant targets and are safe in practice (the dispatch-table fallback there is dead code unless the lifter misclassifies a block, which would also be a lift bug).

Suggested fix: add a bounded fallback in `dispatch_index` (or guard each indexed tail-call) so that `pc < text_start || pc > max_pc` resolves to `rv_trap` instead of indexing the table. E.g. inline a range check before the shift, or emit the check in the `JumpDyn` codegen path specifically (static targets don't need it).

2. Heap buffer overflow when `segment_check_insns` is raised above the default

The C tracer's `record_mem_page*` / `record_pv_page*` / `record_deferral_page*` deliberately skip bounds checks:

/* No bounds check — see MEM_PAGE_BUF_CAP in metered.rs. */
static __attribute__((always_inline)) inline void record_mem_page(Tracer* t, uint32_t page) {
if (likely(page == t->last_mem_page)) {
return;
}
t->last_mem_page = page;
t->mem_page_buf[t->mem_page_buf_len++] = page;
}
static __attribute__((always_inline)) inline void record_mem_page_range(Tracer* t, uint32_t first_page, uint32_t last_page) {
if (likely(first_page == t->last_mem_page)) {
if (first_page == last_page) {
return;
}
first_page++;
}
uint32_t n = last_page - first_page + 1;
uint32_t len = t->mem_page_buf_len;
for (uint32_t i = 0; i < n; i++) {
t->mem_page_buf[len + i] = first_page + i;
}
t->mem_page_buf_len = len + n;
t->last_mem_page = last_page;
}
/* No bounds check — see PV_PAGE_BUF_CAP in metered.rs. */
static __attribute__((always_inline)) inline void record_pv_page(Tracer* t, uint32_t page) {
t->pv_page_buf[t->pv_page_buf_len++] = page;
}
static __attribute__((always_inline)) inline void record_pv_page_range(Tracer* t, uint32_t first_page, uint32_t last_page) {
uint32_t n = last_page - first_page + 1;
uint32_t len = t->pv_page_buf_len;
for (uint32_t i = 0; i < n; i++) {
t->pv_page_buf[len + i] = first_page + i;
}
t->pv_page_buf_len = len + n;
}
/* No bounds check — see DEFERRAL_PAGE_BUF_CAP in metered.rs. */
static __attribute__((always_inline)) inline void record_deferral_page(Tracer* t, uint32_t page) {
t->deferral_page_buf[t->deferral_page_buf_len++] = page;
}
static __attribute__((always_inline)) inline void record_deferral_page_range(Tracer* t, uint32_t first_page, uint32_t last_page) {
uint32_t n = last_page - first_page + 1;
uint32_t len = t->deferral_page_buf_len;
for (uint32_t i = 0; i < n; i++) {
t->deferral_page_buf[len + i] = first_page + i;
}
t->deferral_page_buf_len = len + n;
}

…relying on the `_Static_assert` pair at the top of the header to prove the heap-allocated buffers are large enough between flushes:

static constexpr uint32_t MAX_MEM_PAGES_PER_INSN = 10;
static constexpr uint32_t MAX_PV_PAGES_PER_INSN = 1;
_Static_assert(
TRACER_MEM_PAGE_BUF_CAP >= 2 * TRACER_SEGMENT_CHECK_INSNS * MAX_MEM_PAGES_PER_INSN,
"MEM_PAGE_BUF_CAP too small for worst-case pages per flush interval");
_Static_assert(
TRACER_PV_PAGE_BUF_CAP >= 2 * TRACER_SEGMENT_CHECK_INSNS * MAX_PV_PAGES_PER_INSN,
"PV_PAGE_BUF_CAP too small for worst-case pages per flush interval");
/* No static assert for DEFERRAL — justifying the capacity is a TODO
* (see DEFERRAL_PAGE_BUF_CAP in metered.rs). */

But `TRACER_SEGMENT_CHECK_INSNS` is baked in from `DEFAULT_SEGMENT_CHECK_INSNS = 1000` at codegen time, while the runtime check threshold is taken from `MeteredCtx::segment_check_insns`:

state.tracer.mem_page_buf_len = 0;
state.tracer.pv_page_buf_len = 0;
state.tracer.deferral_page_buf_len = 0;
state.tracer.last_mem_page = NO_LAST_PAGE;
state.tracer.check_counter = seg_state.segmentation_ctx.segment_check_insns as u32;
state.tracer.on_check = Some(metered_periodic_check);
state.tracer.seg_state = &mut seg_state as *mut SegmentationState as *mut c_void;

…which has a public setter that accepts any `u64` and resizes only the unrelated `page_indices_since_checkpoint` buffer:

pub fn with_segment_check_insns(mut self, segment_check_insns: u64) -> Self {
self.segmentation_ctx.segment_check_insns = segment_check_insns;
self.segmentation_ctx.instrets_until_check = segment_check_insns;
// Update memory context with new segment check instructions
let page_indices_since_checkpoint_cap =
MemoryCtx::<PAGE_BITS>::calculate_checkpoint_capacity(segment_check_insns);
self.memory_ctx.page_indices_since_checkpoint =
vec![0; page_indices_since_checkpoint_cap].into_boxed_slice();
self.memory_ctx.page_indices_since_checkpoint_len = 0;
self
}

The C buffers stay sized to `MEM_PAGE_BUF_CAP` / `PV_PAGE_BUF_CAP` (compile-time constants used directly for the Rust `vec![0u32; ...]` allocations):

Self {
segmentation_ctx,
trace_heights,
is_trace_height_constant,
mem_page_buf: vec![0u32; MEM_PAGE_BUF_CAP],
pv_page_buf: vec![0u32; PV_PAGE_BUF_CAP],
deferral_page_buf: vec![0u32; DEFERRAL_PAGE_BUF_CAP],
page_indices: BitSet::new(bitset_size),
addr_space_access_count: vec![0u32; num_addr_spaces],
address_height,
addr_space_height,
chunk_bits,
}
}

With `MEM_PAGE_BUF_CAP = 65536` and `MAX_MEM_PAGES_PER_INSN = 10`, the assert holds only for `segment_check_insns <= 65536/(2*10) = 3276`; for `PV_PAGE_BUF_CAP = 4096` and `MAX_PV_PAGES_PER_INSN = 1`, the safe ceiling is `4096/2 = 2048`. Any caller that does `MeteredCtx::with_segment_check_insns(n)` with `n > 2048` and runs through the RVR backend can overflow the heap buffer at the unchecked `mem_page_buf[mem_page_buf_len++]` / `pv_page_buf[...]` stores. `MAX_MEM_PAGES_PER_INSN = 10` is a worst case (e.g. HINT_BUFFER / ECC), so typical programs won't hit it, but chosen-input programs can.

Suggested fix: assert in `execute.rs` (and the other RVR metered entry points) that `seg_state.segmentation_ctx.segment_check_insns <= DEFAULT_SEGMENT_CHECK_INSNS`, or rebuild/regenerate the C with the runtime value baked in. At minimum a `debug_assert!` with a clear error message is better than silent UB.


No CLAUDE.md violations: the file-naming rule under "File Structure" in AGENTS.md is chip-scoped (the surrounding sentences are about `air.rs`/`trace.rs`), and pre-existing `crates/vm/src/arch/aot/` already uses names like `pure.rs` / `metered_execute.rs`, so the new `crates/vm/src/arch/rvr/{execute,pure,metered,metered_cost}.rs` layout matches established practice.

No issues found in the workflow/CI changes or the broader diff-only scan.

…#2807)

## Summary

- **RVR metered execution can now suspend at segment boundaries.**
Previously only the interpreter and AOT backends supported
segment-by-segment metered runs; RVR ran metered execution straight to
termination. This branch adds a parallel `RvrMeteredSegmentInstance`
(`RvrMeteredInstanceWith<F, SegmentBoundary>`) whose
`execute_metered_until_segment_boundary` returns after the metered
segmentation callback creates a segment, mirroring the suspend/resume
shape the other two backends already expose. The tracer countdown is
carried across calls by checkpointing `tracer.check_counter` into
`segmentation_ctx.instrets_until_check` on suspend and restoring it on
entry; both values are `try_from`-validated against u32 at the entry
point (new `ExecuteError::InvalidMeteredContext`) and the hot
C-callback's matching cast is guarded by `debug_assert_eq!`. Mid-segment
suspension is out of scope: `initialize_segment_memory` resets the
per-segment page-indices checkpoint buffer assuming the page buffers
have already been flushed at a segment boundary.

- **Generated-C surface reorganized by policy.** Block-begin and
suspender helpers move into
`c/block/{instret,metered,metered_segment}.h` and
`c/suspender/{none,instret_limit,segment_boundary}.h`; tracer headers
move under `c/tracer/`. A new `SuspendPolicy` enum drives which pair is
included, with `compile_impl` rejecting incoherent combinations
(`Metered` × `InstretLimit`, `Pure|MeteredCost` × `SegmentBoundary`) at
compile time. Compile-time selection without preprocessor directives in
the generated C, per the AGENTS.md guidance.

- **`MeteredCtx` round-trip via `MeteredCtxParts`.** `SegmentationState`
now carries the full `MemoryCtx` and `suspend_on_segment` flag, so a
suspended metered run can be converted back to a `MeteredCtx`
(`into_metered_ctx`) and resumed without losing page-tracking or
segmentation state. A new test exercises the field-by-field round-trip.

- **All RVR codegen inputs embedded at compile time.** Removes every
`CARGO_MANIFEST_DIR` runtime dependency from the RVR project-emit
pipeline so binaries (Docker images, etc.) no longer need the source
tree to invoke `compile_impl`. Core C files (`openvm_io.{c,h}`,
`rvr_ext_wrappers.c`) switch from `fs::copy` to
`fs::write(include_str!(…))`. Extension `.a` staticlibs migrate from
`staticlib_path() -> &Path` to `staticlib_file() -> (&'static str,
&'static [u8])` via `include_bytes!(env!("RVR_*_FFI_STATICLIB"))`, with
a new `write_extension_staticlibs` helper writing them to the temp
project for `make` to link. Modular's libsecp256k1 amalgamation include
(~85 `.c`/`.h` files, with test/bench/ctime/valgrind files filtered out)
is collected by `extensions/algebra/rvr/build.rs` into a generated
`SECP256K1_C_FILES` const and returned via the new
`RvrExtension::extra_c_include_files()` hook (for files written but not
compiled as their own TUs); `extra_cflags` switches to relative
`-Isecp256k1/src` / `-Isecp256k1` against the temp project root. Trait
return types are tightened from `&str` to `&'static str`.

- **Up-front toolchain detection.** `compile_impl` probes the C
compiler, linker, and `make` in PATH before building and reports all
missing tools at once via `RuntimeToolchainError`. Adds `RVR_MAKE`
override, forwards `HOST_OS` to the Makefile (replacing its `uname -s`
shell-out), and threads path context into `CompileError` I/O variants.

- **Metrics consolidation.** The four near-identical `Instant::now() …
counter!().absolute() … gauge!().set()` blocks across interpreter / AOT
/ RVR are replaced by a single `ExecutionMetricTimer` helper in
`arch::execution_metrics` (guarded against div-by-zero on
sub-microsecond runs). A complementary `CompilationTimer`
(`arch::compilation_metrics`) wraps every `*_instance` constructor and
emits a `compile_{pure,metered,metered_cost,metered_segment}_ms` gauge
labeled by backend (`interpreter` / `aot` / `rvr`).

- **E1/E2/E3 jargon dropped.** `execute_e1` span/metric names become
`execute_pure`; `terminate_execute_e12_*` → `terminate_execute_*`; const
generic `E1` → `PURE_EXECUTION`. Comment references to "(E1)/(E2)/(E3)"
are removed in favor of "pure/metered/preflight".

## Breaking changes

- **Metric names.** `execute_e1_insns` → `execute_pure_insns`,
`execute_e1_insn_mi/s` → `execute_pure_insn_mi/s`. Dashboards or
alerting keyed on the old names need to be updated.
- **`RvrExtension` trait surface.** `extra_c_source_paths() ->
Vec<PathBuf>` → `extra_c_sources() -> Vec<(&'static str, &'static
str)>`; `staticlib_path/paths()` → `staticlib_file/files()` returning
embedded bytes; new optional `extra_c_include_files()` for files written
but not compiled as TUs. Existing impls need a one-time conversion to
`include_str!` / `include_bytes!`.
- **`ExecutorInventory` generic param renames** (`E1`/`E2`/`E3` →
`CombinedE`/`NewE`/`TargetE`) are visible in error messages but
compatible.
- **`CompileError` shape.** `CProject(io::Error)` → `CProject { path,
source }`; `Toolchain(String)` → `Toolchain(#[from]
RuntimeToolchainError)`; new `ToolchainCommand { command, source }`.
Callers matching on these variants need to update.
- **Binary size.** Embedded `.a` staticlibs and libsecp256k1 sources
grow the binary by roughly 5–30 MB depending on enabled extensions.

resolves int-7917

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

This comment has been minimized.

shuklaayush and others added 2 commits May 22, 2026 17:30
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

Code review

Checked for bugs and CLAUDE.md compliance against AGENTS.md. One new memory-safety issue in the keccak RVR extension; the two issues from the prior sync (dispatch-table OOB on JumpDyn, heap overflow when segment_check_insns is raised above the compile-time default) still apply as previously written.

Guest-controlled stack buffer overflow in rvr_ext_xorin

rvr_ext_xorin allocates two fixed 200-byte (KECCAK_WIDTH_WORDS = 50 u32s) stack buffers and uses assume(num_words <= KECCAK_WIDTH_WORDS) — a __builtin_assume compile-time hint, not a runtime check — before two range reads that each memcpy num_words * 4 bytes into them:

https://github.com/openvm-org/openvm/blob/8d74cbff035ee9f7f0a9b0a8e5d4d0f1e91f44f4/extensions/keccak256/rvr/c/rvr_ext_keccak.c#L35-L56

num_words = (len + WORD_SIZE - 1) / WORD_SIZE, and len is taken straight from a guest register at the call site without any clamp:

fn emit_c(&self, ctx: &mut dyn ExtEmitCtx) {
let buf_ptr = ctx.read_reg(self.buffer_ptr_reg);
let input = ctx.read_reg(self.input_ptr_reg);
let len = ctx.read_reg(self.len_reg);
let chip = air_index_to_c(self.chip_idx);
ctx.write_line(&format!(
"rvr_ext_xorin(state, {buf_ptr}, {input}, {len}, {chip}u);"
));
}

assume() is just __builtin_assume:

/* Branch prediction hints. */
static __attribute__((always_inline)) inline int likely(int x) { return __builtin_expect(!!(x), 1); }
static __attribute__((always_inline)) inline int unlikely(int x) { return __builtin_expect(!!(x), 0); }
static __attribute__((always_inline)) inline void assume(int x) { __builtin_assume(x); }
static __attribute__((always_inline)) inline void debug_assume(int x) {
assert(x);

…and rd_mem_u32_range_traced lowers to a plain memcpy(out, p, num_words * WORD_SIZE):

* so these lower to a single memcpy. */
static __attribute__((always_inline)) inline void rd_mem_u32_range(RvState* restrict state, uint32_t base_addr,
uint32_t* restrict out, uint32_t num_words) {
const void* p = __builtin_assume_aligned(mem_ptr(state, base_addr), WORD_SIZE);
memcpy(out, p, (size_t)num_words * sizeof(uint32_t));
}

A guest that sets len_reg = 0xFFFFFFFC produces num_words = 0x3FFFFFFF; the two range reads then attempt to memcpy ~4 GiB into the 200-byte stack arrays buffer[] / input[], corrupting the host saved registers and return address (preserve_most notwithstanding) before the page-table guard would catch the source-side OOB. The XORIN AIR constrains len ≤ 132, but the RVR runtime path never enforces that constraint. Worse, because __builtin_assume(num_words <= 50) is propagated to the optimizer as known-true, the compiler is free to remove any later defensive reasoning, making the overflow more reliable rather than less.

This is distinct from the host-callback bounds issues called out in the prior reviews (host_print_str / host_hint_storew / host_hint_buffer in crates/vm/src/arch/rvr/io.rs): different file, different code path, and a stack buffer overflow rather than guarded-region heap OOB — so the existing 16 KB guard page does not catch it.

Suggested fix: validate num_words (or len) against KECCAK_WIDTH_WORDS at the top of rvr_ext_xorin and rv_trap on overflow, e.g.:

if (unlikely(num_words == 0 || num_words > KECCAK_WIDTH_WORDS)) { rv_trap(state); return; }

Or push the check up to XorinInstr::emit_c so the same pattern can be applied uniformly to other length-parameterized extension callbacks. The assume() is then either redundant (drop it) or a correctness-preserving optimization hint (keep it after the check).


No additional new CLAUDE.md violations found in this sync: the file-layout rule under "File Structure" in AGENTS.md is chip-scoped (the surrounding context is about air.rs/trace.rs), and the new crates/vm/src/arch/rvr/{execute,pure,metered,metered_cost}.rs layout matches the established crates/vm/src/arch/aot/ pattern.

(Note: this update edits the previously-test placeholder in place; the original sticky review remains at #issuecomment-4521872268 and should be considered superseded.)

@github-actions
Copy link
Copy Markdown

group app.proof_time_ms app.cycles leaf.proof_time_ms
fibonacci 3,836 12,000,265 933
keccak 19,243 18,655,329 3,363
sha2_bench 10,036 14,793,960 1,437
regex 1,394 4,137,067 353
ecrecover 602 123,583 253
pairing 886 1,745,757 261
kitchen_sink 1,909 2,579,903 409
fibonacci_e2e 1,777 12,000,265 411
regex_e2e 816 4,137,067 167
ecrecover_e2e 506 123,583 131
pairing_e2e 626 1,745,757 131
kitchen_sink_e2e 2,033 2,579,903 398

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: b7b23b6

Benchmark Workflow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-benchmark triggers benchmark workflows on the pr run-benchmark-e2e

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants