feat(v2.1-rv64): switch to u16 limbs in riscv airs by shuklaayush · Pull Request #2801 · openvm-org/openvm

shuklaayush · 2026-05-20T21:38:19Z

PR 2 of the memory-bus-u16 split. Migrates RV64 cores whose constraints do not inspect individual bytes from a byte-shaped trace layout to a natively-u16 trace layout, and introduces dedicated u16 adapter variants. Bigint chips that wrap these RV64 chips inherit the new shape.

Why

#2794 made the memory-bus interaction fixed-shape (BLOCK_FE_WIDTH = 4 u16 cells). Chips whose AIRs do not constrain byte-level structure (equality, magnitude comparison, control-flow target, hintstore copy) were still carrying 8 u8 columns per register, packing them into bus cells at the eval site, and paying for a BitwiseOperationLookup byte-pair range-check the bus no longer needs.

Reshaping those columns to u16 directly halves the affected trace width and lets the AIR send the bus message without pack_u8_block, turning the byte-pair range checks into per-limb VariableRangeChecker checks bounded by LIMB_BITS.

Subsequent PRs build on the new u16 adapter variants and constants (INT256_NUM_U16_LIMBS, BLOCK_FE_WIDTH) introduced here.

What changes

Cores migrated to u16-shaped traces

Files: extensions/riscv/circuit/src/{branch_eq,branch_lt,less_than,jalr,jal_lui,auipc,hintstore,rdwrite}/.

branch_eq, branch_lt: BranchEqualCoreAir / BranchLessThanCoreAir parameterize on LIMB_BITS. Records are [u16; NUM_LIMBS]. The AIR range-checks each limb via VariableRangeCheckerBus instead of byte-pair-checking via the bitwise-lookup bus.
less_than: same treatment, wired through the new u16 ALU adapter.
jalr, jal_lui, auipc: native u16-limb rs/rd; bitwise lookup wiring removed where it only existed for byte range checks.
hintstore: mem_ptr / rem_words / data buffer all u16-shaped; variable-range checker added.

New u16 adapter variants

Files: extensions/riscv/circuit/src/adapters/alu_u16.rs, extensions/riscv-adapters/src/{vec_heap_u16,eq_mod_u16}.rs.

Rv64BaseAluU16Adapter: reads and writes [u16; BLOCK_FE_WIDTH] natively; supports immediates as a single u16 low-limb + sign bit; emits bus cells directly with no pack_u8_block.
Rv64VecHeapU16Adapter: u16-cell heap reads/writes; byte strides are BUS_PTR_SCALE * READ_SIZE.
Rv64IsEqualModU16Adapter: u16-cell variant for the modular is_eq family (used by PR 4's algebra section).

The legacy Rv64BaseAluAdapter / Rv64VecHeapAdapter / Rv64IsEqualModAdapter stay in place for the byte-shaped chips that still need byte-level constraints (ALU / MUL / SHIFT). PR 4 owns reshaping those legacy adapters' register-pointer columns.

Shared adapter touched

File: extensions/riscv-adapters/src/vec_heap_branch.rs.

READ_SIZE is now BLOCK_FE_WIDTH-aware (counts u16 cells, not bytes). The legacy register-pointer columns stay byte-shaped here; PR 4 owns the rs_val / rd_val reshape.
branch, jalr, rdwrite adapters are wired to deliver the new u16 cores their native shape.

Bigint inheritors

Files: extensions/bigint/circuit/src/{branch_eq,branch_lt,less_than}.rs, extensions/bigint/circuit/src/lib.rs.

LessThan256 uses the new u16 vec-heap adapter. INT256_NUM_U16_LIMBS = 16, INT256_U16_LIMB_BITS = 16.
BranchEqual256, BranchLessThan256 use the BLOCK_FE_WIDTH-aware vec_heap_branch adapter with the same u16-limb shape.
BaseAlu256, Mul256, Shift256, Sll256 stay byte-shaped — their AIRs decompose into byte-level XOR or shift constraints and can't move to u16 here.

Wiring

Files: extensions/riscv/circuit/src/extension/mod.rs, extensions/bigint/circuit/src/extension/mod.rs.

PR 2 cores receive the VariableRangeChecker shared chip and no longer need the BitwiseOperationLookup chip. The extension wiring threads range_checker to the new chips; legacy byte-shaped chips continue to wire bitwise_lu.

CUDA parity

Files: extensions/riscv/circuit/cuda/include/riscv/cores/{beq,blt,less_than}.cuh, extensions/riscv/circuit/cuda/src/{beq,blt,less_than,auipc,hintstore,jal_lui,jalr}.cu, extensions/bigint/circuit/cuda/src/bigint.cu, extensions/riscv-adapters/cuda/include/riscv-adapters/vec_heap_u16.cuh, extensions/riscv/circuit/cuda/include/riscv/adapters/alu_u16.cuh.

Core kernels parameterize on LIMB_BITS and use VariableRangeChecker.add_count per limb.
.cu kernels instantiate the new shapes and drop the bitwise-lookup parameter from kernel + ABI signatures where the chip no longer needs it.

Adapter / helper cleanups

File: extensions/riscv/circuit/src/adapters/mod.rs.

INT256_NUM_U8_LIMBS (= 32) replaces INT256_NUM_LIMBS (the old name lacked the u8 cell-type tag).
New constants: RV64_LOW32_U16_LIMBS = 2, RV64_LOW32_HIGH_U16_LIMBS = 1, RV64_U16_LIMB_BITS = 16, RV64_U16_LIMB_BASE, RV64_U16_LIMB_MASK, RV64_U16_SIGN_BIT, RV64_LOW32_BITS.
New helpers: rv64_bytes_to_u32_wrapping (low 32 bits of an RV64 register without the upper-half-zero debug assert), expand_to_rv64_block (zero-pad N u16 limbs to one bus block).
timed_read / timed_write use read_bytes / write_bytes to bypass TracingMemory's BLOCK_FE_WIDTH const-assert for u8 byte-view dispatch against u16-celled storage.

Migration notes

Chips that previously wired BitwiseOperationLookupChip solely for byte-pair range checks should now wire VariableRangeChecker and call add_count(limb, LIMB_BITS) per limb.
Rv64BaseAluAdapter callers that read all-u16 operands should switch to Rv64BaseAluU16Adapter.
LessThanCoreAir<NUM_LIMBS, LIMB_BITS> and equivalent now accept NUM_LIMBS = INT256_NUM_U16_LIMBS = 16, LIMB_BITS = INT256_U16_LIMB_BITS = 16 for 256-bit chips operating on u16 limbs.

ghost · 2026-05-20T22:10:16Z

Claude encountered an error after 25m 42s —— View job

Code Review: PR #2801 — Switch to u16 columns in riscv AIRs

Gather context and read the full diff
Review new u16 adapter variants (alu_u16, vec_heap_u16, vec_heap_branch_u16)
Review migrated core AIRs (branch_eq, branch_lt, less_than, jalr, jal_lui, auipc, hintstore)
Review bigint inheritors and extension wiring
Review CUDA parity changes
Review shared adapter and helper changes
Post final review

PR 2 of the memory-bus-u16 split. Migrates RV64 cores whose constraints do not inspect individual bytes from u8-shaped (Pattern A) to u16-shaped (Pattern B), introducing dedicated u16 adapter variants and propagating the new shape to bigint 256-bit inheritors. Cores migrated: - branch_eq, branch_lt: BranchEqualCoreAir/BranchLessThanCoreAir parameterize on LIMB_BITS; records are [u16; NUM_LIMBS]; AIR range-checks each limb via VariableRangeCheckerBus instead of byte-pair-checking via bitwise lookup. - less_than: same Pattern B treatment + wired through the new u16 ALU adapter. - jalr, jal_lui, auipc: native u16-limb rs/rd, drop bitwise lookup wiring. - hintstore: mem_ptr/rem_words/data buffer all u16-shaped + add the variable-range checker. New u16 adapter variants: - Rv64BaseAluU16Adapter (riscv/circuit/src/adapters/alu_u16.rs): reads and writes [u16; BLOCK_FE_WIDTH] natively, supports immediates as a single u16-low limb + sign bit, emits bus cells directly with no pack_u8_block. - Rv64VecHeapU16Adapter (riscv-adapters/src/vec_heap_u16.rs): u16-cell heap reads/writes; byte strides are BUS_PTR_SCALE * READ_SIZE. - Rv64IsEqualModU16Adapter (riscv-adapters/src/eq_mod_u16.rs): u16-cell variant for the modular is_eq family. Adapter changes on shared files: - vec_heap_branch.rs gains a BLOCK_FE_WIDTH-aware READ_SIZE (counts u16 cells, not bytes); the legacy register-pointer columns stay byte-shaped (PR 4 owns rs_val/rd_val reshaping). - branch / jalr / rdwrite adapters wired for the new u16 cores. Bigint inheritors: - LessThan256 uses the new u16 vec-heap adapter; INT256_NUM_U16_LIMBS = 16, INT256_U16_LIMB_BITS = 16. - BranchEqual256, BranchLessThan256 use the BLOCK_FE_WIDTH-aware vec_heap_branch adapter with the same u16-limb shape. - BaseAlu256, Mul256, Shift256, Sll256 stay byte-shaped (Pattern A locked by XOR/shift byte-decomp). Wiring: PR 2 cores receive the VariableRangeChecker shared chip and no longer need the bitwise op lookup chip. extension/mod.rs and the bigint extension wire the new chips with range_checker; legacy chips still use bitwise. CUDA parity: - beq.cuh / blt.cuh / less_than.cuh cores parameterize on LIMB_BITS and use VariableRangeChecker.add_count per limb. - beq.cu / blt.cu / less_than.cu / auipc.cu / hintstore.cu / jal_lui.cu / jalr.cu + bigint.cu instantiate the new shapes and drop the bitwise-lookup parameter from kernel + ABI signatures. Explicitly out (later PRs): - SHA-2 / Keccak / deferral chips (PR 3 / PR 5). - Legacy vec_heap, vec_heap_branch, eq_mod register-pointer reshaping (PR 4). - alu_w, mul_w, loadstore adapter columns (PR 4). - Algebra modular is_eq limb migration (PR 4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ghost · 2026-05-20T22:25:43Z

Claude finished @shuklaayush's task in 1m 30s —— View job

Code Review: PR #2801 — Switch to u16 columns in riscv AIRs

Gather context and read the full diff
Review new u16 adapter variants (alu_u16, vec_heap_u16, vec_heap_branch_u16)
Review migrated core AIRs (branch_eq, branch_lt, less_than, jalr, jal_lui, auipc, hintstore)
Review bigint inheritors and extension wiring
Review CUDA parity changes
Review shared adapter and helper changes
Post final review

Overall: This is a well-structured, methodical migration. The constraint logic is sound, the adapter/core separation is clean, and the wiring is consistent across CPU, GPU, and bigint inheritors. I found no soundness issues. A few minor observations below.

Constraint Soundness

All migrated cores correctly transition from byte-pair range checks (BitwiseOperationLookupBus) to per-limb range checks (VariableRangeCheckerBus). The key invariant — that each u16 cell is provably in [0, 2^LIMB_BITS) — is maintained throughout.

branch_lt / less_than: The signed comparison logic is correct with u16 limbs. The MSB sign-detection (x[NUM_LIMBS-1] >> (LIMB_BITS-1)) works for both LIMB_BITS=8 and LIMB_BITS=16. The diff_val - 1 range check correctly proves non-zero differences, and the sign_shift of 2^(LIMB_BITS-1) maps signed values into the range checker's non-negative domain.

jalr: The carry-chain decomposition for to_pc = rs1 + imm using two u16 limbs is sound. The carry = (...) * inv pattern with assert_bool(carry) correctly constrains the low-32-bit addition. The to_pc_limbs[0] range check at LIMB_BITS - 1 = 15 bits correctly accounts for the *2 factor (since to_pc_least_sig_bit is split off).

jal_lui: The carry_top = (from_pc + DEFAULT_PC_STEP - rd_low_32) * inv^2 pattern is correct — dividing by LIMB_BASE^2 extracts the 32-bit carry, and assert_bool constrains it to 0 or 1. The LUI immediate decomposition at the LUI_IMM_LOW_BITS = 4 boundary is sound.

auipc: The imm << 8 shift decomposed into imm_low_8 (8 bits) and imm_high_16 (16 bits) is clean and correct. The sign-extension constraint tying is_sign_extend to bit 31 of rd via 2 * rd[1] - is_sign_extend * LIMB_BASE is sound.

hintstore: The buffer transition constraints (decrement rem_words by 1, increment mem_ptr by 8) are correct. The range-check scaling for mem_ptr_limbs and rem_words correctly narrows bounds via multiplication before checking.

branch_eq: No range checker needed — the equality check via diff_inv_marker (multiplicative inverse proof) is cell-width-agnostic and works unchanged with u16 limbs.

New u16 Adapter Variants

Rv64BaseAluU16AdapterAir (alu_u16.rs): Clean implementation. The immediate sign-extension logic for u16 is correct — rs2_imm_sign * 0xff_0000 reconstructs the 24-bit signed encoding, and the high u16 limbs are constrained to sign * 0xFFFF. The low u16 immediate limb gets a proper range check conditional on is_valid - rs2_as.

Rv64VecHeapU16AdapterAir (vec_heap_u16.rs): Correctly reads register pointers as u16-celled blocks (2 u16 limbs for the low 32 bits), applies address-range narrowing via val[1] * limb_shift, and composes the address for heap reads. The MEMORY_BLOCK_BYTES stride for consecutive block reads is correct.

Rv64VecHeapBranchU16AdapterAir (vec_heap_branch_u16.rs): Mirrors the vec_heap_u16 pattern for read-only branch operations. No writes, execution bridge uses immediate field for branch target. Correct.

Bigint Inheritors

The type aliases in bigint/circuit/src/lib.rs are correctly parameterized:

Byte-oriented chips (BaseAlu256, Mul256, Shift256): Still use INT256_NUM_U8_LIMBS = 32 and RV64_CELL_BITS = 8 ✓
Comparison chips (LessThan256, BranchEqual256, BranchLessThan256): Switched to INT256_NUM_U16_LIMBS = 16 and RV64_U16_LIMB_BITS = 16 ✓

Extension wiring (bigint/circuit/src/extension/mod.rs) correctly threads:

BitwiseOperationLookup to byte-shaped chips only
VariableRangeChecker to u16-shaped chips
RangeTupleChecker only to Mul256

CUDA Parity

All CUDA kernels correctly mirror the Rust changes:

bitwise_lookup parameter removed from kernel signatures for all migrated chips
VariableRangeChecker.add_count(limb, LIMB_BITS) calls match the Rust-side range checks
Template parameters updated: <BLOCK_FE_WIDTH, RV64_U16_LIMB_BITS> replaces <RV64_REGISTER_NUM_LIMBS, RV64_CELL_BITS>
Record types (BranchLessThanCoreRecord<NUM_LIMBS, LIMB_BITS>) correctly use uint16_t arrays

New CUDA helpers (copy_u16_cells, pack_u32_to_u16_cells_le) in offline_checker.cuh are correct and match the Rust-side u32_to_u16_cells and F::from_u16 conversions.

Shared Adapter Changes

The timed_read_u16 / timed_write_u16 API change from cell-index to byte-pointer semantics is well-guarded by debug_assert_eq!(byte_ptr & 1, 0). The byte_ptr >> 1 conversion is correct for u16-celled address spaces. All callers have been updated.

New helpers (u32_to_u16_cells, rv64_u16_block_to_bytes, expand_to_rv64_block, rv64_bytes_to_u32_wrapping) are all correct and consistent with the u16 cell model.

Minor Observations

less_than/core.rs:306-308 — The diff_val computation for non-MSB limbs casts u32 - u32 to u16 via as u16. This is safe because record.c[diff_idx] and record.b[diff_idx] are both u16 values so the subtraction is at most 0xFFFF, but the as u16 truncation is somewhat implicit. The same pattern appears in branch_lt/core.rs:325-327. This is correct but relies on the u16 range of the inputs.
hintstore/mod.rs:587-589 — The trace filler still uses pack_u8_block_bytes to convert var.data (which is [u8; RV64_REGISTER_NUM_LIMBS]) to u16 cells for the write columns. This is correct since the execution side reads hints as bytes and the packing converts to u16 for the bus. The asymmetry between byte-based execution records and u16-based trace columns is intentional and well-handled.
Rv64Io extension wiring (extension/mod.rs:761-797) — The BitwiseOperationLookupAir creation block was removed from Rv64Io, which previously created it solely for hintstore. Since hintstore now uses only VariableRangeChecker, this is correct. If any other future chip in Rv64Io needs bitwise lookup, it would need to be re-added — but that's a future concern, not a current issue.
Constants naming — INT256_NUM_U8_LIMBS (new name for INT256_NUM_LIMBS) is a good rename that makes the cell-type explicit. The old name is fully replaced.

Summary

Clean, well-executed migration. All AIR constraints are sound, adapter/core wiring is consistent, CUDA kernels match the Rust side, and the bigint inheritors correctly distinguish between byte-shaped and u16-shaped chips. No blocking issues found.

ghost · 2026-05-20T22:29:03Z

Claude encountered an error after 0s —— View job

Code Review: PR #2801 — Switch to u16 columns in riscv AIRs

Gather context and read the full diff
Review new u16 adapter variants (alu_u16, vec_heap_u16, vec_heap_branch_u16)
Review migrated core AIRs (branch_eq, branch_lt, less_than, jalr, jal_lui, auipc, hintstore)
Review bigint inheritors and extension wiring
Review CUDA parity changes
Review shared adapter and helper changes
Post final review

ghost · 2026-05-20T22:41:35Z

group	app.proof_time_ms	app.cycles	leaf.proof_time_ms
fibonacci	1,543	4,000,051	431
keccak	13,938	14,365,133	2,396
sha2_bench	9,450	11,167,961	1,431
regex	1,437	4,090,656	354
ecrecover	472	112,210	267
pairing	600	592,827	257
kitchen_sink	2,171	1,979,971	408

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: 9effefa

Benchmark Workflow