feat(v2.1-rv64): switch to u16 limbs in riscv airs#2801
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
8dd9ea1 to
366f23e
Compare
This comment was marked as outdated.
This comment was marked as outdated.
366f23e to
bdb3578
Compare
This comment was marked as outdated.
This comment was marked as outdated.
bdb3578 to
f6456c0
Compare
This comment was marked as outdated.
This comment was marked as outdated.
f6456c0 to
32a2cb7
Compare
This comment was marked as outdated.
This comment was marked as outdated.
32a2cb7 to
3b002bc
Compare
This comment was marked as outdated.
This comment was marked as outdated.
3b002bc to
bc21ed1
Compare
This comment was marked as outdated.
This comment was marked as outdated.
bc21ed1 to
46617d6
Compare
|
Claude encountered an error after 25m 42s —— View job Code Review: PR #2801 — Switch to u16 columns in riscv AIRs
|
This comment has been minimized.
This comment has been minimized.
PR 2 of the memory-bus-u16 split. Migrates RV64 cores whose constraints do not inspect individual bytes from u8-shaped (Pattern A) to u16-shaped (Pattern B), introducing dedicated u16 adapter variants and propagating the new shape to bigint 256-bit inheritors. Cores migrated: - branch_eq, branch_lt: BranchEqualCoreAir/BranchLessThanCoreAir parameterize on LIMB_BITS; records are [u16; NUM_LIMBS]; AIR range-checks each limb via VariableRangeCheckerBus instead of byte-pair-checking via bitwise lookup. - less_than: same Pattern B treatment + wired through the new u16 ALU adapter. - jalr, jal_lui, auipc: native u16-limb rs/rd, drop bitwise lookup wiring. - hintstore: mem_ptr/rem_words/data buffer all u16-shaped + add the variable-range checker. New u16 adapter variants: - Rv64BaseAluU16Adapter (riscv/circuit/src/adapters/alu_u16.rs): reads and writes [u16; BLOCK_FE_WIDTH] natively, supports immediates as a single u16-low limb + sign bit, emits bus cells directly with no pack_u8_block. - Rv64VecHeapU16Adapter (riscv-adapters/src/vec_heap_u16.rs): u16-cell heap reads/writes; byte strides are BUS_PTR_SCALE * READ_SIZE. - Rv64IsEqualModU16Adapter (riscv-adapters/src/eq_mod_u16.rs): u16-cell variant for the modular is_eq family. Adapter changes on shared files: - vec_heap_branch.rs gains a BLOCK_FE_WIDTH-aware READ_SIZE (counts u16 cells, not bytes); the legacy register-pointer columns stay byte-shaped (PR 4 owns rs_val/rd_val reshaping). - branch / jalr / rdwrite adapters wired for the new u16 cores. Bigint inheritors: - LessThan256 uses the new u16 vec-heap adapter; INT256_NUM_U16_LIMBS = 16, INT256_U16_LIMB_BITS = 16. - BranchEqual256, BranchLessThan256 use the BLOCK_FE_WIDTH-aware vec_heap_branch adapter with the same u16-limb shape. - BaseAlu256, Mul256, Shift256, Sll256 stay byte-shaped (Pattern A locked by XOR/shift byte-decomp). Wiring: PR 2 cores receive the VariableRangeChecker shared chip and no longer need the bitwise op lookup chip. extension/mod.rs and the bigint extension wire the new chips with range_checker; legacy chips still use bitwise. CUDA parity: - beq.cuh / blt.cuh / less_than.cuh cores parameterize on LIMB_BITS and use VariableRangeChecker.add_count per limb. - beq.cu / blt.cu / less_than.cu / auipc.cu / hintstore.cu / jal_lui.cu / jalr.cu + bigint.cu instantiate the new shapes and drop the bitwise-lookup parameter from kernel + ABI signatures. Explicitly out (later PRs): - SHA-2 / Keccak / deferral chips (PR 3 / PR 5). - Legacy vec_heap, vec_heap_branch, eq_mod register-pointer reshaping (PR 4). - alu_w, mul_w, loadstore adapter columns (PR 4). - Algebra modular is_eq limb migration (PR 4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
46617d6 to
9effefa
Compare
|
Claude finished @shuklaayush's task in 1m 30s —— View job Code Review: PR #2801 — Switch to u16 columns in riscv AIRs
Overall: This is a well-structured, methodical migration. The constraint logic is sound, the adapter/core separation is clean, and the wiring is consistent across CPU, GPU, and bigint inheritors. I found no soundness issues. A few minor observations below. Constraint SoundnessAll migrated cores correctly transition from byte-pair range checks ( branch_lt / less_than: The signed comparison logic is correct with u16 limbs. The MSB sign-detection ( jalr: The carry-chain decomposition for jal_lui: The auipc: The hintstore: The buffer transition constraints (decrement branch_eq: No range checker needed — the equality check via New u16 Adapter Variants
Bigint InheritorsThe type aliases in
Extension wiring (
CUDA ParityAll CUDA kernels correctly mirror the Rust changes:
New CUDA helpers ( Shared Adapter ChangesThe New helpers ( Minor Observations
SummaryClean, well-executed migration. All AIR constraints are sound, adapter/core wiring is consistent, CUDA kernels match the Rust side, and the bigint inheritors correctly distinguish between byte-shaped and u16-shaped chips. No blocking issues found. |
|
Claude encountered an error after 0s —— View job Code Review: PR #2801 — Switch to u16 columns in riscv AIRs
|
Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights. Commit: 9effefa |

PR 2 of the memory-bus-u16 split. Migrates RV64 cores whose constraints do not inspect individual bytes from a byte-shaped trace layout to a natively-u16 trace layout, and introduces dedicated u16 adapter variants. Bigint chips that wrap these RV64 chips inherit the new shape.
Why
#2794 made the memory-bus interaction fixed-shape (
BLOCK_FE_WIDTH = 4u16 cells). Chips whose AIRs do not constrain byte-level structure (equality, magnitude comparison, control-flow target, hintstore copy) were still carrying 8 u8 columns per register, packing them into bus cells at the eval site, and paying for aBitwiseOperationLookupbyte-pair range-check the bus no longer needs.Reshaping those columns to u16 directly halves the affected trace width and lets the AIR send the bus message without
pack_u8_block, turning the byte-pair range checks into per-limbVariableRangeCheckerchecks bounded byLIMB_BITS.Subsequent PRs build on the new u16 adapter variants and constants (
INT256_NUM_U16_LIMBS,BLOCK_FE_WIDTH) introduced here.What changes
Cores migrated to u16-shaped traces
Files:
extensions/riscv/circuit/src/{branch_eq,branch_lt,less_than,jalr,jal_lui,auipc,hintstore,rdwrite}/.branch_eq,branch_lt:BranchEqualCoreAir/BranchLessThanCoreAirparameterize onLIMB_BITS. Records are[u16; NUM_LIMBS]. The AIR range-checks each limb viaVariableRangeCheckerBusinstead of byte-pair-checking via the bitwise-lookup bus.less_than: same treatment, wired through the new u16 ALU adapter.jalr,jal_lui,auipc: native u16-limbrs/rd; bitwise lookup wiring removed where it only existed for byte range checks.hintstore:mem_ptr/rem_words/ data buffer all u16-shaped; variable-range checker added.New u16 adapter variants
Files:
extensions/riscv/circuit/src/adapters/alu_u16.rs,extensions/riscv-adapters/src/{vec_heap_u16,eq_mod_u16}.rs.Rv64BaseAluU16Adapter: reads and writes[u16; BLOCK_FE_WIDTH]natively; supports immediates as a single u16 low-limb + sign bit; emits bus cells directly with nopack_u8_block.Rv64VecHeapU16Adapter: u16-cell heap reads/writes; byte strides areBUS_PTR_SCALE * READ_SIZE.Rv64IsEqualModU16Adapter: u16-cell variant for the modularis_eqfamily (used by PR 4's algebra section).The legacy
Rv64BaseAluAdapter/Rv64VecHeapAdapter/Rv64IsEqualModAdapterstay in place for the byte-shaped chips that still need byte-level constraints (ALU / MUL / SHIFT). PR 4 owns reshaping those legacy adapters' register-pointer columns.Shared adapter touched
File:
extensions/riscv-adapters/src/vec_heap_branch.rs.READ_SIZEis nowBLOCK_FE_WIDTH-aware (counts u16 cells, not bytes). The legacy register-pointer columns stay byte-shaped here; PR 4 owns thers_val/rd_valreshape.branch,jalr,rdwriteadapters are wired to deliver the new u16 cores their native shape.Bigint inheritors
Files:
extensions/bigint/circuit/src/{branch_eq,branch_lt,less_than}.rs,extensions/bigint/circuit/src/lib.rs.LessThan256uses the new u16 vec-heap adapter.INT256_NUM_U16_LIMBS = 16,INT256_U16_LIMB_BITS = 16.BranchEqual256,BranchLessThan256use theBLOCK_FE_WIDTH-awarevec_heap_branchadapter with the same u16-limb shape.BaseAlu256,Mul256,Shift256,Sll256stay byte-shaped — their AIRs decompose into byte-level XOR or shift constraints and can't move to u16 here.Wiring
Files:
extensions/riscv/circuit/src/extension/mod.rs,extensions/bigint/circuit/src/extension/mod.rs.PR 2 cores receive the
VariableRangeCheckershared chip and no longer need theBitwiseOperationLookupchip. The extension wiring threadsrange_checkerto the new chips; legacy byte-shaped chips continue to wirebitwise_lu.CUDA parity
Files:
extensions/riscv/circuit/cuda/include/riscv/cores/{beq,blt,less_than}.cuh,extensions/riscv/circuit/cuda/src/{beq,blt,less_than,auipc,hintstore,jal_lui,jalr}.cu,extensions/bigint/circuit/cuda/src/bigint.cu,extensions/riscv-adapters/cuda/include/riscv-adapters/vec_heap_u16.cuh,extensions/riscv/circuit/cuda/include/riscv/adapters/alu_u16.cuh.LIMB_BITSand useVariableRangeChecker.add_countper limb..cukernels instantiate the new shapes and drop the bitwise-lookup parameter from kernel + ABI signatures where the chip no longer needs it.Adapter / helper cleanups
File:
extensions/riscv/circuit/src/adapters/mod.rs.INT256_NUM_U8_LIMBS(= 32) replacesINT256_NUM_LIMBS(the old name lacked the u8 cell-type tag).RV64_LOW32_U16_LIMBS = 2,RV64_LOW32_HIGH_U16_LIMBS = 1,RV64_U16_LIMB_BITS = 16,RV64_U16_LIMB_BASE,RV64_U16_LIMB_MASK,RV64_U16_SIGN_BIT,RV64_LOW32_BITS.rv64_bytes_to_u32_wrapping(low 32 bits of an RV64 register without the upper-half-zero debug assert),expand_to_rv64_block(zero-padNu16 limbs to one bus block).timed_read/timed_writeuseread_bytes/write_bytesto bypassTracingMemory'sBLOCK_FE_WIDTHconst-assert for u8 byte-view dispatch against u16-celled storage.Migration notes
BitwiseOperationLookupChipsolely for byte-pair range checks should now wireVariableRangeCheckerand calladd_count(limb, LIMB_BITS)per limb.Rv64BaseAluAdaptercallers that read all-u16 operands should switch toRv64BaseAluU16Adapter.LessThanCoreAir<NUM_LIMBS, LIMB_BITS>and equivalent now acceptNUM_LIMBS = INT256_NUM_U16_LIMBS = 16,LIMB_BITS = INT256_U16_LIMB_BITS = 16for 256-bit chips operating on u16 limbs.