feat(v2.1-rv64): switch memory bus blocks to u16 cells#2794
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment has been minimized.
This comment has been minimized.
8cf1d05 to
cfca264
Compare
This comment was marked as outdated.
This comment was marked as outdated.
Atomic flip of the memory bus from 8 u8 cells per block to 4 u16 cells per block. The bus payload shape is now a single hard constant (BLOCK_FE_WIDTH = 4); byte-shaped chips pack their u8 columns into bus cells via pack_u8_block. PR 1 of the memory-bus-u16 split — see pr.md for the full description and migration notes. Trace column narrowing, redundant range-check removal, SHA-2 / Keccak state reshaping, and deferral changes are deferred to follow-up PRs. - Layout constants (arch::config, cuda/params.cuh): single source of truth — U16_CELL_SIZE drives BUS_PTR_SCALE; BLOCK_FE_WIDTH is the design choice; MEMORY_BLOCK_BYTES, BUS_BLOCK_STRIDE, BUS_LEAF_STRIDE, DIGEST_WIDTH derive. Cross-constant equality asserts removed where derivation makes them tautological. - Address-space cell types: RV64_REGISTER_AS, RV64_MEMORY_AS, PUBLIC_VALUES_AS switch to u16-celled; DEFERRAL_AS stays F-celled. - Persistent boundary AIR, merkle tree, merkle public-values, and the CUDA inventory rewritten for 4-cell-per-block u16 leaves (BLOCKS_PER_LEAF = 2). assert_public_values_shape + SystemConfig::public_values_cell_count() formalize the public-values byte→cell contract. - offline_checker::bridge: read/write drop the const N generic; payload is always [T; BLOCK_FE_WIDTH]. write unifies the standard and prev-data-from-columns paths via MemoryWriteAuxInput. New pack_u8_block (AB::Expr) and pack_u8_block_value (F-value) helpers for byte-shaped chips and testing. Dropped read_or_immediate / the ReadOrImmediateOperation flavor (RV32 vestige superseded by RV64_IMM_AS). - TracingMemory / GuestMemory: read_bytes / write_bytes for raw byte access against u16-celled storage; tracing_read_deferral et al. for DEFERRAL_AS. - ~50 chip call-sites across riscv, riscv-adapters, keccak256, sha2, deferral updated so memory-bus sends/receives stay shape-matched. Byte-shaped chips wrap their columns with pack_u8_block; register- read paths still zero-extend the low 32 bits to 8 bytes before packing. Per-AS metered leaf-bit shift in execution_mode/metered/ memory_ctx.rs so RV64 byte pointers produce the right leaf label. - Algebra / ECC adapters: dropped phantom BLOCK_SIZE const generic from ModularAir/Chip, Fp2Air/Chip, WeierstrassAir/Chip and the matching get_*_air / get_*_chip helpers. The parameter became phantom after Rv64VecHeapAdapter* dropped BLOCK_SIZE and was breaking inference at call sites. - Testing: MemoryTester / DeviceMemoryTester accept either N = BLOCK_FE_WIDTH (cell index) or N = MEMORY_BLOCK_BYTES (byte pointer in a u16 AS); pack_u8_block_value shared between testing and the CUDA boundary-chip test that previously hand-rolled the packing. - CUDA parity: params.cuh constants + static_asserts + ASCII diagrams for both u16 and F-celled AS layouts; kernel memory-aux fill paths updated for the new block shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cfca264 to
d30973f
Compare
This comment was marked as outdated.
This comment was marked as outdated.
|
Claude finished @shuklaayush's task in 3m 29s —— View job Code Review: Switch memory bus blocks to u16 cells
Overall AssessmentThis is a well-executed, large-scope refactor (100 files, ~4000 additions/deletions) that cleanly switches the memory bus message shape from 8 u8-valued cells to 4 u16-valued cells. The design is sound, the constant derivation is rigorous, and the migration across ~50 chip call-sites is impressively thorough with no correctness issues found. Verdict: Approve — no blocking issues identified. Constants & Layout (config.rs, params.cuh)The constant hierarchy is well-derived and documented: The static assert at The Bridge API Design (bridge.rs)The removal of the The
The three packing helpers ( Byte Range Check Soundness (RISCV chips)The most critical correctness concern: chips operating on individual bytes no longer get byte-range guarantees from the memory bus (which now only guarantees u16 range). All 7 affected RISCV chips properly add byte range checks:
All chips include a consistent comment pattern noting the change. The coverage is complete and correct. Persistent Memory & Boundary AIR (persistent.rs)The for block_idx in 0..BLOCKS_PER_LEAF {
let offset = block_stride_f * AB::F::from_usize(block_idx);
self.memory_bus.send(
MemoryAddress::new(local.address_space,
local.leaf_label * leaf_stride_f + offset),
local.values[block_idx * BLOCK_FE_WIDTH..(block_idx + 1) * BLOCK_FE_WIDTH].to_vec(),
local.timestamps[block_idx],
).eval(builder, local.expand_direction);
}The The initial-row constraint ( Metered Execution (memory_ctx.rs)The leaf-pointer-span differentiation at lines 166-170 is correct: let leaf_ptr_span = if address_space == DEFERRAL_AS {
DIGEST_WIDTH as u32 // cell-pointer AS: 8 cells per leaf
} else {
(U16_CELL_SIZE * DIGEST_WIDTH) as u32 // byte-pointer AS: 16 bytes per leaf
};This ensures byte pointers (RV64 ASes) and cell pointers (DEFERRAL_AS) both produce the correct leaf labels when right-shifted by TracingMemory / GuestMemory (online.rs)The new The Algebra/ECC CleanupThe phantom Deferral Circuit (call/air.rs, output/air.rs, utils.rs)The Testing Infrastructure (testing/memory/mod.rs)The
This ensures test bus messages match what real chips emit. CUDA ChangesThe CUDA boundary ( Minor Observations
No correctness bugs, soundness gaps, or API misuse were found. The PR description, layout diagrams, and migration notes are thorough. |
Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights. Commit: d30973f |
| pub const OPENVM_DEFAULT_INIT_FILE_NAME: &str = "openvm_init.rs"; | ||
| /// Default block size (in bytes) for memory bus interactions. Memory is always read/written a | ||
| /// full block at a time even when the instruction accesses fewer bytes (e.g. RISC-V `lb`/`lh`). | ||
| pub const DEFAULT_BLOCK_SIZE: usize = 8; |
There was a problem hiding this comment.
why did we rename this constant once more T_T
There was a problem hiding this comment.
i got ai to write a base set of constants that make all this easier to understand. it might have gone a bit overboard but i feel it's much cleaner now
| ) -> [AB::Expr; BLOCK_FE_WIDTH] { | ||
| let mut out: [AB::Expr; BLOCK_FE_WIDTH] = std::array::from_fn(|_| AB::Expr::ZERO); | ||
| for i in 0..BLOCK_FE_WIDTH { | ||
| out[i] = data[i * 2].clone() + AB::Expr::from_u64(256) * data[i * 2 + 1].clone(); |
There was a problem hiding this comment.
2 is hard coded which doesn't really make this tied to the BLOCK_FE_WIDTH constant
There was a problem hiding this comment.
huh the 2 comes from the fact that this is packing two u8 elements into one u16. it's independent of BLOCK_FE_WIDTH
stephenh-axiom-xyz
left a comment
There was a problem hiding this comment.
Am good on the deferral extension changes
| // field | ||
| // - T is the exact memory cell type for this address space, satisfying the type requirement | ||
| unsafe { self.memory.read(addr_space, ptr) } | ||
| if size_of::<T>() == 1 { |
There was a problem hiding this comment.
what are the possible values of T here ?
There was a problem hiding this comment.
u8 for the first path and u16 or F for the else
| } | ||
|
|
||
| /// Returns `[pointer:BLOCK_SIZE]_{address_space}` | ||
| /// Reads `BLOCK_SIZE` AS-native cells starting at `cell_idx`. |
There was a problem hiding this comment.
what does AS-native cells mean ?
There was a problem hiding this comment.
address space native i.e. either u16 or F
| @@ -205,11 +207,19 @@ where | |||
| addr_space: u32, | |||
| ptr: u32, | |||
There was a problem hiding this comment.
should this also be renamed as cell_idx too or not ?
There was a problem hiding this comment.
no, because it can also be bytes. maybe it can be ptr_or_idx but let's do it in a follow up and not here since four other prs are stacked on top of this
| // Range check for postimage bytes (pairs) | ||
| for (size_t i = 0; i < KECCAK_WIDTH_BYTES; i += 2) { | ||
| bitwise_lookup.add_range(state.bytes[i], state.bytes[i + 1]); | ||
| } | ||
| // AIR bounds the pointer bytes before composing them. | ||
| bitwise_lookup.add_range(buffer_ptr_limbs[0], buffer_ptr_limbs[1]); | ||
| bitwise_lookup.add_range(buffer_ptr_limbs[2], buffer_ptr_limbs[3]); |
There was a problem hiding this comment.
what was this range check for postage bytes and why is it getting removed now ?
There was a problem hiding this comment.
you needed to decompose the u16 postimage into bytes before sending it to the memory bus earlier. and to show that the decomposition is canonical, you needed to range check that the two limbs are indeed bytes. now that the memory bus is u16 you don't need this
Switch memory bus blocks to u16 cells
PR 1 of the memory-bus-u16 split. Atomic flip of the memory-bus block from
8 u8 cellsto4 u16 cells. The bus interaction is now fixed-shape (BLOCK_FE_WIDTH = 4), and byte-shaped chips pack their u8 columns into bus cells via a single helper.Why
MemoryBusinteractions previously carried 8 field-element-encoded u8s per message. RV64 cell-typed memory (registers, RV memory, public values) is naturally u16-aligned, and packing two bytes per bus cell halves the bus payload width without losing constraint power. This PR makes the bus shape a single hard constant (BLOCK_FE_WIDTH) so the bridge API and downstream chips no longer need aconst Ngeneric for it.Follow-up PRs will use the wider cells to narrow trace columns, prune redundant range checks, reshape SHA-2 / Keccak state, and revisit deferral.
What changes
Memory layout constants
Files:
crates/vm/src/arch/config.rs,crates/vm/cuda/include/system/memory/params.cuh.U16_CELL_SIZE = size_of::<u16>()(intrinsic) drivesBUS_PTR_SCALE;BLOCK_FE_WIDTH = 4is the design choice;MEMORY_BLOCK_BYTES,BUS_BLOCK_STRIDE,BUS_LEAF_STRIDE,DIGEST_WIDTHall derive.DIGEST_WIDTH.is_multiple_of(BLOCK_FE_WIDTH).params.cuhmirrors the constants and includes ASCII diagrams for the u16-celled AS layout (RV64) and the F-celled AS layout (DEFERRAL_AS), showing howbyte_ptr,cell_idx, andbus_ptrrelate.Layout reference
Terminology:
u16for RV64 ASes,FforDEFERRAL_AS).BLOCK_FE_WIDTHcells =MEMORY_BLOCK_BYTESbytes.DIGEST_WIDTHcells; also one merkle leaf.DIGEST_WIDTHcells =BLOCKS_PER_LEAFblocks.u16-celled AS layout (
RV64_REGISTER_AS,RV64_MEMORY_AS,PUBLIC_VALUES_AS)One merkle leaf = 16 bytes = 8 u16 cells = 2 bus blocks:
byte_ptr = U16_CELL_SIZE * cell_idx(coincides withbus_ptrfor u16 ASes)bus_ptr = BUS_PTR_SCALE * cell_idxBUS_BLOCK_STRIDE = 8 = BUS_PTR_SCALE * BLOCK_FE_WIDTHBUS_LEAF_STRIDE = 16 = BUS_PTR_SCALE * DIGEST_WIDTHF-celled AS layout (
DEFERRAL_AS)Each cell holds one
Felement (size_of::<F>()bytes on host; 4 for BabyBear). The bus addresses cells directly, so cell / block / leaf counts andbus_ptr/cell_idxmappings match the u16 AS.byte_ptr = size_of::<F>() * cell_idx(host-memory stride; ≠bus_ptr)bus_ptr = BUS_PTR_SCALE * cell_idx(same formula as u16 ASes)BUS_BLOCK_STRIDEandBUS_LEAF_STRIDEunchanged — the bus addresses cells, not bytes.Address-space cell types
RV64_REGISTER_AS,RV64_MEMORY_AS,PUBLIC_VALUES_ASswitch to u16-celled.DEFERRAL_ASstays F-celled (moves to u16 in a follow-up).Persistent memory and merkle
BLOCKS_PER_LEAF = DIGEST_WIDTH / BLOCK_FE_WIDTH = 2).assert_public_values_shape::<DIGEST_WIDTH>(num_public_values_bytes)andSystemConfig::public_values_cell_count()formalize the public-values byte-count → cell-count contract at construction and at every read site.Offline-checker bridge
File:
crates/vm/src/system/memory/offline_checker/bridge.rs.MemoryBridge::read/writedrop theconst Ngeneric; bus payload is always[T; BLOCK_FE_WIDTH].MemoryWriteAuxInputletswriteaccept either an&MemoryWriteAuxCols<V, BLOCK_FE_WIDTH>(standard case) orfrom_prev_data_exprs(&base_aux, prev_data_expr)for chips that materialize prev_data in their own columns (Keccak, SHA-2, LoadStore).pack_u8_block(AB::Expr) andpack_u8_block_value(F-value form) so byte-shaped chips and testing helpers produce bus messages identical to what u16 chips emit natively.read_or_immediate/MemoryReadOrImmediateOperation(an RV32 pattern superseded byRV64_IMM_AS).TracingMemory / GuestMemory byte view
read_bytes/write_bytes(cell-type-aware, BLOCK-aligned) for raw byte access against u16-celled storage.timed_read_deferral,tracing_read_deferral, …) for the F-celled AS.Chip call-sites
~50 sites across riscv, riscv-adapters, keccak256, sha2, deferral.
memory_bridge.read/.writenow hands the bridge aBLOCK_FE_WIDTH-shaped payload.[T; 8]u8 columns withpack_u8_block.execution_mode/metered/memory_ctx.rs(U16_AS_LEAF_PTR_BITS = 4for byte-pointer ASes,CELL_AS_LEAF_PTR_BITS = 3for cell-pointer ASes), so RV64 byte pointers produce the right leaf label.Algebra / ECC adapter cleanup
No-op semantic; finishes upstream removal.
BLOCK_SIZEconst generic fromModularAir/ModularChip,Fp2Air/Fp2Chip,WeierstrassAir/WeierstrassChipand theirget_*_air/get_*_chiphelpers.Rv64VecHeapAdapter*dropped itsBLOCK_SIZEparameter; keeping it was a phantom that broke inference at call sites.Testing infrastructure
MemoryTester/DeviceMemoryTesteraccept eitherN = BLOCK_FE_WIDTH(cell index) orN = MEMORY_BLOCK_BYTES(byte pointer in a u16 AS) and route accordingly; the byte path packs the result so the dummy chip's bus message matches a byte-shaped chip's exactly.pack_u8_block_valueis shared between the testing helper and the CUDA boundary-chip test that previously hand-rolled the same packing.CUDA
params.cuhconstants + static_asserts kept in sync.Out of scope (follow-up PRs)
DEFERRAL_ASto u16.Migration notes
MemoryBridge::read::<N>/::write::<N>calls: drop<N>; pass aBLOCK_FE_WIDTH-shaped payload. Byte-shaped data goes throughpack_u8_block(constraint side) andpack_u8_block_value(witness side).read_or_immediatecallers: switch toRV64_IMM_AS-based addressing.MemoryWriteAuxCols<V, N>withN != BLOCK_FE_WIDTH: useMemoryWriteAuxInput::from_prev_data_exprs(&base_aux, prev_data)to supply prev_data inline.ModularAir,ModularChip,Fp2Air,Fp2Chip,WeierstrassAir,WeierstrassChipnow take one fewer const generic (BLOCK_SIZEdropped).Resolves INT-7829