test(DON'T MERGE): develop-v2.0.0-rc.1#2592
Conversation
e9d2afb to
deff2f2
Compare
a6c7835 to
8d51084
Compare
8d51084 to
743b94e
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
3b2b382 to
4911c15
Compare
4911c15 to
cf270ee
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
71d3584 to
f9268e6
Compare
This comment was marked as outdated.
This comment was marked as outdated.
271868d to
7a23ceb
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
d4719da to
05f4de6
Compare
05f4de6 to
a0ee68f
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment has been minimized.
This comment has been minimized.
2ea2f43 to
f2c28ec
Compare
Resolves INT-7422.
Resolves INT-7433.
## Summary
Refactors the SHA-2 block hasher CUDA tracegen kernel from a monolithic
1-thread-per-block design into a three-phase pipeline for better memory
coalescing on column-major traces.
| Phase | Kernel | Parallelism | Purpose |
|-------|--------------------------|-----------------|----------------------------------------------------------------|
| 1 | `sha2_first_pass_phase1` | 1 thread/block | Compute SHA-2 rounds,
snapshot `{a..h, w_buf}` to scratch |
| 2 | `sha2_first_pass_phase2` | 1 thread/row | Read snapshot, write
trace columns (coalesced) |
| 3 | `sha2_first_pass_phase3` | 1 thread/block | Cross-row dependencies
(`intermed_4/8/12`, carry) |
Also drops unused parameters (`ptr_max_bits`,
`range_checker_ptr`/`num_bins`, `timestamp_max_bits`) from the block
hasher kernel and its Rust-side wrappers, simplifying
`Sha2BlockHasherChipGpu`.
## Changes
- `extensions/sha2/circuit/cuda/src/sha2_hasher.cu`: split first-pass
tracegen into three phases; tightened `CHECK_KERNEL` error propagation
in `launch_sha2_second_pass_dependencies`.
- `extensions/sha2/circuit/src/cuda/{mod.rs,cuda_abi.rs}`: wire the new
scratch buffer and drop unused wrapper arguments.
- `extensions/sha2/circuit/src/extension/cuda.rs`: update
`Sha2BlockHasherChipGpu::new` call sites.
- `extensions/sha2/circuit/src/sha2_chips/tests.rs`: update tests to
match the new constructor signature.
## Correctness notes
- Phase 1 snapshots state before each row's rounds; Phase 2 restores
that snapshot and replays the rounds to compute intermediate trace
values. The round computations are identical across phases.
- Scratch is sized as `num_blocks * ROWS_PER_BLOCK * (8 + BLOCK_WORDS)`
words, exactly matching what's written/read (no oversize).
- Phase 2 launches `num_blocks * ROWS_PER_BLOCK` threads, so every
thread maps to a valid block.
- Padding rows are still handled by the existing
`sha2_fill_first_dummy_row` + `sha2_fill_invalid_rows` kernels.
- Digest row: Phase 1 snapshots the final state (after all rounds) for
`row_in_block == ROUND_ROWS`; Phase 2 reads it to compute `final_hash =
prev_hash + work_vars`.
## Test plan
- [ ] `cargo nextest run --cargo-profile=fast -p openvm-sha2-circuit`
- [ ] CUDA tracegen tests for both SHA-256 and SHA-512 variants (feature
`cuda`)
- [ ] Verify trace matrices match CPU reference for representative
inputs
resolves int-6985
Towards INT-7429.
## Summary
- Mark `finalize_ptr`, `finalize`, and `native_keccak256` as `unsafe fn`
in both `zkvm_impl.rs` and `host_impl.rs`. These functions write 32
bytes to an output pointer/slice via `copy_nonoverlapping` without
runtime size checks (only `debug_assert!`), so callers must guarantee
the buffer is at least 32 bytes.
- Add `assert!` in the `tiny_keccak::Hasher` trait impl in `lib.rs`
since the trait method must be safe and cannot propagate `unsafe` to
callers.
- Safe public APIs (`keccak256()`, `set_keccak256()`) are unchanged —
they pass correctly-sized buffers internally.
**TODO:** The `openvm-org/tiny-keccak` fork and
`openvm-org/openvm-primitives` also call `Keccak256::finalize` /
`native_keccak256` and will need matching updates (add `unsafe {}`
blocks and/or `assert!` at their trait boundaries).
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Wang <jonathanpwang@users.noreply.github.com>
Towards INT-7422.
Resolves INT-7439, INT-7422.
…ebra+ecc + update sha2 docs (#2710) Resolves INT-7439.
Towards INT-7429. ## Summary - Fix OOB read in `native_xorin` when `len` is not 4-byte aligned: `AlignedBuf::new` copied `adjusted_len` bytes from source buffers, reading 1-3 bytes past the valid region. Replace with `AlignedBuf::uninit` + `copy_nonoverlapping` of only `len` bytes. - Mark `native_xorin` and `native_keccakf` as `unsafe extern "C" fn` since they take raw pointers and require callers to guarantee pointer validity and buffer sizes. Add `unsafe` blocks in callers `xorin` and `keccakf` with safety comments. - Update keccak extension docs code example to reflect the new unsafe signatures. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Towards INT-7478. - Replace `assert_eq!` with `Err(StaticProgramError::InvalidInstruction)` in multiplication executor `pre_compute_impl` for both bigint (mult.rs) and rv32im (mul/execution.rs). Functions returning `Result` should use error returns instead of panics, consistent with other executors in the codebase. - Add `/// # Safety` doc comments to all 12 `unsafe extern "C" fn` in bigint guest externs (externs.rs), documenting pointer validity, buffer size, and alignment preconditions. Follows precedent set by PRs #2706 (keccak) and #2708 (sha2). - Document unused branch opcode variants. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Issue found and initially corrected by [Daniel Heim](https://github.com/dghelm)
Resolves INT-7541. Issue found and initially corrected by [Daniel Heim](https://github.com/dghelm).
This PR introduces the `ColumnsAir` trait and makes it required by all AIRs. Via this trait, AIRs can (optionally) provide a list of column names. At powdr, we use this to generate (somewhat) human-readable autoprecompiles for OpenVM ([example](https://github.com/powdr-labs/powdr/blob/main/openvm-riscv/tests/apc_snapshots/complex/rotate.txt)). Unfortunately, this touches all AIRs and therefore a lot of files, but the changes are small and now change to `stark-backend` are needed. This change represents most of our [diff to OpenVM](https://github.com/openvm-org/openvm/compare/v2.0.0-beta.2...powdr-labs:openvm:v2-powdr-beta.2-reorganized?expand=1). --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The metered segmentation memory formula hardcodes `main_cell_secondary_weight = 1.5`, which the doc comment on `DEFAULT_MAIN_CELL_SECONDARY_WEIGHT` derives assuming `need_rot = true` for every AIR (`mat_eval_bytes = (h / 2^l_skip) × (2 × w) × sizeof(EF)`). For AIRs with `need_rot = false`, `fold_ple` allocates `1 × w`, not `2 × w`, so their secondary contribution is half. This change threads per-AIR `need_rot` from `pk.vk.params` into `SegmentationCtx` and splits the secondary term: AIRs with `need_rot = true` use weight `1.5`, AIRs with `need_rot = false` use `0.75`. Main and interaction terms unchanged. [benchmark comparison](https://github.com/axiom-crypto/openvm-eth/actions/runs/25931043565) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is a manual re-application of 5b580ad to the new `Sha2BlockHasherSubAir` since it is in a new file. A negative test is added that fails without the new added constraint. ref INT-6416
f08f0fe to
0110850
Compare
|
Claude finished @jpw-axiom's task in 6m 59s —— View job Code Review for PR #2592Scope: 105 commits, 100 files, +37,627/−40,110 lines. All changes reviewed against
New Commits Since Prior Reviews (24 commits)🔴 Critical Fixes — All Verified Correct
Performance Optimizations — Verified Sound
Other Noteworthy Commits
Outstanding Issues (Carried Forward + New)🔴 High Priority1. Commit
Impact: The example is not in the workspace members, so it won't break workspace builds, but it will fail if built standalone. The docs page is stale. 🟡 Medium Priority2. ColsRef derive macro — copy-paste bug in panic message
panic!(
"Unsupported field type (in get_mut_cols_ref_fields): {:?}",
// ^^^ Should be "get_const_cols_ref_fields"
f.ty
);Line 389 in the same function correctly says 3. Latent UB in
template <> inline __device__ __host__ uint64_t rotr_generic<uint64_t>(uint64_t value, int n) {
return (value >> n) | (value << (64 - n)); // UB when n=0
}Same issue in 4. Unsafe pointer casts without alignment verification in ColsRef macro
5. Hardcoded
🟢 Low Priority6. ColsRef macro code duplication (~300 lines)
7. ColsRef example test has no assertions
8.
Areas Verified Sound
SummaryThis is a well-executed release candidate PR with significant architectural improvements. The newest commits include several critical safety and soundness fixes that are all correctly implemented:
The CUDA stream migration (~354 files) is a large infrastructure change that enables multi-stream execution, consistently applied across all extensions. Recommended actions before merge (priority order):
|
Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights. Commit: 0110850 |
…2785) Adds `release: types: [released]` so a non-prerelease, non-draft release publish triggers both workflows, using the release tag as the checkout ref and version. `workflow_dispatch` remains supported for manual runs. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights. Commit: 5ddd539 |
Need these changes so axiom-crypto/openvm-eth#609 becomes possible Resolves INT-7744 --------- Co-authored-by: Allan Lin <allanl@intrinsictech.xyz>
Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights. Commit: 5ad6253 |
Co-authored-by: Allan Lin <allanl@intrinsictech.xyz>
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights. Commit: b8194ef |
No description provided.