fix(soundness): bind ShardRam y-sign to is_global_write#1343
Closed
dreamATD wants to merge 1 commit into
Closed
Conversation
The ShardRam circuit derived its EC accumulator points from `(x, y) = lift_x(poseidon2(record))` and used the sign of `y[6]` to encode read vs write direction on the host side, but never enforced that convention in-circuit. A malicious prover could replace every leaf `(x, y)` with `(x, -y)`, update the public `shard_rw_sum` accordingly, and produce a proof that the base verifier and the recursion verifier both accept against the same execution trace. Bind the sign via a fresh `y6_lo` ∈ [0, (p-1)/2), realized as four byte witnesses with the top byte < 60 (`(p-1)/2 = 60·2^24` exactly on BabyBear). The read branch requires `y[6] = y6_lo + 1` and the write branch `y[6] + y6_lo + 1 ≡ 0 (mod p)`. `y[6] = 0` is the unique exception, rejected by `to_ec_point` via nonce retry. The pattern follows SP1's `global_interaction.rs` (offset-by-1 + safe band + prover nonce retry), but uses Ceno's existing `lookup_ltu_byte` gadget instead of SP1's column layout. `M = 60` (vs SP1's 63 for KoalaBear) is the BabyBear-specific tightest no-overlap band. GPU `try_gpu_assign_shard_ram` is not updated in this PR — the new columns aren't populated by the existing CUDA kernel (lives in a separate repo). CPU runs are fixed; GPU runs will fail at proof time until the kernel is updated in a follow-up. Fixes #1338. Partially addresses #1340 (only the y-sign TODO). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Collaborator
Author
|
Closing — opened from the wrong GitHub account. Will reopen under |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Issue #1338 demonstrates a working soundness exploit on
master. For thesame RISC-V execution trace, the base verifier and the recursion verifier
both accept two different proof batches whose per-shard public
shard_rw_sumvalues differ on all 17 shards. The attacker takes anhonest witness, replaces every cross-shard EC accumulator leaf
(x, y)with its inverse
(x, -y), updatesshard_rw_sum, and reproves.Root cause:
ceno_zkvm/src/tables/shard_ram.rs:276-281was a TODO. Thehost code in
ShardRamRecord::to_ec_pointuses the sign ofy[6]toencode read vs write, but the circuit only constrained the curve equation
and the EC sum — never tying
y[6]'s half-of-field tois_global_write.Both
(x, y)and(x, -y)satisfied every existing check, so thepublic summary of cross-shard RAM flow was unbound.
The bug also survived recursion (per the reporter's reproduction).
Design Rationale
Approach mirrors SP1's
crates/core/machine/src/operations/global_interaction.rs:210-236idea but uses Ceno's existing gadgets, not SP1's column layout.
Three pieces:
y[6]in terms of a fresh witnessy6_losuch that
y[6] = 0is never valid in either branch (it is the2-torsion fixed point where both signs collapse).
y6_loto[0, (p-1)/2). Forthe rare exception
y[6] = 0(probability~1/p ≈ 2^-31per record)the host rejects and retries with a new
nonce.y6_lodecomposed into four bytelimbs
b0..b3(assert_byteforb0..b2,lookup_ltu_byte(b3, 60, 1)for
b3). For BabyBear,(p-1)/2 = 60·2^24exactly, sob3 < 60gives the tightest no-overlap band.
In-circuit branch equality enforced via
condition_require_equal:is_global_write = 0):y[6] = y6_lo + 1⇒y[6] ∈ [1, (p-1)/2]is_global_write = 1):y[6] = p - 1 - y6_lo⇒y[6] ∈ [(p+1)/2, p-1]Union covers
[1, p-1]with no overlap;y[6] = 0is the uniqueexception, blocked in both branches.
Why not a single
AssertLtConfig(y6_lo, (p-1)/2, max_bits=30)?On BabyBear (
p = 0x78000001, 31-bit) a naive AssertLt fails to enforcecanonical
<because the diff can wrap mod p. A maliciousy6_lo ∈ [0x74000001, p-1](≈ 2^26 values) produces a wrapped diff that stillfits in 30 bits, so the constraint accepts upper-half values and the
exploit survives. Pre-bounding
y6_lovia byte decomposition kills thewrap. Ceno's
DynamicRangeTableCircuit<E, 18>also doesn't carry the30-bit entries that a single
assert_const_range(_, 30)would request.Why M = 60 (vs SP1's 63). SP1 targets KoalaBear; its
(p-1)/2 = 0x3f800000, so 63 leaves a small safety band. For BabyBear,(p-1)/2 = 60·2^24exactly — using 63 here would lety[6]straddlep/2, reintroducing the ambiguity.Also corrects the stale comment on
shard_ram.rs:277-278, which hadthe convention reversed (it claimed write ⇒ lower half but the host
code does the opposite).
Change Highlights
ceno_zkvm/src/tables/shard_ram.rsShardRamRecord::to_ec_point: rejecty6 == 0and try the nextnonce. Same negation policy otherwise.ShardRamConfig: new fieldy6_lo_bytes: [WitIn; 4].ShardRamConfig::configure: replace the TODO with the bytedecomposition, byte-range / LTU lookups, and the
condition_require_equalbranch equality. Updates the stale comment.ShardRamCircuit::assign_instance: computey6_lofromy[6]andis_to_write_set, assign byte limbs, register byte and LTUmultiplicities.
test_shard_ram_y_sign_attack_breaks_byte_boundthat (1) verifies honest points land in the expected half and
(2) shows that the ShardRam read/write direction is not bound to the public
shard_rw_sum#1338 attack would forceb3 ≥ 60, making thein-circuit LTU lookup unsatisfiable.
Benchmark / Performance Impact
Per ShardRam row this PR adds 4 byte WitIn columns plus 3
byte-range and 1 LTU lookup multiplicity. ShardRam rows scale with
cross-shard RAM events, not with cycles, so the absolute cost is
sub-percent on the prover. No full prover bench was rerun (no
hot-loop arithmetic changed).
Existing
test_shard_ram_circuit(170k reads + 1420 writes, fullchip proof) runtime is unchanged within noise on this machine:
Testing
cargo fmt --all --check cargo check --workspace --all-targets cargo check --workspace --all-targets --release cargo make clippy RUST_MIN_STACK=33554432 cargo test -p ceno_zkvm tables::shard_ram::tests --lib --releaseAll pass locally on BabyBear. Both
test_shard_ram_circuitand thenew
test_shard_ram_y_sign_attack_breaks_byte_boundare green.cargo make tests/cargo make tests_goldilockshould be re-run byCI; the change is feature-gated to BabyBear via the
debug_assert_eq!on
MODULUS_U64and goldilocks does not exercise shard_ram (perintegration.ymlcommented-out lines and CLAUDE.md).Risks and Rollout
shard_rw_sum#1338. The new constraint is hot-loop adjacent(per shard_ram row) but only adds local byte arithmetic and existing
lookups — no change to transcript, sumcheck, PCS, or EC accumulation.
(
instructions/gpu/chips/shard_ram.rs→witgen_shard_ram_per_row)lives in the separate
ceno-gpu-mockrepo and does not yet populatethe new
y6_lo_bytescolumns. CPU runs are correct; GPU runs (behindthe
gpufeature, not in default CI) will fail at proof time untilthe kernel is updated. Follow-up required before re-enabling GPU
shard_ram.
system, so no separate recursion code change is needed; once the
base proof passes, the recursive one will too. The reporter's PoC
going through the recursion verifier should now be rejected.
M = 60. Adebug_assert_eq!(MODULUS_U64, 0x78000001, ...)guards againstaccidental use on a different field; shard_ram is BabyBear-only
today per CLAUDE.md.
Follow-ups
ceno-gpu-mockCUDA witgenkernel and re-enable GPU shard_ram.
shard_ram.rs#1340 TODOs (local read ⇄ global writepairing onshard_ram.rs:235-236,shard == shard_idbinding on line 244)are intentionally out of scope here. Both should land as separate
reviewable PRs.
Fixes #1338.
Partially addresses #1340.
Copilot Reviewer Directive (keep this section)
When Copilot reviews this PR, apply
.github/copilot-instructions.mdstrictly.