Skip to content

fix(soundness): bind ShardRam y-sign to is_global_write#1343

Closed
dreamATD wants to merge 1 commit into
masterfrom
fix/issue-1338-shard-ram-y-sign
Closed

fix(soundness): bind ShardRam y-sign to is_global_write#1343
dreamATD wants to merge 1 commit into
masterfrom
fix/issue-1338-shard-ram-y-sign

Conversation

@dreamATD
Copy link
Copy Markdown
Collaborator

Problem

Issue #1338 demonstrates a working soundness exploit on master. For the
same RISC-V execution trace, the base verifier and the recursion verifier
both accept two different proof batches whose per-shard public
shard_rw_sum values differ on all 17 shards. The attacker takes an
honest witness, replaces every cross-shard EC accumulator leaf (x, y)
with its inverse (x, -y), updates shard_rw_sum, and reproves.

Root cause: ceno_zkvm/src/tables/shard_ram.rs:276-281 was a TODO. The
host code in ShardRamRecord::to_ec_point uses the sign of y[6] to
encode read vs write, but the circuit only constrained the curve equation
and the EC sum — never tying y[6]'s half-of-field to is_global_write.
Both (x, y) and (x, -y) satisfied every existing check, so the
public summary of cross-shard RAM flow was unbound.

The bug also survived recursion (per the reporter's reproduction).

Design Rationale

Approach mirrors SP1's crates/core/machine/src/operations/global_interaction.rs:210-236
idea but uses Ceno's existing gadgets, not SP1's column layout.
Three pieces:

  1. Offset by +1. Express y[6] in terms of a fresh witness y6_lo
    such that y[6] = 0 is never valid in either branch (it is the
    2-torsion fixed point where both signs collapse).
  2. Safe band + prover retry. Restrict y6_lo to [0, (p-1)/2). For
    the rare exception y[6] = 0 (probability ~1/p ≈ 2^-31 per record)
    the host rejects and retries with a new nonce.
  3. Byte-decomposition range check. y6_lo decomposed into four byte
    limbs b0..b3 (assert_byte for b0..b2, lookup_ltu_byte(b3, 60, 1)
    for b3). For BabyBear, (p-1)/2 = 60·2^24 exactly, so b3 < 60
    gives the tightest no-overlap band.

In-circuit branch equality enforced via condition_require_equal:

  • read (is_global_write = 0): y[6] = y6_lo + 1y[6] ∈ [1, (p-1)/2]
  • write (is_global_write = 1): y[6] = p - 1 - y6_loy[6] ∈ [(p+1)/2, p-1]

Union covers [1, p-1] with no overlap; y[6] = 0 is the unique
exception, blocked in both branches.

Why not a single AssertLtConfig(y6_lo, (p-1)/2, max_bits=30)?
On BabyBear (p = 0x78000001, 31-bit) a naive AssertLt fails to enforce
canonical < because the diff can wrap mod p. A malicious y6_lo ∈ [0x74000001, p-1] (≈ 2^26 values) produces a wrapped diff that still
fits in 30 bits, so the constraint accepts upper-half values and the
exploit survives. Pre-bounding y6_lo via byte decomposition kills the
wrap. Ceno's DynamicRangeTableCircuit<E, 18> also doesn't carry the
30-bit entries that a single assert_const_range(_, 30) would request.

Why M = 60 (vs SP1's 63). SP1 targets KoalaBear; its (p-1)/2 = 0x3f800000, so 63 leaves a small safety band. For BabyBear,
(p-1)/2 = 60·2^24 exactly — using 63 here would let y[6] straddle
p/2, reintroducing the ambiguity.

Also corrects the stale comment on shard_ram.rs:277-278, which had
the convention reversed (it claimed write ⇒ lower half but the host
code does the opposite).

Change Highlights

  • ceno_zkvm/src/tables/shard_ram.rs
    • ShardRamRecord::to_ec_point: reject y6 == 0 and try the next
      nonce. Same negation policy otherwise.
    • ShardRamConfig: new field y6_lo_bytes: [WitIn; 4].
    • ShardRamConfig::configure: replace the TODO with the byte
      decomposition, byte-range / LTU lookups, and the
      condition_require_equal branch equality. Updates the stale comment.
    • ShardRamCircuit::assign_instance: compute y6_lo from y[6] and
      is_to_write_set, assign byte limbs, register byte and LTU
      multiplicities.
    • New regression test test_shard_ram_y_sign_attack_breaks_byte_bound
      that (1) verifies honest points land in the expected half and
      (2) shows that the ShardRam read/write direction is not bound to the public shard_rw_sum #1338 attack would force b3 ≥ 60, making the
      in-circuit LTU lookup unsatisfiable.

Benchmark / Performance Impact

Per ShardRam row this PR adds 4 byte WitIn columns plus 3
byte-range and 1 LTU lookup multiplicity. ShardRam rows scale with
cross-shard RAM events, not with cycles, so the absolute cost is
sub-percent on the prover. No full prover bench was rerun (no
hot-loop arithmetic changed).

Existing test_shard_ram_circuit (170k reads + 1420 writes, full
chip proof) runtime is unchanged within noise on this machine:

master   : ~5.0 s
this PR  : ~5.0 s

Testing

cargo fmt --all --check
cargo check --workspace --all-targets
cargo check --workspace --all-targets --release
cargo make clippy
RUST_MIN_STACK=33554432 cargo test -p ceno_zkvm tables::shard_ram::tests --lib --release

All pass locally on BabyBear. Both test_shard_ram_circuit and the
new test_shard_ram_y_sign_attack_breaks_byte_bound are green.

cargo make tests / cargo make tests_goldilock should be re-run by
CI; the change is feature-gated to BabyBear via the debug_assert_eq!
on MODULUS_U64 and goldilocks does not exercise shard_ram (per
integration.yml commented-out lines and CLAUDE.md).

Risks and Rollout

  • Soundness. Closes ShardRam read/write direction is not bound to the public shard_rw_sum #1338. The new constraint is hot-loop adjacent
    (per shard_ram row) but only adds local byte arithmetic and existing
    lookups — no change to transcript, sumcheck, PCS, or EC accumulation.
  • GPU. The CUDA witness generator
    (instructions/gpu/chips/shard_ram.rswitgen_shard_ram_per_row)
    lives in the separate ceno-gpu-mock repo and does not yet populate
    the new y6_lo_bytes columns. CPU runs are correct; GPU runs (behind
    the gpu feature, not in default CI) will fail at proof time until
    the kernel is updated. Follow-up required before re-enabling GPU
    shard_ram.
  • Recursion. The recursive verifier consumes the same constraint
    system, so no separate recursion code change is needed; once the
    base proof passes, the recursive one will too. The reporter's PoC
    going through the recursion verifier should now be rejected.
  • Field support. Hardcodes the BabyBear constant M = 60. A
    debug_assert_eq!(MODULUS_U64, 0x78000001, ...) guards against
    accidental use on a different field; shard_ram is BabyBear-only
    today per CLAUDE.md.

Follow-ups

  • Port the y6_lo_bytes population to the ceno-gpu-mock CUDA witgen
    kernel and re-enable GPU shard_ram.
  • The remaining finish the TODOs in shard_ram.rs #1340 TODOs (local read ⇄ global write pairing on
    shard_ram.rs:235-236, shard == shard_id binding on line 244)
    are intentionally out of scope here. Both should land as separate
    reviewable PRs.

Fixes #1338.
Partially addresses #1340.

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

The ShardRam circuit derived its EC accumulator points from
`(x, y) = lift_x(poseidon2(record))` and used the sign of `y[6]` to
encode read vs write direction on the host side, but never enforced
that convention in-circuit. A malicious prover could replace every
leaf `(x, y)` with `(x, -y)`, update the public `shard_rw_sum`
accordingly, and produce a proof that the base verifier and the
recursion verifier both accept against the same execution trace.

Bind the sign via a fresh `y6_lo` ∈ [0, (p-1)/2), realized as four
byte witnesses with the top byte < 60 (`(p-1)/2 = 60·2^24` exactly
on BabyBear). The read branch requires `y[6] = y6_lo + 1` and the
write branch `y[6] + y6_lo + 1 ≡ 0 (mod p)`. `y[6] = 0` is the unique
exception, rejected by `to_ec_point` via nonce retry.

The pattern follows SP1's `global_interaction.rs` (offset-by-1 + safe
band + prover nonce retry), but uses Ceno's existing `lookup_ltu_byte`
gadget instead of SP1's column layout. `M = 60` (vs SP1's 63 for
KoalaBear) is the BabyBear-specific tightest no-overlap band.

GPU `try_gpu_assign_shard_ram` is not updated in this PR — the new
columns aren't populated by the existing CUDA kernel (lives in a
separate repo). CPU runs are fixed; GPU runs will fail at proof
time until the kernel is updated in a follow-up.

Fixes #1338.
Partially addresses #1340 (only the y-sign TODO).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dreamATD
Copy link
Copy Markdown
Collaborator Author

Closing — opened from the wrong GitHub account. Will reopen under sphere shortly. Branch fix/issue-1338-shard-ram-y-sign is unchanged.

@dreamATD dreamATD closed this May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ShardRam read/write direction is not bound to the public shard_rw_sum

2 participants