fix(soundness): bind ShardRam y-sign to is_global_write by dreamATD · Pull Request #1343 · scroll-tech/ceno

dreamATD · 2026-05-25T00:56:39Z

Problem

Issue #1338 demonstrates a working soundness exploit on master. For the
same RISC-V execution trace, the base verifier and the recursion verifier
both accept two different proof batches whose per-shard public
shard_rw_sum values differ on all 17 shards. The attacker takes an
honest witness, replaces every cross-shard EC accumulator leaf (x, y)
with its inverse (x, -y), updates shard_rw_sum, and reproves.

Root cause: ceno_zkvm/src/tables/shard_ram.rs:276-281 was a TODO. The
host code in ShardRamRecord::to_ec_point uses the sign of y[6] to
encode read vs write, but the circuit only constrained the curve equation
and the EC sum — never tying y[6]'s half-of-field to is_global_write.
Both (x, y) and (x, -y) satisfied every existing check, so the
public summary of cross-shard RAM flow was unbound.

The bug also survived recursion (per the reporter's reproduction).

Design Rationale

Approach mirrors SP1's crates/core/machine/src/operations/global_interaction.rs:210-236
idea but uses Ceno's existing gadgets, not SP1's column layout.
Three pieces:

Offset by +1. Express y[6] in terms of a fresh witness y6_lo
such that y[6] = 0 is never valid in either branch (it is the
2-torsion fixed point where both signs collapse).
Safe band + prover retry. Restrict y6_lo to [0, (p-1)/2). For
the rare exception y[6] = 0 (probability ~1/p ≈ 2^-31 per record)
the host rejects and retries with a new nonce.
Byte-decomposition range check. y6_lo decomposed into four byte
limbs b0..b3 (assert_byte for b0..b2, lookup_ltu_byte(b3, 60, 1)
for b3). For BabyBear, (p-1)/2 = 60·2^24 exactly, so b3 < 60
gives the tightest no-overlap band.

In-circuit branch equality enforced via condition_require_equal:

read (is_global_write = 0): y[6] = y6_lo + 1 ⇒ y[6] ∈ [1, (p-1)/2]
write (is_global_write = 1): y[6] = p - 1 - y6_lo ⇒ y[6] ∈ [(p+1)/2, p-1]

Union covers [1, p-1] with no overlap; y[6] = 0 is the unique
exception, blocked in both branches.

Why not a single AssertLtConfig(y6_lo, (p-1)/2, max_bits=30)?
On BabyBear (p = 0x78000001, 31-bit) a naive AssertLt fails to enforce
canonical < because the diff can wrap mod p. A malicious y6_lo ∈ [0x74000001, p-1] (≈ 2^26 values) produces a wrapped diff that still
fits in 30 bits, so the constraint accepts upper-half values and the
exploit survives. Pre-bounding y6_lo via byte decomposition kills the
wrap. Ceno's DynamicRangeTableCircuit<E, 18> also doesn't carry the
30-bit entries that a single assert_const_range(_, 30) would request.

Why M = 60 (vs SP1's 63). SP1 targets KoalaBear; its (p-1)/2 = 0x3f800000, so 63 leaves a small safety band. For BabyBear,
(p-1)/2 = 60·2^24 exactly — using 63 here would let y[6] straddle
p/2, reintroducing the ambiguity.

Also corrects the stale comment on shard_ram.rs:277-278, which had
the convention reversed (it claimed write ⇒ lower half but the host
code does the opposite).

Change Highlights

ceno_zkvm/src/tables/shard_ram.rs
- ShardRamRecord::to_ec_point: reject y6 == 0 and try the next
  nonce. Same negation policy otherwise.
- ShardRamConfig: new field y6_lo_bytes: [WitIn; 4].
- ShardRamConfig::configure: replace the TODO with the byte
  decomposition, byte-range / LTU lookups, and the
  condition_require_equal branch equality. Updates the stale comment.
- ShardRamCircuit::assign_instance: compute y6_lo from y[6] and
  is_to_write_set, assign byte limbs, register byte and LTU
  multiplicities.
- New regression test test_shard_ram_y_sign_attack_breaks_byte_bound
  that (1) verifies honest points land in the expected half and
  (2) shows that the ShardRam read/write direction is not bound to the public shard_rw_sum #1338 attack would force b3 ≥ 60, making the
  in-circuit LTU lookup unsatisfiable.

Benchmark / Performance Impact

Per ShardRam row this PR adds 4 byte WitIn columns plus 3
byte-range and 1 LTU lookup multiplicity. ShardRam rows scale with
cross-shard RAM events, not with cycles, so the absolute cost is
sub-percent on the prover. No full prover bench was rerun (no
hot-loop arithmetic changed).

Existing test_shard_ram_circuit (170k reads + 1420 writes, full
chip proof) runtime is unchanged within noise on this machine:

master   : ~5.0 s
this PR  : ~5.0 s

Testing

cargo fmt --all --check
cargo check --workspace --all-targets
cargo check --workspace --all-targets --release
cargo make clippy
RUST_MIN_STACK=33554432 cargo test -p ceno_zkvm tables::shard_ram::tests --lib --release

All pass locally on BabyBear. Both test_shard_ram_circuit and the
new test_shard_ram_y_sign_attack_breaks_byte_bound are green.

cargo make tests / cargo make tests_goldilock should be re-run by
CI; the change is feature-gated to BabyBear via the debug_assert_eq!
on MODULUS_U64 and goldilocks does not exercise shard_ram (per
integration.yml commented-out lines and CLAUDE.md).

Risks and Rollout

Soundness. Closes ShardRam read/write direction is not bound to the public shard_rw_sum #1338. The new constraint is hot-loop adjacent
(per shard_ram row) but only adds local byte arithmetic and existing
lookups — no change to transcript, sumcheck, PCS, or EC accumulation.
GPU. The CUDA witness generator
(instructions/gpu/chips/shard_ram.rs → witgen_shard_ram_per_row)
lives in the separate ceno-gpu-mock repo and does not yet populate
the new y6_lo_bytes columns. CPU runs are correct; GPU runs (behind
the gpu feature, not in default CI) will fail at proof time until
the kernel is updated. Follow-up required before re-enabling GPU
shard_ram.
Recursion. The recursive verifier consumes the same constraint
system, so no separate recursion code change is needed; once the
base proof passes, the recursive one will too. The reporter's PoC
going through the recursion verifier should now be rejected.
Field support. Hardcodes the BabyBear constant M = 60. A
debug_assert_eq!(MODULUS_U64, 0x78000001, ...) guards against
accidental use on a different field; shard_ram is BabyBear-only
today per CLAUDE.md.

Follow-ups

Port the y6_lo_bytes population to the ceno-gpu-mock CUDA witgen
kernel and re-enable GPU shard_ram.
The remaining finish the TODOs in shard_ram.rs #1340 TODOs (local read ⇄ global write pairing on
shard_ram.rs:235-236, shard == shard_id binding on line 244)
are intentionally out of scope here. Both should land as separate
reviewable PRs.

Fixes #1338.
Partially addresses #1340.

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

The ShardRam circuit derived its EC accumulator points from `(x, y) = lift_x(poseidon2(record))` and used the sign of `y[6]` to encode read vs write direction on the host side, but never enforced that convention in-circuit. A malicious prover could replace every leaf `(x, y)` with `(x, -y)`, update the public `shard_rw_sum` accordingly, and produce a proof that the base verifier and the recursion verifier both accept against the same execution trace. Bind the sign via a fresh `y6_lo` ∈ [0, (p-1)/2), realized as four byte witnesses with the top byte < 60 (`(p-1)/2 = 60·2^24` exactly on BabyBear). The read branch requires `y[6] = y6_lo + 1` and the write branch `y[6] + y6_lo + 1 ≡ 0 (mod p)`. `y[6] = 0` is the unique exception, rejected by `to_ec_point` via nonce retry. The pattern follows SP1's `global_interaction.rs` (offset-by-1 + safe band + prover nonce retry), but uses Ceno's existing `lookup_ltu_byte` gadget instead of SP1's column layout. `M = 60` (vs SP1's 63 for KoalaBear) is the BabyBear-specific tightest no-overlap band. GPU `try_gpu_assign_shard_ram` is not updated in this PR — the new columns aren't populated by the existing CUDA kernel (lives in a separate repo). CPU runs are fixed; GPU runs will fail at proof time until the kernel is updated in a follow-up. Fixes #1338. Partially addresses #1340 (only the y-sign TODO). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

dreamATD · 2026-05-25T01:00:17Z

Closing — opened from the wrong GitHub account. Will reopen under sphere shortly. Branch fix/issue-1338-shard-ram-y-sign is unchanged.

dreamATD closed this May 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(soundness): bind ShardRam y-sign to is_global_write#1343

fix(soundness): bind ShardRam y-sign to is_global_write#1343
dreamATD wants to merge 1 commit into
masterfrom
fix/issue-1338-shard-ram-y-sign

dreamATD commented May 25, 2026

Uh oh!

dreamATD commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dreamATD commented May 25, 2026

Problem

Design Rationale

Change Highlights

Benchmark / Performance Impact

Testing

Risks and Rollout

Follow-ups

Copilot Reviewer Directive (keep this section)

Uh oh!

dreamATD commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants