Batch main sumcheck across chips#1333

Draft
hero78119 wants to merge 47 commits into master from feat/batch_main_sumcheck

hero78119 commented Apr 29, 2026

Problem

Main sumcheck was proved and verified per chip, which duplicated transcript work, selector/claim handling, and PCS opening plumbing across chips.

Design Rationale

Use one global batched main sumcheck proof while keeping PCS openings in the existing suffix path. The verifier mirrors the prover's transcript order, including ECC bridge sampling before the global combine subset evals challenge, and evaluates the frontloaded expressions itself.
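A minimal sketch of why one global proof can replace per-chip proofs, assuming the common random-linear-combination style of batching (the PR text does not spell out the exact combination rule). The toy modulus `P`, the helper names, and the equal-sized chips are all assumptions for illustration and are not taken from ceno_zkvm.

```rust
// Toy prime modulus (2^31 - 1). An assumption for illustration only,
// not the field used by ceno_zkvm.
const P: u64 = (1 << 31) - 1;

fn add(a: u64, b: u64) -> u64 {
    (a + b) % P
}

fn mul(a: u64, b: u64) -> u64 {
    // Both operands are reduced below 2^31, so the product fits in u64.
    ((a % P) * (b % P)) % P
}

/// Sum of a polynomial's evaluations over the boolean hypercube.
fn hypercube_sum(evals: &[u64]) -> u64 {
    evals.iter().fold(0, |acc, &e| add(acc, e % P))
}

/// Combine per-chip claimed sums into one claim: sum_i alpha^i * claim_i.
fn batch_claims(claims: &[u64], alpha: u64) -> u64 {
    let mut acc = 0;
    let mut pow = 1;
    for &c in claims {
        acc = add(acc, mul(pow, c));
        pow = mul(pow, alpha);
    }
    acc
}

/// Combine per-chip evaluation tables pointwise with the same powers of alpha.
/// Hypothetical simplification: every chip has the same number of evaluations.
fn batch_evals(chips: &[Vec<u64>], alpha: u64) -> Vec<u64> {
    let len = chips.first().map_or(0, |c| c.len());
    let mut out = vec![0; len];
    let mut pow = 1;
    for chip in chips {
        for (o, &e) in out.iter_mut().zip(chip) {
            *o = add(*o, mul(pow, e));
        }
        pow = mul(pow, alpha);
    }
    out
}

fn main() {
    let chips = vec![vec![1, 2, 3, 4], vec![5, 6, 7, 8]];
    let alpha = 3; // would be a transcript challenge in a real protocol
    let claims: Vec<u64> = chips.iter().map(|c| hypercube_sum(c)).collect();
    // By linearity, one sum over the batched table matches the batched claim.
    let batched_claim = batch_claims(&claims, alpha);
    let batched_sum = hypercube_sum(&batch_evals(&chips, alpha));
    println!("batched claim = {batched_claim}, batched sum = {batched_sum}");
}
```

Because the sum is linear, a single sumcheck over the combined table checks all per-chip claims at once, up to the soundness error introduced by the random `alpha`.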

Change Highlights

  • ceno_zkvm: batches main constraints into a single global proof path across chip proofs.
  • ceno_zkvm: keeps witness/fixed PCS openings per chip after global main verification.
  • ceno_recursion: mirrors native verifier changes for the batched main proof.
  • ceno-gpu: supports the batched main proving flow.

Benchmark / Performance Impact

The benchmark session compares the current PR branch against the frontload baseline on block 23817600, with GPU proving and CENO_GPU_ENABLE_WITGEN=0.

E2E / Layer

| Metric | Baseline | This PR | Delta |
| --- | --- | --- | --- |
| E2E total | 75.600s | 103.000s | +27.400s (+36.2%) |
| emulator | 10.100s | 10.300s | +0.200s (+2.0%) |
| app_prove wall time | 61.000s | 87.400s | +26.400s (+43.3%) |
| app.verify | 3.390s | 4.040s | +0.650s (+19.2%) |
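The Delta column can be reproduced from the Baseline and This PR columns. A small sketch, where the `delta` helper is a hypothetical name introduced here and not part of the benchmark tooling:

```rust
/// Returns (absolute delta in seconds, relative delta in percent of baseline).
fn delta(baseline: f64, current: f64) -> (f64, f64) {
    let abs = current - baseline;
    (abs, abs / baseline * 100.0)
}

fn main() {
    // Rows copied from the E2E / Layer table.
    let rows = [
        ("E2E total", 75.600, 103.000),
        ("emulator", 10.100, 10.300),
        ("app_prove wall time", 61.000, 87.400),
        ("app.verify", 3.390, 4.040),
    ];
    for (name, base, cur) in rows {
        let (abs, pct) = delta(base, cur);
        println!("{name}: {abs:+.3}s ({pct:+.1}%)");
    }
}
```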

App Prove Breakdown

Profiler module totals can overlap because chip proving is concurrent; use app_prove wall time above for end-to-end impact. Corrected parser coverage adds the new batched-main span, which is now the main critical-path regression source.

| Operation | Baseline | This PR | Delta |
| --- | --- | --- | --- |
| prove_batched_main_constraints | 0.000s | 27.375s | +27.375s (new) |
| prove_main_constraints | 22.622s | 0.000s | -22.622s (-100.0%) |
| extract_witness_mles | 24.155s | 3.760s | -20.395s (-84.4%) |
| build_tower_witness_gpu | 3.491s | 0.323s | -3.168s (-90.7%) |
| prove_tower_relation_gpu | 176.090s | 24.008s | -152.082s (-86.4%) |
| pcs_opening | 15.246s | 15.207s | -0.039s (-0.3%) |
| commit_traces | 6.827s | 6.814s | -0.013s (-0.2%) |
| parsed rows total | 251.118s | 78.370s | -172.748s (-68.8%) |
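To illustrate why summed module totals can exceed wall time when chip proving is concurrent, here is a sketch that compares the sum of span durations with the length of their union. The span values and function names are made up for illustration and do not come from the profiler output.

```rust
/// A profiler span as (start, end) in seconds. Hypothetical representation.
type Span = (f64, f64);

/// Sum of individual span durations; overlapping time is counted repeatedly.
fn summed_duration(spans: &[Span]) -> f64 {
    spans.iter().map(|(s, e)| e - s).sum()
}

/// Length of the union of all spans; overlapping time is counted once.
fn wall_time(spans: &[Span]) -> f64 {
    let mut sorted = spans.to_vec();
    sorted.sort_by(|a, b| a.0.total_cmp(&b.0));
    let mut total = 0.0;
    let mut current: Option<Span> = None;
    for (s, e) in sorted {
        match current {
            // The next span overlaps the running interval: extend it.
            Some((cs, ce)) if s <= ce => current = Some((cs, ce.max(e))),
            // Gap found: close the running interval and start a new one.
            Some((cs, ce)) => {
                total += ce - cs;
                current = Some((s, e));
            }
            None => current = Some((s, e)),
        }
    }
    if let Some((cs, ce)) = current {
        total += ce - cs;
    }
    total
}

fn main() {
    // Three chips proving concurrently, then one serial step.
    let spans = [(0.0, 4.0), (1.0, 5.0), (2.0, 6.0), (10.0, 12.0)];
    println!("summed = {}s", summed_duration(&spans));
    println!("wall   = {}s", wall_time(&spans));
}
```

In this toy input the summed total is 14s while the wall time is 8s, which is the same kind of gap as between "parsed rows total" and the app_prove wall time.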

Benchmark command:

CENO_GPU_ENABLE_WITGEN=0 CENO_CONCURRENT_CHIP_PROVING=1 CENO_GPU_CACHE_LEVEL=0 \
RUSTFLAGS="-C target-feature=+avx2" \
cargo run --features "jemalloc,gpu" --release --bin ceno-reth-benchmark-bin -- \
  --mode prove-app --block-number 23817600 --rpc-url <redacted> \
  --output-dir output --cache-dir rpc-cache

Environment:

Testing

RUST_MIN_STACK=33554432 cargo check --package ceno_recursion --bin e2e_aggregate
RUST_MIN_STACK=33554432 cargo run --release --package ceno_recursion --bin e2e_aggregate -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Also passed the linked GPU e2e benchmark run above.

Risks and Rollout

  • Soundness risk is concentrated in transcript ordering and verifier frontload evaluation; native and recursion verifiers now follow the same global proof flow.
  • Performance is not yet an E2E win in the linked benchmark despite removing per-chip main-constraint cost; further scheduling/host-overlap work is needed before rollout as a performance improvement.
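The transcript-ordering risk can be illustrated with a toy Fiat-Shamir transcript: if the verifier samples challenges in a different order than the prover, it derives different challenge values and the proof no longer checks. This is only a sketch. `ToyTranscript`, the labels, and the use of the standard library's `DefaultHasher` are assumptions for illustration and are not the ceno_zkvm transcript.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy transcript backed by a non-cryptographic hasher (illustration only).
struct ToyTranscript {
    hasher: DefaultHasher,
}

impl ToyTranscript {
    fn new() -> Self {
        Self { hasher: DefaultHasher::new() }
    }

    /// Absorb a labelled value into the running state.
    fn absorb(&mut self, label: &str, value: u64) {
        self.hasher.write(label.as_bytes());
        self.hasher.write_u64(value);
    }

    /// Derive a challenge from everything absorbed so far, then feed it back
    /// so later challenges depend on earlier ones.
    fn challenge(&mut self, label: &str) -> u64 {
        self.hasher.write(label.as_bytes());
        let c = self.hasher.finish();
        self.hasher.write_u64(c);
        c
    }
}

/// Returns (bridge challenge, combine challenge) for a given sampling order.
/// The labels "ecc_bridge" and "combine" are hypothetical.
fn sample(claims: &[u64], bridge_first: bool) -> (u64, u64) {
    let mut t = ToyTranscript::new();
    for &c in claims {
        t.absorb("claim", c);
    }
    if bridge_first {
        let bridge = t.challenge("ecc_bridge");
        let combine = t.challenge("combine");
        (bridge, combine)
    } else {
        let combine = t.challenge("combine");
        let bridge = t.challenge("ecc_bridge");
        (bridge, combine)
    }
}

fn main() {
    let claims = [10, 26];
    let prover = sample(&claims, true);
    let mirrored_verifier = sample(&claims, true);
    let misordered_verifier = sample(&claims, false);
    println!("mirrored matches:   {}", prover == mirrored_verifier);
    println!("misordered matches: {}", prover == misordered_verifier);
}
```

A verifier that samples in the same order reproduces the prover's challenges exactly, while the misordered one diverges on both values.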

Follow-ups

  • Investigate reducing the new prove_batched_main_constraints critical-path cost.
  • Keep benchmark summaries explicit that parsed module totals overlap and are not a wall-time decomposition.

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

hero78119 marked this pull request as draft April 29, 2026 13:52
Base automatically changed from feat/prover_mle_zero_padding to master May 4, 2026 07:55
hero78119 changed the title from "batch main sumcheck" to "Batch main sumcheck across chips" May 9, 2026