
IOR trait #22

Closed
z-tech wants to merge 22 commits into main from z-tech/general-cleanup

Conversation

@z-tech
Collaborator

@z-tech z-tech commented Mar 25, 2026

What does this PR do?

  • Pulls in the optimizations from efficient-sumcheck #96, including vectorization for Goldilocks (and its degree-2 and degree-3 extensions) on NEON and AVX-512
  • Reduces communication by one message by using the updated API for both Warp sumchecks
  • Flat coefficient buffers for protogalaxy::fold eliminate heap allocations (huge win)

Profiled breakdown on Goldilocks field (hash chain R1CS, n=4096):

| Phase | Before | After | Change |
|---|---|---|---|
| rs_encode (FFT) | ~2.5ms | ~2.4ms | |
| pesat_merkle_tree | ~1.5ms | ~1.5ms | |
| twin_constraint_sumcheck | ~7.2ms | ~2.6ms | −64% |
| eval_bundled_r1cs | ~0.6ms | ~0.7ms | |
| merkle_commit | ~1.0ms | ~1.0ms | |
| eq_poly_evals + ood_evals_vec | ~3.9ms | ~4.0ms | |
| batching_sumcheck (inner product) | ~0.06ms | ~0.06ms | |
| End-to-end prover | ~16.9ms | ~13.0ms | −23% |

@z-tech
Collaborator Author

z-tech commented Apr 9, 2026

NEON: a ~64% reduction in twin_constraint_sumcheck yields a ~27% reduction in constrained_code_accumulate time (~21% including pesat_reduce).

| Phase | Original | Current | Change |
|---|---|---|---|
| rs_encode (FFT) | ~2.5ms | ~2.4ms | |
| pesat_merkle_tree | ~1.5ms | ~1.5ms | |
| twin_constraint_sumcheck | ~7.2ms | ~2.6ms | -64% |
| eval_bundled_r1cs | ~0.6ms | ~0.7ms | |
| merkle_commit | ~1.0ms | ~1.1ms | |
| eq_poly_evals + ood_evals_vec | ~3.9ms | ~4.3ms | |
| batching_sumcheck (inner product) | ~0.06ms | ~0.06ms | |
| Total (constrained_code_accumulate) | ~12.9ms | ~9.4ms | -27% |
| Total (incl. pesat_reduce) | ~16.9ms | ~13.3ms | -21% |

@z-tech z-tech changed the title from "Z tech/general cleanup" to "Cleanup and pull optimizations from Efficient Sumcheck" Apr 10, 2026
z-tech added 10 commits April 18, 2026 13:40
Extracts WARP::prove and ::verify into per-phase modules under
src/protocol/phases/ (pesat, twin_constraint, ood, batching, proximity),
with a shared Oracle<F> type in src/protocol/oracle.rs that owns the
codeword and a lazily-materialised multilinear extension. lib.rs is
now an orchestrator that threads oracles and accumulator state through
the phases.
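
One way to picture the lazily-materialised extension described above is a cache cell stored next to the codeword. The sketch below is not the PR's `Oracle<F>`: the field names, `mle_with`, and `is_materialised` are hypothetical, and the builder closure only stands in for the real multilinear-extension computation.

```rust
use std::cell::OnceCell;

/// Hypothetical stand-in for an oracle that owns a codeword and caches a
/// derived table which is only built on first use.
pub struct Oracle<F> {
    codeword: Vec<F>,
    // Filled lazily; `OnceCell` guarantees the builder runs at most once.
    mle_evals: OnceCell<Vec<F>>,
}

impl<F> Oracle<F> {
    pub fn new(codeword: Vec<F>) -> Self {
        Self { codeword, mle_evals: OnceCell::new() }
    }

    /// Cheap leaf query: never triggers materialisation.
    pub fn leaf(&self, index: usize) -> Option<&F> {
        self.codeword.get(index)
    }

    /// Returns the cached table, building it with `build` on the first call only.
    pub fn mle_with(&self, build: impl FnOnce(&[F]) -> Vec<F>) -> &[F] {
        self.mle_evals.get_or_init(|| build(&self.codeword))
    }

    /// True once the cached table has been built.
    pub fn is_materialised(&self) -> bool {
        self.mle_evals.get().is_some()
    }
}
```

With this shape, phases that only need leaf queries never pay for the extension, and later phases share a single cached copy.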

Adds tracing spans at each phase boundary, gated behind a new `profile`
cargo feature that optionally pulls in tracing-subscriber via
src/profile.rs. Release builds do not pull tracing-subscriber as a
direct dep of warp.

Sets up docs/paper-mods/ as a living spec of notation modifications to
the Warp paper, paired 1:1 with Rust modules. mod1_oracle.tex authored
in full; mod2/3/4 stubbed for downstream plans. Framing stays inside
IOP/BCS rather than moving to AHP.

No behavior change: 14/14 tests pass (BLS12-381 + Goldilocks warp_test,
query + relation tests), clippy clean under --all-features.

Restructures src/profile.rs into a module directory with four parts,
all gated behind the existing `profile` cargo feature:

- counters: thread-local Cell<u64> counters with a `count_ops!` macro.
  Snapshots + deltas let a subscriber diff across a span's lifetime.
  Tracks coarse call-site events (MerkleTreeBuilds, MerklePathsGenerated,
  MleMaterializations, OracleLeafQueries, OraclePointQueries,
  TwinConstraintRounds, BatchingRounds, OodPointQueries, EncodeCalls,
  MerklePathsVerified). Field-level op counts are out of scope — they
  need an F newtype or an arkworks fork; deferred.

- timing: `thread_cpu_ns()` via clock_gettime(CLOCK_THREAD_CPUTIME_ID)
  on Linux and macOS. Distinguishes blocked-on-IO from blocked-on-compute
  in a way wall-time cannot.

- rss: `peak_rss_bytes()` via getrusage(RUSAGE_SELF), normalising the
  Linux-kB vs macOS-bytes discrepancy.

- layer: a tracing `Layer` that captures a span's counters + timing +
  rss on enter, differences them on close, and emits one newline-
  delimited JSON record per span. Schema tag `warp.profile.v1`;
  per-record fields are {phase, wall_ns, cpu_ns, rss_delta_bytes,
  counters, dimensions}. Dimensions come from span numeric fields
  (log_l, log_m, log_n, etc.), captured via a tracing Visit.
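
The unit normalisation mentioned in the `rss` item can be sketched as a pure function. This assumes only the documented `getrusage` behaviour (`ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS); `MaxRssUnit`, `host_unit`, and `maxrss_to_bytes` are hypothetical names, not the PR's API, and the actual `getrusage` call is left out so the sketch needs no external crate.

```rust
/// Unit convention used by `ru_maxrss` on a given host (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MaxRssUnit {
    Kilobytes, // Linux convention
    Bytes,     // macOS convention
}

/// Picks the convention for the compilation target.
/// Assumption: every non-macOS target is treated as Linux-style kilobytes.
pub fn host_unit() -> MaxRssUnit {
    if cfg!(target_os = "macos") {
        MaxRssUnit::Bytes
    } else {
        MaxRssUnit::Kilobytes
    }
}

/// Converts a raw `ru_maxrss` reading to bytes, clamping negative values to zero.
pub fn maxrss_to_bytes(raw: i64, unit: MaxRssUnit) -> u64 {
    let raw = raw.max(0) as u64;
    match unit {
        MaxRssUnit::Kilobytes => raw.saturating_mul(1024),
        MaxRssUnit::Bytes => raw,
    }
}
```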

Phase modules and the Oracle are instrumented at call sites so the JSON
records carry meaningful counter deltas. Without the feature every
count_ops! call and every timing/rss wrapper compiles to a no-op.
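
A minimal sketch of the counter mechanism described above, assuming a fixed array of thread-local `Cell<u64>` slots. Only two of the listed counters are modelled, and `bump`, `snapshot`, and `delta` are hypothetical helper names. In the PR the macro is gated on the `profile` feature; the gate is omitted here so the example runs standalone.

```rust
use std::cell::Cell;

/// Coarse call-site events. The names follow the commit message; the
/// numbering and array layout are assumptions of this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Counter {
    MerkleTreeBuilds = 0,
    EncodeCalls = 1,
}

const NUM_COUNTERS: usize = 2;

thread_local! {
    // One slot per counter; `Cell` keeps the hot path free of locking.
    static COUNTERS: [Cell<u64>; NUM_COUNTERS] = [Cell::new(0), Cell::new(0)];
}

/// Adds `n` to the calling thread's slot for `counter`.
pub fn bump(counter: Counter, n: u64) {
    COUNTERS.with(|c| {
        let slot = &c[counter as usize];
        slot.set(slot.get() + n);
    });
}

/// Call-site macro. A feature-gated build would expand this to nothing
/// when profiling is disabled.
macro_rules! count_ops {
    ($counter:expr) => {
        bump($counter, 1)
    };
    ($counter:expr, $n:expr) => {
        bump($counter, $n)
    };
}

pub type Snapshot = [u64; NUM_COUNTERS];

/// Reads all counters for the calling thread.
pub fn snapshot() -> Snapshot {
    COUNTERS.with(|c| [c[0].get(), c[1].get()])
}

/// Per-counter difference between two snapshots taken on the same thread.
pub fn delta(before: &Snapshot, after: &Snapshot) -> Snapshot {
    [after[0] - before[0], after[1] - before[1]]
}
```

A subscriber would take one snapshot when a span is entered and another when it closes, then report the delta for that span.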

tests/profile_json.rs is a feature-gated integration test that installs
the JSON layer against an in-memory sink, runs a full hashchain prove,
and asserts every phase emits a well-formed record. It's the reference
shape Plan B's regression detector will consume.

Verification matrix (all green):
  - cargo test (no features)        : 14/14
  - cargo test --features profile   : 14/14 + profile_json 1/1
  - cargo clippy --all-targets      : clean
  - cargo clippy --all-targets --all-features : clean

Adds a deterministic instruction-count bench for WARP::prove, intended
as the regression signal for a future CI gate. Complements the existing
criterion wall-time bench (which stays informational).

- benches/iai_phases.rs: one library_benchmark for `prove` at the
  unit-test parameter shape (l1=4, s=2, t=7, hashchain=10). v1 scope
  is intentionally narrow — the docstring explains why the parameterised
  `#[bench::case(setup = ...)]` form didn't compile under iai-callgrind
  0.14 from this crate's bench root. Plumbing more sizes is deferred.

- Cargo.toml: iai-callgrind 0.14 as a dev-dep, second `[[bench]]`
  entry wired with `harness = false`.

- benches/docker/Dockerfile.iai: slim Debian image with valgrind + the
  iai-callgrind-runner binary pre-installed, for macOS hosts where
  valgrind has been unsupported since Big Sur.

- Makefile: `bench-wall` (criterion, any host), `bench-ci` (iai
  natively, Linux/valgrind required), `bench-ci-local` (builds and
  runs the Docker image with cargo registry/git/target caches mounted
  from target/iai-docker-cache/ so arkworks isn't redownloaded each
  run). Plus `test` and `clippy` convenience targets.

- benches/README.md: explains the split (noisy wall time vs
  deterministic instruction count), installation, Docker pathway, and
  v1 scope limits.

Deferred (tracked in the plan): multiple parameter points, baseline
capture + commit, and the GitHub Actions workflow that gates PRs.
Those come as small separate commits so the CI change can be reviewed
on its own.

Verification:
  - cargo check / --features profile    : clean
  - cargo build --bench iai_phases      : clean (run needs valgrind)
  - cargo build --bench warp_rs         : clean
  - cargo clippy --all-features         : clean
  - cargo test                          : 14/14 (unchanged)

Adds src/params/ — given (λ, |F|, code rate, list-decoding regime),
pick the minimum (s, t) that achieves λ bits of soundness on a
Reed–Solomon code.

- types: Regime::{Provable, Conjectured}, SecurityLevel, Params,
  SoundnessBound, ParamError. SoundnessBound::meets(λ) answers the
  "is this enough?" question directly.

- select(λ, field_bits, code_rate, regime) → Params. Uses the
  Johnson-bound proximity-query formula (provable: t ≥ 2λ/log₂(1/ρ),
  conjectured: t ≥ λ/log₂(1/ρ)), with a field-admissibility check
  (log₂|F| ≥ λ + 40) to ensure polylog noise terms are negligible.

- validate(params, field_bits, code_rate, regime, target): the
  inverse — computes the achieved soundness and reports per-term
  admissibility so callers can see partial failures.

- presets::PRESETS: canonical (λ, rate, regime) → (s, t) rows for
  80/128-bit targets at common rates (1/2, 1/8). A test enforces
  that every row matches `select` output, so drift between the
  table and the formulas is caught at build time.

- src/bin/warp-params.rs: dependency-free CLI with `select`,
  `validate`, and `table` subcommands. Dumps PRESETS as TSV and
  exits non-zero if a validate call doesn't meet the target.

- docs/paper-mods/mod4_parameter_selection.tex: replaces the stub
  with the full derivation, citing STIR / WHIR for the proximity-
  gap bounds and marking the deferred items (batching-sumcheck
  calibration of s, non-RS codes, a reference table with matching
  proofs).
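
The query-count formulas quoted in the `select` item can be worked through with integer arithmetic. The sketch below is an illustration, not the PR's `select`: it assumes the rate is a power of two, ρ = 2^(−k), so that log₂(1/ρ) = k exactly, and it computes only t (the commit message gives no formula for s). The function name `min_queries` and the shape of `ParamError` are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Regime {
    Provable,
    Conjectured,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParamError {
    InvalidRate,
    FieldTooSmall { need_bits: u32, have_bits: u32 },
}

/// Minimum proximity-query count t for rate rho = 2^(-log_inv_rate).
/// Provable regime:    t >= 2 * lambda / log2(1/rho)
/// Conjectured regime: t >= lambda / log2(1/rho)
pub fn min_queries(
    lambda: u32,
    field_bits: u32,
    log_inv_rate: u32,
    regime: Regime,
) -> Result<u32, ParamError> {
    if log_inv_rate == 0 {
        // Rate 1 carries no redundancy, so the bound is undefined.
        return Err(ParamError::InvalidRate);
    }
    // Field-admissibility check from the text: log2|F| >= lambda + 40.
    let need_bits = lambda + 40;
    if field_bits < need_bits {
        return Err(ParamError::FieldTooSmall { need_bits, have_bits: field_bits });
    }
    let numerator = match regime {
        Regime::Provable => 2 * lambda,
        Regime::Conjectured => lambda,
    };
    // Ceiling division: the smallest integer t satisfying the bound.
    Ok((numerator + log_inv_rate - 1) / log_inv_rate)
}
```

For λ = 128 at rate 1/8 (k = 3), this gives t = ⌈256/3⌉ = 86 in the provable regime and t = ⌈128/3⌉ = 43 in the conjectured one.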

The hard-coded `s=8, t=7` in warp_test is intentionally unchanged
for this pass — those are functional-test shapes, not security
values. A later pass can thread PRESETS through callers that care
about real targets.

Verification:
  - 23 tests pass (14 original + 9 new in params::tests)
  - cargo clippy, both feature configs: clean
  - ./target/debug/warp-params table prints PRESETS; select / validate
    round-trip; validate correctly rejects insufficient (s, t)

Adds the two highest-value pieces of test hardening from the plan;
the rest (proptest, golden serialization, xtask ref lint, runtime
F-S harness) is deferred and tracked in the todo list.

tests/verifier_negative.rs
- Shared `make_fixture()` runs the full two-phase accumulation so l2 > 0
  and every cleanly-reachable `VerifierError` variant is triggerable.
- One test per tamperable field, each confirming the verifier rejects
  the proof with the *specific* expected error (not just "some error"):
    * CodeEvaluationPoint (α tamper)
    * CircuitEvaluationPoint (β.0 and β.1 tampers — two tests)
    * NumShiftQueries (truncate shift_query_answers)
    * ShiftQueryIndex (swap auth_0 paths)
    * ShiftQuery (tamper shift_query_answer value)
    * NumL2Instances (truncate auth_j)
    * Target (tamper μ)
- A happy-path test keeps the fixture honest.
- Docstring enumerates the variants we did NOT reach through
  single-field tampering (SpongeFish/ArkError wrap lower-level errors;
  NumSumcheckRounds is transcript-derived; SumcheckRound is unraised
  in current code).
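
The "specific error, not just some error" pattern can be illustrated on a toy verifier. Everything below is hypothetical except the two variant names borrowed from the list above: the `Proof` fields, the checks inside `verify`, and the fixture values are invented for the sketch and do not mirror the WARP verifier.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum VerifierError {
    Target,
    NumShiftQueries,
}

/// Toy proof with two tamperable fields.
pub struct Proof {
    pub mu: u64,
    pub shift_query_answers: Vec<u64>,
}

/// Toy verifier: each check maps to exactly one error variant, so a test
/// can tell which check rejected the proof.
pub fn verify(proof: &Proof, expected_mu: u64, expected_queries: usize) -> Result<(), VerifierError> {
    if proof.shift_query_answers.len() != expected_queries {
        return Err(VerifierError::NumShiftQueries);
    }
    if proof.mu != expected_mu {
        return Err(VerifierError::Target);
    }
    Ok(())
}

/// Shared honest fixture; each negative test tampers with one field of a fresh copy.
pub fn make_fixture() -> Proof {
    Proof { mu: 42, shift_query_answers: vec![1, 2, 3] }
}
```

Each negative test then builds the fixture, changes one field, and asserts on the exact variant with `matches!`, while a happy-path test confirms the untouched fixture still verifies.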

docs/audits/fiat_shamir.md
- Ordered, line-for-line mapping of every prover-side transcript write
  to its verifier-side read. 25 steps, each with file:line links on
  both sides.
- A "what reviewers should spot-check" section calls out the specific
  squeeze-before-absorb patterns F-S-soundness bugs tend to take.
- Scope: this is a **manual** audit (compensating control). The
  runtime ordering harness that would catch drift at CI time is
  deferred because it requires instrumenting spongefish; this is noted
  explicitly at the bottom of the doc.

Verification:
  - 23/23 unit tests + 9/9 negative-path tests pass
  - cargo clippy, both feature configs: clean

Five small fixups called out during the Plan T post-mortem. No
behaviour changes — all tests keep passing on both feature configs.

1. Move the big `warp_test` / `warp_test_goldilocks` suites from
   `src/lib.rs` into `tests/integration_warp.rs`. They're end-to-end
   prove / verify / decide runs; keeping them as inline unit tests
   kept the file ~340 lines longer than necessary. `src/lib.rs` is
   now 500 lines (down from 844); the test content is unchanged.

2. Normalise phase-fn visibility to `pub`. Before, `pesat::prove`
   was `pub(crate)` while the other four phases' `prove` / `verify`
   / `verify_claim` entry points were `pub`. All output structs
   (`Oracle`, `OodOutput`, `BatchingOutput`, etc.) are already `pub`,
   so there was no reason for `pesat` to be the odd one out.
   `PesatOutput` promoted from `pub(crate)` → `pub` for the same
   reason.

3. Cross-reference lint audit — verified every module in `src/params`,
   `src/protocol/phases`, plus `src/protocol/oracle.rs` and
   `src/bin/warp-params.rs`, carries a doc-comment reference to its
   paired `docs/paper-mods/modN_*.tex`. No drift.

4. `presets::lookup` now takes an exact `(num, den)` pair instead of
   `f64` with an epsilon-compare. Preserves the caller's intent
   (`1/2` is distinct from `0.5`); the CLI plumbs a small `Rate` enum
   through parsing so ratios hit the preset table while bare decimals
   fall through to the computed-value branch.

5. Remove the `let _ = Counter::ALL.len()` silencer in
   `profile/layer.rs`. The import it was suppressing was an artifact
   of an earlier iteration and isn't needed now; dropping it and the
   `Counter` import lets clippy stay clean.
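
Item 4 can be sketched as follows. This is an illustration under assumptions, not the PR's code: the `Rate` variants, `parse_rate`, `resolve`, and the preset labels are hypothetical, and the table rows are placeholders rather than real (s, t) values. The lookup compares the pair exactly, so no fraction reduction is performed.

```rust
/// How the caller wrote the rate: an exact ratio or a bare decimal.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Rate {
    Ratio(u32, u32),
    Decimal(f64),
}

/// Parses "num/den" into `Rate::Ratio` and anything else as a decimal in (0, 1).
pub fn parse_rate(s: &str) -> Option<Rate> {
    if let Some((num, den)) = s.split_once('/') {
        let num: u32 = num.trim().parse().ok()?;
        let den: u32 = den.trim().parse().ok()?;
        if num == 0 || den == 0 || num >= den {
            return None;
        }
        Some(Rate::Ratio(num, den))
    } else {
        let x: f64 = s.trim().parse().ok()?;
        if x > 0.0 && x < 1.0 {
            Some(Rate::Decimal(x))
        } else {
            None
        }
    }
}

/// Placeholder preset rows keyed by an exact (num, den) pair.
const PRESETS: &[(u32, u32, &str)] = &[(1, 2, "preset-1/2"), (1, 8, "preset-1/8")];

/// Exact-pair lookup: no epsilon comparison and no reduction of fractions.
pub fn lookup(num: u32, den: u32) -> Option<&'static str> {
    PRESETS
        .iter()
        .find(|&&(n, d, _)| (n, d) == (num, den))
        .map(|&(_, _, label)| label)
}

/// Ratios hit the preset table; decimals fall through to the computed path
/// (represented here by `None`).
pub fn resolve(rate: Rate) -> Option<&'static str> {
    match rate {
        Rate::Ratio(n, d) => lookup(n, d),
        Rate::Decimal(_) => None,
    }
}
```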

Verification:
  - cargo test                       : 33/33 pass (22 unit + 2 integ
    + 9 negative)
  - cargo test --features profile    : 34/34 pass (adds profile_json)
  - cargo clippy --all-features -Dwarnings : clean
  - warp-params select verified with both `1/2` and `0.5` inputs

Captures the changelog entries that were uncommitted in the working
tree at the start of the Plan 0 / O / B / P / T refactor session.
Describes the already-landed sumcheck work in the preceding five
commits (a66f122 .. 312e220): inner-product sumcheck 2-coefficient
round messages, RoundPolyEvaluator adoption, twin-constraint
coefficient count reduction, and the ark-ff rev pin.

Covers every personal Claude Code artefact (custom agents, plans,
skills, settings.json, settings.local.json) for this repo — not just
the local-settings file that was already ignored. Nothing under
.claude/ has ever been tracked here, so this is a purely
belt-and-suspenders tightening: the next `git add .` can't pick up
anything from the directory.
@z-tech z-tech changed the title from "Cleanup and pull optimizations from Efficient Sumcheck" to "TODO 1" Apr 19, 2026
z-tech and others added 4 commits May 3, 2026 16:25
Resolutions:

- Cargo.toml: adopt main's clean 0.6 dep stack — drop the stale
  [patch.crates-io] block (z-tech/smallfp-absorb branch no longer
  exists; algebra rev 285dac2 was 0.5-era), bump dev-deps to 0.6.0
  to match. Use main's spongefish 0.7.0 + ark-codes z-tech fork
  pin. Keep cleanup's tracing/profile additions and `profile` feat.
- src/lib.rs, src/relations/{description.rs,r1cs/mod.rs,r1cs/hashchain/relation.rs},
  src/serialize.rs, src/protocol/mod.rs, src/error.rs: take
  cleanup's structural versions (phase modules, AccumulatorWitness
  struct, gr1cs migration via into_inner()/get_predicate_num_constraints).
- src/crypto/merkle/blake3.rs, src/protocol/domainsep/mod.rs:
  delete (cleanup intent — replaced by ark_crypto_primitives::merkle_tree::configs
  and protocol/transcript/{prover,verifier}.rs respectively).
- src/utils/poly.rs: keep (still used by protocol/phases/batching.rs).
- src/lib.rs: rewrite two `sumcheck_verify` call sites to the
  5-arg API used by effsc main (returns SumcheckResult{challenges,
  final_claim} — caller does the oracle check externally).
- src/protocol/phases/twin_constraint.rs: handle effsc main's
  `coefficient_lsb::final_value` calling convention — odd halves
  arrive empty in the singleton case. Compute h(singleton) directly
  (MLE of u at α_singleton + bundled R1CS at z, β_singleton, scaled
  by τ_singleton) and emit `[h, -h]` so g(0)+g(1) == h.
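
The `[h, -h]` identity in the last item can be checked on a toy field. The sketch assumes the two entries are coefficients in ascending-degree order, so g(X) = h − h·X, giving g(0) = h and g(1) = 0. The modulus 97 and the helper names are illustrative only; the PR works over Goldilocks and BLS12-381.

```rust
/// Toy prime modulus, chosen only to keep the arithmetic readable.
pub const P: u64 = 97;

/// Additive inverse modulo P.
pub fn neg(x: u64) -> u64 {
    (P - x % P) % P
}

/// Evaluates a polynomial given by ascending-degree coefficients at `x`
/// using Horner's rule, reducing modulo P at each step.
pub fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// Round message for the singleton case: g(X) = h - h*X.
pub fn singleton_round_message(h: u64) -> [u64; 2] {
    [h % P, neg(h)]
}
```

Because g(1) vanishes, the sumcheck identity g(0) + g(1) = h holds for any h without touching the empty odd half.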

cargo test: 33 passed (22 unit + 2 integration + 9 verifier-negative).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Comment thread src/protocol/phases/mod.rs
verifier_state: &mut VerifierState<'a>,
statement: &Self::Statement,
inputs: Self::VerifierInputs,
) -> Result<(Self::ReducedStatement, Self::VerifierOutputs), VerifierError>;

I've found that the prover and verifier code have a lot in common (at least for arguments), particularly for computing the next statement given a transcript. Maybe the function (statement, transcript) -> reduced_statement is something we want to expose in the trait, so that implementations are less likely to diverge between prover and verifier on that part? I could see implementors starting by copy-pasting the code from prover to verifier, then patching the code in the prover but forgetting to do so in the verifier, for example. Having a single place where this reduction happens would help prevent this failure mode.
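
A minimal sketch of that suggestion, assuming a simplified transcript that is just a list of challenges. The trait name `Reduction`, its methods, and the `FoldClaim` example are hypothetical and not part of the PR.

```rust
/// Hypothetical transcript view: the verifier challenges for one reduction.
pub struct Transcript {
    pub challenges: Vec<u64>,
}

pub trait Reduction {
    type Statement;
    type ReducedStatement;

    /// Single source of truth for (statement, transcript) -> reduced statement.
    fn reduce(&self, statement: &Self::Statement, transcript: &Transcript) -> Self::ReducedStatement;

    /// Prover-side entry point; delegates to `reduce` by default.
    fn prover_next(&self, statement: &Self::Statement, transcript: &Transcript) -> Self::ReducedStatement {
        self.reduce(statement, transcript)
    }

    /// Verifier-side entry point; delegates to the same `reduce`.
    fn verifier_next(&self, statement: &Self::Statement, transcript: &Transcript) -> Self::ReducedStatement {
        self.reduce(statement, transcript)
    }
}

/// Toy reduction: fold two claimed values with the first challenge.
pub struct FoldClaim;

impl Reduction for FoldClaim {
    type Statement = (u64, u64);
    type ReducedStatement = u64;

    fn reduce(&self, statement: &(u64, u64), transcript: &Transcript) -> u64 {
        // Panics if the transcript is empty; acceptable for a sketch.
        let r = transcript.challenges[0];
        statement.0.wrapping_add(r.wrapping_mul(statement.1))
    }
}
```

An implementor then writes the statement update once, and a patch to `reduce` reaches both sides at the same time.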

Actually I think argus has a similar "issue", so anything we come up with here should be ported to argus too.

Collaborator Author

Okay, I'll give a draft. I think this is very useful, thanks!

type ReducedStatement;
/// Oracles emitted by this IOR (prover view: full data, plus any private
/// reduced witness state).
type ProverOutputs;

I would split this into ProofString and ReducedWitness, since I expect that implementers will always have to split those two (the proof string is always sent to the verifier, the reduced witness is either fed into the next IOR or sent to the verifier).
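
A sketch of what that split could look like as associated types. The trait name `IorProver`, the `Witness` type, and the toy `HalveAndCommit` reduction are assumptions made for illustration, not the PR's definitions.

```rust
pub trait IorProver {
    type Witness;
    /// Always sent to the verifier (possibly behind oracle access).
    type ProofString;
    /// Either fed into the next IOR or sent to the verifier.
    type ReducedWitness;

    fn prove(&self, witness: Self::Witness) -> (Self::ProofString, Self::ReducedWitness);
}

/// Toy reduction: the proof string is the original symbols and the
/// reduced witness is their pairwise sums.
pub struct HalveAndCommit;

impl IorProver for HalveAndCommit {
    type Witness = Vec<u64>;
    type ProofString = Vec<u64>;
    type ReducedWitness = Vec<u64>;

    fn prove(&self, witness: Vec<u64>) -> (Vec<u64>, Vec<u64>) {
        let reduced: Vec<u64> = witness
            .chunks(2)
            .map(|pair| pair.iter().sum::<u64>())
            .collect();
        (witness, reduced)
    }
}
```

Returning the two parts separately lets the type system record which data leaves the prover and which stays private until the next reduction.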

Collaborator Author

Yes, good point.

&self,
verifier_state: &mut VerifierState<'a>,
statement: &Self::Statement,
inputs: Self::VerifierInputs,

I'm a bit confused by what inputs is supposed to be. It should express oracle access to previous proof strings, plus the one for the current reduction, right?

I'm wondering if we can express this oracle access as a tuple of partial functions fn: usize -> Option<Alphabet> that the verifier can query without reading (or seeing!) the whole proof string. This might also help for iBCS (where, essentially, the ARG wrapper around an IOP could implement such a partial function but additionally enforce that VC proofs must pass for the result to be Some(Alphabet)).
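
The partial-function idea could be sketched as a small trait. The names `OracleAccess`, `CountingOracle`, and `CheckedOracle` are hypothetical, `Alphabet` is fixed to `u8` for the example, and the `opening_ok` closure is only a stand-in for verifying a vector-commitment opening.

```rust
use std::cell::Cell;

pub type Alphabet = u8;

/// Oracle access as a partial function: the verifier asks for one position
/// at a time and never receives the whole proof string.
pub trait OracleAccess {
    fn query(&self, index: usize) -> Option<Alphabet>;
}

/// Plain IOP-style oracle that also counts how many queries were made.
pub struct CountingOracle {
    symbols: Vec<Alphabet>,
    queries: Cell<usize>,
}

impl CountingOracle {
    pub fn new(symbols: Vec<Alphabet>) -> Self {
        Self { symbols, queries: Cell::new(0) }
    }

    pub fn queries_made(&self) -> usize {
        self.queries.get()
    }
}

impl OracleAccess for CountingOracle {
    fn query(&self, index: usize) -> Option<Alphabet> {
        self.queries.set(self.queries.get() + 1);
        self.symbols.get(index).copied()
    }
}

/// Argument-style wrapper: a symbol is returned only if its opening check passes.
pub struct CheckedOracle<C: Fn(usize, Alphabet) -> bool> {
    symbols: Vec<Alphabet>,
    opening_ok: C,
}

impl<C: Fn(usize, Alphabet) -> bool> CheckedOracle<C> {
    pub fn new(symbols: Vec<Alphabet>, opening_ok: C) -> Self {
        Self { symbols, opening_ok }
    }
}

impl<C: Fn(usize, Alphabet) -> bool> OracleAccess for CheckedOracle<C> {
    fn query(&self, index: usize) -> Option<Alphabet> {
        let symbol = self.symbols.get(index).copied()?;
        (self.opening_ok)(index, symbol).then_some(symbol)
    }
}
```

A verifier written against `OracleAccess` would run unchanged over either implementation, which is the property the comment is after for the iBCS case.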

Collaborator Author

Also yes, I'll draft this and we can see how it looks.

@z-tech
Collaborator Author

z-tech commented May 4, 2026

Keeping this open for comments for a few days, but I want to move to the branch where ark-vc + ark-mt are integrated: #25

@z-tech z-tech changed the title from "TODO 1" to "IOR trait" May 4, 2026
@z-tech
Collaborator Author

z-tech commented May 5, 2026

Let's try to move over to #25, where ark-vc + ark-mt are integrated along with the suggestions above.

@z-tech z-tech closed this May 5, 2026