feat: isolate Bernstein-Yang impl from Pippenger rewrite #23426
Adds a variable-time safegcd inverse (Bernstein-Yang '19) for 254-bit prime fields and wires field::invert() to dispatch to it at runtime, keeping Fermat for constexpr contexts and 256-bit moduli (secp256k1/r1). Includes the WASM 9x29 kernel, a differential fuzzer vs Fermat, and unit tests exercising the WASM kernel on x86_64. Extracted from 758407a without the surrounding Pippenger refactor.
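To make the description above concrete, here is a minimal sketch of a variable-time safegcd inverse checked against a Fermat inverse, in the spirit of the differential fuzzer. It is not the PR's kernel: it runs one divstep at a time on a toy 31-bit prime, whereas production implementations typically batch divsteps over multi-limb integers. All names (`kToyModulus`, `half_mod`, `invert_safegcd`, `invert_fermat`, `inverses_agree`) are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Toy modulus for this sketch only: the Mersenne prime 2^31 - 1, small enough
// that products of two residues fit in uint64_t. Not one of the PR's fields.
constexpr std::uint64_t kToyModulus = 2147483647ULL;

// Returns x / 2 (mod m) for odd m and 0 <= x < m.
std::uint64_t half_mod(std::uint64_t x, std::uint64_t m) {
    return (x & 1) ? (x + m) / 2 : x / 2;
}

// Variable-time safegcd inverse, one divstep per loop iteration.
// Assumes: m odd, m < 2^32, 0 < x < m, gcd(x, m) == 1.
std::uint64_t invert_safegcd(std::uint64_t x, std::uint64_t m) {
    assert((m & 1) == 1 && x > 0 && x < m);
    std::int64_t delta = 1;
    std::int64_t f = static_cast<std::int64_t>(m);
    std::int64_t g = static_cast<std::int64_t>(x);
    // Invariants: f == d * x (mod m) and g == e * x (mod m).
    std::uint64_t d = 0, e = 1;
    while (g != 0) {  // data-dependent exit: this is what makes it variable-time
        if ((g & 1) != 0 && delta > 0) {
            // Swap roles: (f, g) <- (g, (g - f) / 2), (d, e) <- (e, (e - d) / 2).
            const std::int64_t f_old = f;
            const std::uint64_t d_old = d;
            delta = 1 - delta;
            f = g;
            g = (g - f_old) / 2;
            d = e;
            e = half_mod((e + m - d_old) % m, m);
        } else if ((g & 1) != 0) {
            // g odd, no swap: g <- (g + f) / 2.
            delta = 1 + delta;
            g = (g + f) / 2;
            e = half_mod((e + d) % m, m);
        } else {
            // g even: g <- g / 2.
            delta = 1 + delta;
            g = g / 2;
            e = half_mod(e, m);
        }
    }
    // On exit |f| == gcd(x, m) == 1, and d == f / x (mod m), so 1/x == d * f.
    assert(f == 1 || f == -1);
    return (f == 1) ? d : (m - d) % m;
}

// Fermat inverse x^(p - 2) mod p by square-and-multiply; p prime, p < 2^32.
std::uint64_t invert_fermat(std::uint64_t x, std::uint64_t p) {
    std::uint64_t result = 1, base = x % p, exp = p - 2;
    while (exp > 0) {
        if (exp & 1) result = result * base % p;
        base = base * base % p;
        exp >>= 1;
    }
    return result;
}

// Differential check: both inverses must agree and satisfy inv * x == 1 (mod p).
bool inverses_agree(std::uint64_t x, std::uint64_t p) {
    const std::uint64_t inv = invert_safegcd(x, p);
    return inv == invert_fermat(x, p) && inv * x % p == 1;
}
```

A fuzzer built on this idea would call `inverses_agree(x, kToyModulus)` on random nonzero `x` and report any mismatch. The Fermat path has a fixed operation count for a given modulus, while the safegcd loop length depends on the input, which is why the latter is described as variable-time.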
BrowserStack wasm-bench A/B — PR vs
| Target | N | PR median ms | Base median ms | Δ median % | 95% CI % | Distinguishable from 0? |
|---|---|---|---|---|---|---|
| Pixel 9 Pro XL · Chrome | 10 | 23 276 | 24 363 | −5.13% | [−7.46, −1.53] | ✓ |
| iPhone 15 Pro · Safari † | 6 | 18 907 | 19 728 | −6.18% | [−8.07, −1.51] | ✓ |
| macOS Sequoia · Chrome 148 | 10 | 11 009 | 11 484 | −3.59% | [−8.20, −2.38] | ✓ |
| Galaxy S25 Ultra · Chrome | 10 | 10 182 | 10 636 | −3.14% | [−10.29, +3.93] | ✗ |
For three of the four targets, the bootstrap 95% CI on the paired median Δ% (2000 resamples) excludes zero. On those targets the Bernstein-Yang inverse gives a real 3–6% Chonk end-to-end speedup. The Galaxy point estimate is in the same direction, but the Snapdragon 8 Elite scheduler is noisy enough that the CI straddles zero at N=10, so I'm not claiming a sign; bumping N to ~25 would likely resolve it.
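For readers who want to reproduce the interval column, the following is a rough sketch of one way to compute a bootstrap CI on the paired median Δ%. It assumes a plain percentile bootstrap over per-pair deltas, which may differ from what wasm-bench actually does; `BootstrapCI`, `median`, `paired_median_delta_ci`, and `distinguishable_from_zero` are hypothetical names, and the seed is fixed only to make the sketch deterministic.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct BootstrapCI {
    double median;  // median of the paired deltas, in percent
    double lo;      // 2.5th percentile of the resampled medians
    double hi;      // 97.5th percentile of the resampled medians
};

// Median of a non-empty vector (taken by value so the caller's data is untouched).
double median(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Percentile bootstrap on the median of per-pair deltas, in percent.
// Assumes pr_ms and base_ms are non-empty, equally long, and that entry i of
// each vector comes from the same paired run.
BootstrapCI paired_median_delta_ci(const std::vector<double>& pr_ms,
                                   const std::vector<double>& base_ms,
                                   std::size_t resamples = 2000,
                                   unsigned seed = 1) {
    const std::size_t n = pr_ms.size();
    std::vector<double> delta(n);
    for (std::size_t i = 0; i < n; ++i) {
        delta[i] = 100.0 * (pr_ms[i] - base_ms[i]) / base_ms[i];
    }

    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    std::vector<double> medians(resamples);
    std::vector<double> sample(n);
    for (double& m : medians) {
        // Resample the paired deltas with replacement and record the median.
        for (double& s : sample) s = delta[pick(rng)];
        m = median(sample);
    }
    std::sort(medians.begin(), medians.end());

    // Read the interval bounds off the sorted resampled medians.
    auto at = [&medians](double q) {
        return medians[static_cast<std::size_t>(q * (medians.size() - 1))];
    };
    return {median(delta), at(0.025), at(0.975)};
}

// The interval "clears zero" when both bounds lie on the same side of it.
bool distinguishable_from_zero(const BootstrapCI& ci) {
    return ci.lo > 0.0 || ci.hi < 0.0;
}
```

Resampling the deltas rather than the raw timings keeps each PR run matched with its base run, so drift shared by both runs of a pair cancels out. With only 10 pairs the resampled medians take few distinct values, which is consistent with a noisy target needing a larger N before its interval excludes zero.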
All 56 proofs verified (proofFieldCount=2630, verificationKeyBytes=4576). crossOriginIsolated=true, sharedArrayBuffer=true on every run.