crypto: Share Fq12 squaring across pairs in BN254 Miller loop#1544

Draft
chfast wants to merge 1 commit into master from crypto/multi-pair-miller

@chfast chfast commented May 26, 2026

pairing_check() previously ran one independent Miller loop per pair and multiplied the results, paying LOG_ATE_LOOP_COUNT + 1 = 64 Fq12 squarings per pair. Because the Miller-loop recurrence

f_{i+1} = f_i² · line_i

is multiplicative across pairs, all pairs can share a single Fq12 accumulator. For N valid pairs this saves (N−1) × 64 Fq12 squarings without changing the final product.

Restructure as multi_miller_loop():

  • Validate all pairs up front, collect surviving ones into a MillerPairState vector (T in Jacobian, Q/-Q affine, P, -P.y).
  • Single squaring per iteration, then line-and-mul for every pair.
  • NAF add branch and the two post-loop Frobenius steps iterate over every pair.
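The loop shape described in the list above can be sketched as an operation count. This is a rough model under stated assumptions, not the actual `multi_miller_loop()`: `OpCount` and both functions are hypothetical, the NAF digits are passed in rather than taken from the real ate loop constant, and the final multiplication of per-pair results in the old scheme is not counted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical operation counters; the names are illustrative only.
struct OpCount
{
    int squarings = 0;  // Fq12 squarings of the accumulator
    int line_muls = 0;  // line evaluations multiplied into the accumulator
};

// Counts operations for a shared-accumulator loop over signed NAF digits
// (each digit is -1, 0 or +1) with `num_pairs` surviving pairs.
inline OpCount count_multi_miller_ops(const std::vector<std::int8_t>& naf, int num_pairs)
{
    assert(num_pairs >= 0);
    OpCount ops;
    for (const auto digit : naf)
    {
        ++ops.squarings;             // single squaring per iteration
        ops.line_muls += num_pairs;  // doubling line for every pair
        if (digit != 0)
            ops.line_muls += num_pairs;  // NAF add branch: line with Q or -Q per pair
    }
    ops.line_muls += 2 * num_pairs;  // two post-loop Frobenius steps per pair
    return ops;
}

// The same work when every pair runs its own loop: squarings scale with pairs.
inline OpCount count_per_pair_ops(const std::vector<std::int8_t>& naf, int num_pairs)
{
    OpCount total;
    for (int k = 0; k < num_pairs; ++k)
    {
        const auto one = count_multi_miller_ops(naf, 1);
        total.squarings += one.squarings;
        total.line_muls += one.line_muls;
    }
    return total;
}
```

In this model the line work is identical in both schemes; only the squaring count changes, from `num_pairs × naf.size()` to `naf.size()`.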

The bench inputs span 2 and 4 pairs per call — for the 10 inputs in test/precompiles_bench/precompiles_bench.cpp, total Fq12 squarings drop from 2048 (= 32 pairs × 64) to 640 (= 10 calls × 64), saving ~141 squarings per call on average.
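The totals above can be reproduced with a few lines of arithmetic. In this sketch `total_squarings` and `SquaringTotals` are hypothetical helpers, and the per-call pair counts must be supplied by the caller. A split of 4 calls with 2 pairs and 6 calls with 4 pairs is inferred from the stated totals (32 pairs over 10 calls), not listed in the description.

```cpp
#include <cassert>
#include <vector>

// LOG_ATE_LOOP_COUNT + 1, as stated in the description.
constexpr int SQUARINGS_PER_LOOP = 64;

// Hypothetical helper type: squaring totals before and after the change.
struct SquaringTotals
{
    int before = 0;  // one Miller loop per pair
    int after = 0;   // one shared Miller loop per call
};

inline SquaringTotals total_squarings(const std::vector<int>& pairs_per_call)
{
    SquaringTotals totals;
    for (const auto n : pairs_per_call)
    {
        assert(n >= 0);
        totals.before += n * SQUARINGS_PER_LOOP;
        totals.after += SQUARINGS_PER_LOOP;
    }
    return totals;
}
```

With the inferred split, `total_squarings({2, 2, 2, 2, 4, 4, 4, 4, 4, 4})` yields 2048 before and 640 after, a difference of 1408 squarings, or 140.8 per call.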

Bench (build/clang-tt, 100 reps × 1s each, ecpairing precompile):
master baseline: 2906753 ns mean, 2888808 ns median, σ=185869
this branch: 2909897 ns mean, 2890350 ns median, σ=232121
Δ: +3144 ns mean (+0.11%), +1542 ns median (+0.05%), within noise
(σ ≈ 6.4% and 8.0% of the respective means). The measurable savings
on the algorithm side don't translate to a wall-clock win on this
build/CPU. Fq12 squaring is apparently cheap enough relative to the
per-pair line work that the per-call sharing benefit is below the
current noise floor.
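The noise comparison can be made explicit with a small calculation on the figures above. This is a sketch with hypothetical helper names; the "within noise" test used here (difference of means smaller than the smaller of the two standard deviations) is a rough assumed criterion, not a formal significance test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical container for one benchmark summary line.
struct BenchStat
{
    double mean_ns;
    double stddev_ns;
};

// Relative change of the mean, head versus base (positive means slower).
inline double relative_delta(const BenchStat& base, const BenchStat& head)
{
    assert(base.mean_ns > 0);
    return (head.mean_ns - base.mean_ns) / base.mean_ns;
}

// Standard deviation as a fraction of the mean.
inline double coefficient_of_variation(const BenchStat& s)
{
    assert(s.mean_ns > 0);
    return s.stddev_ns / s.mean_ns;
}

// Assumed rough criterion: the means differ by less than one standard deviation.
inline bool within_noise(const BenchStat& base, const BenchStat& head)
{
    return std::fabs(head.mean_ns - base.mean_ns) < std::min(base.stddev_ns, head.stddev_ns);
}
```

For the reported numbers, `BenchStat{2906753, 185869}` against `BenchStat{2909897, 232121}` gives a relative delta of roughly 0.1%, far below either run's spread.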

The change stands as a code-quality and structural improvement on its own: the multi-pair loop form is simpler (single accumulator, single squaring path), and it ports cleanly to the planned follow-up Karatsuba-sparse line multiplication, which can share more work across pairs.

Tests: 53/53 unit tests, EEST state tests 11/11 on every stable fork (Byzantium / Istanbul / Cancun / Prague / Osaka).

@chfast chfast requested a review from rodiazet May 26, 2026 08:46
chfast commented May 26, 2026

                                                   │ /proc/self/fd/11 │           /proc/self/fd/16           │
                                                   │      gas/s       │    gas/s      vs base                │
precompile<PrecompileId::ecpairing,_evmmax_cpp>-14        92.73M ± 0%   109.52M ± 0%  +18.11% (p=0.000 n=11)

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.97%. Comparing base (1dc88fc) to head (e4da72a).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1544   +/-   ##
=======================================
  Coverage   96.97%   96.97%           
=======================================
  Files         163      163           
  Lines       14455    14460    +5     
  Branches     3385     3390    +5     
=======================================
+ Hits        14018    14023    +5     
  Misses        307      307           
  Partials      130      130           
Flag Coverage Δ
eest-develop 91.95% <100.00%> (+<0.01%) ⬆️
eest-develop-gmp 26.56% <100.00%> (+0.02%) ⬆️
eest-legacy 17.55% <0.00%> (-0.01%) ⬇️
eest-libsecp256k1 28.20% <100.00%> (+0.02%) ⬆️
eest-stable 91.86% <100.00%> (+<0.01%) ⬆️
evmone-unittests 92.64% <100.00%> (+<0.01%) ⬆️

Components Coverage Δ
core 96.02% <100.00%> (+<0.01%) ⬆️
tooling 86.71% <ø> (ø)
tests 99.79% <ø> (ø)
Files with missing lines Coverage Δ
lib/evmone_precompiles/pairing/bn254/pairing.cpp 100.00% <100.00%> (ø)