crypto: Share Fq12 squaring across pairs in BN254 Miller loop #1544
Draft
chfast wants to merge 1 commit into
pairing_check() previously ran one independent Miller loop per pair
and multiplied the results, paying LOG_ATE_LOOP_COUNT + 1 = 64 Fq12
squarings per pair. Because the Miller-loop recurrence
f_{i+1} = f_i² · line_i
is multiplicative across pairs, all pairs can share a single Fq12
accumulator. For N valid pairs this saves (N−1) × 64 Fq12 squarings
without changing the final product.
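
To spell out why sharing the accumulator is valid, here is a short derivation. The notation is mine, not the code's: f_i^(j) and ℓ_i^(j) are the accumulator and line value of pair j at step i, and F_i is the shared accumulator. It relies only on Fq12 multiplication being commutative.

```latex
\begin{aligned}
f^{(j)}_{i+1} &= \bigl(f^{(j)}_i\bigr)^2 \cdot \ell^{(j)}_i, \qquad j = 1,\dots,N,\\[4pt]
F_i := \prod_{j=1}^{N} f^{(j)}_i
\;&\Longrightarrow\;
F_{i+1} = \prod_{j=1}^{N} \bigl(f^{(j)}_i\bigr)^2\,\ell^{(j)}_i
        = F_i^{2} \cdot \prod_{j=1}^{N} \ell^{(j)}_i .
\end{aligned}
```

Each step therefore needs one squaring of F_i instead of N squarings of the individual f_i^(j), which is where the (N−1) squarings saved per iteration come from.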
Restructure as multi_miller_loop():
- Validate all pairs up front, collect surviving ones into a
  MillerPairState vector (T in Jacobian, Q/-Q affine, P, -P.y).
- Single squaring per iteration, then line-and-mul for every pair.
- NAF add branch and the two post-loop Frobenius steps iterate
  over every pair.
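
The loop shape described above can be illustrated with a toy model. This is only a sketch under loud assumptions: integers modulo a small prime stand in for Fq12, `ToyPairState` and `line_step()` are hypothetical placeholders for MillerPairState and the real line evaluation, and the NAF add branch and Frobenius steps are left out. It shows only that a shared accumulator yields the same product with fewer squarings.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for Fq12: integers modulo a small prime (assumption for illustration).
constexpr std::uint64_t P = 1000003;
constexpr int ITERATIONS = 64;  // LOG_ATE_LOOP_COUNT + 1 in the PR description

constexpr std::uint64_t mul(std::uint64_t a, std::uint64_t b) { return a * b % P; }

// Hypothetical per-pair state; the real MillerPairState holds T, Q, -Q, P, -P.y.
struct ToyPairState
{
    std::uint64_t t;
};

// Hypothetical "line" evaluation: advances the pair state and returns a field element.
std::uint64_t line_step(ToyPairState& s)
{
    s.t = (s.t * 48271 + 12345) % P;  // arbitrary deterministic update
    return s.t == 0 ? 1 : s.t;
}

struct Result
{
    std::uint64_t value;
    int squarings;
};

// Old shape: one independent loop per pair, results multiplied at the end.
Result per_pair_loops(std::vector<ToyPairState> pairs)
{
    std::uint64_t product = 1;
    int squarings = 0;
    for (auto& s : pairs)
    {
        std::uint64_t f = 1;
        for (int i = 0; i < ITERATIONS; ++i)
        {
            f = mul(f, f);
            ++squarings;
            f = mul(f, line_step(s));
        }
        product = mul(product, f);
    }
    return {product, squarings};
}

// New shape: one squaring per iteration, then line-and-mul for every pair.
Result shared_loop(std::vector<ToyPairState> pairs)
{
    std::uint64_t f = 1;
    int squarings = 0;
    for (int i = 0; i < ITERATIONS; ++i)
    {
        f = mul(f, f);
        ++squarings;
        for (auto& s : pairs)
            f = mul(f, line_step(s));
    }
    return {f, squarings};
}
```

Calling both functions on the same vector of four toy states should return equal `value` fields, with 256 squarings counted for the per-pair form and 64 for the shared form.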
The bench inputs span 2 and 4 pairs per call — for the 10 inputs in
test/precompiles_bench/precompiles_bench.cpp, total Fq12 squarings
drop from 2048 (= 32 pairs × 64) to 640 (= 10 calls × 64), saving
~141 squarings per call on average.
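
As a consistency check on these totals, the arithmetic works out as follows, using only the figures quoted above (32 pairs, 10 calls, 64 squarings per loop):

```latex
\underbrace{32 \times 64}_{\text{per-pair loops}} - \underbrace{10 \times 64}_{\text{shared loops}}
= (32 - 10) \times 64 = 1408,
\qquad
\frac{1408}{10} = 140.8 \approx 141 .
```

This agrees with summing the per-call saving (N−1) × 64 over all calls, since the N values sum to 32 and there are 10 calls.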
Bench (build/clang-tt, 100 reps × 1s each, ecpairing precompile):
  master baseline: 2906753 ns mean, 2888808 ns median, σ=185869 ns
  this branch:     2909897 ns mean, 2890350 ns median, σ=232121 ns
  Δ: within noise (means differ by ~0.1%, σ ≈ 6-8% of mean). The
  squarings saved on the algorithm side don't translate to a
  wall-clock win on this build/CPU. Fq12 squaring is apparently cheap
  enough relative to the per-pair line work that the per-call sharing
  benefit is below the current noise floor.
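
For reference, the "within noise" reading can be reproduced from the numbers above with a few lines. This is a rough sketch, not the benchmark tooling: the `BenchStats` struct and the `within_noise()` criterion (mean difference smaller than the smaller σ) are my own simplifications.

```cpp
#include <cmath>

// Summary statistics as reported in the bench output (all in nanoseconds).
struct BenchStats
{
    double mean_ns;
    double median_ns;
    double sigma_ns;
};

// Relative change of the candidate mean versus the baseline mean.
double relative_mean_delta(const BenchStats& base, const BenchStats& cand)
{
    return (cand.mean_ns - base.mean_ns) / base.mean_ns;
}

// Standard deviation as a fraction of the mean.
double coefficient_of_variation(const BenchStats& s)
{
    return s.sigma_ns / s.mean_ns;
}

// Crude criterion (assumption): the means differ by less than the smaller sigma.
bool within_noise(const BenchStats& base, const BenchStats& cand)
{
    return std::abs(cand.mean_ns - base.mean_ns) < std::fmin(base.sigma_ns, cand.sigma_ns);
}

// Figures copied from the bench results above.
const BenchStats master_run{2906753.0, 2888808.0, 185869.0};
const BenchStats branch_run{2909897.0, 2890350.0, 232121.0};
```

With these inputs the mean shifts by roughly 0.1%, while σ is about 6.4% of the mean on master and about 8.0% on this branch, so the difference sits well inside the spread of either run.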
The change still stands on its own as a structural improvement: the
multi-pair loop form is simpler (single accumulator, single squaring
path), and it carries over cleanly to the planned follow-up,
Karatsuba-sparse line multiplication, which can share more work
across pairs.
Tests: 53/53 unit tests pass; EEST state tests pass 11/11 on every
stable fork (Byzantium / Istanbul / Cancun / Prague / Osaka).
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master    #1544   +/-   ##
=======================================
  Coverage   96.97%   96.97%
=======================================
  Files         163      163
  Lines       14455    14460    +5
  Branches     3385     3390    +5
=======================================
+ Hits        14018    14023    +5
  Misses        307      307
  Partials      130      130