
Avoid materializing the full bitmap for count aggregation with filter on column with inverted index #18543

Open
siddharthteotia wants to merge 1 commit into apache:master from siddharthteotia:perf/iif-getnummatchingdocs-bench

siddharthteotia (Contributor) commented May 20, 2026

Improvement Summary

  • Optimizes InvertedIndexFilterOperator.getNumMatchingDocs() for single-value (SV) columns by replacing the materialized-union approach with a sum of per-dictId bitmap cardinalities. Because each SV doc has exactly one dictId, the per-dictId bitmaps are disjoint, so the sum equals the cardinality of their union.
  • Eliminates the MutableRoaringBitmap allocation on the hot count path and yields 12x to 2035x speedups in the benchmarks below.
  • Addresses an old TODO in the code that proposed this optimization.
  • The multi-value (MV) path is unchanged in this PR. An approach was prototyped and benchmarked but regressed at the edges; a threshold-aware MV optimization will follow in a separate PR.
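The SV reasoning above can be sketched side by side. This is a minimal sketch, assuming java.util.BitSet as a stand-in for the Roaring bitmaps the inverted index actually stores (with RoaringBitmap, the per-list call would be getCardinality()); the class and method names are hypothetical and are not Pinot's code.

```java
import java.util.BitSet;
import java.util.List;

// Sketch only: java.util.BitSet stands in for the inverted index's per-dictId
// Roaring bitmaps; class and method names are hypothetical, not Pinot's API.
public class SvMatchingDocsCount {

  // Old shape: materialize the union, then count it (allocates a new bitmap per call).
  static int countByUnion(List<BitSet> postingLists) {
    BitSet union = new BitSet();
    for (BitSet postingList : postingLists) {
      union.or(postingList);
    }
    return union.cardinality();
  }

  // New shape: on an SV column each doc has exactly one dictId, so the posting
  // lists of distinct dictIds are disjoint and the union's cardinality equals
  // the sum of the individual cardinalities. No intermediate bitmap is built.
  static int countBySum(List<BitSet> postingLists) {
    int count = 0;
    for (BitSet postingList : postingLists) {
      count += postingList.cardinality();
    }
    return count;
  }

  public static void main(String[] args) {
    // Forward index of an SV column: dictIds[docId] is that doc's only dictId.
    int[] dictIds = {0, 1, 2, 1, 0, 3, 2, 2, 1, 0};
    BitSet[] invertedIndex = new BitSet[4];
    for (int dictId = 0; dictId < invertedIndex.length; dictId++) {
      invertedIndex[dictId] = new BitSet();
    }
    for (int docId = 0; docId < dictIds.length; docId++) {
      invertedIndex[dictIds[docId]].set(docId);
    }
    // Filter: column IN (dictId 1, dictId 2).
    List<BitSet> matching = List.of(invertedIndex[1], invertedIndex[2]);
    System.out.println(countByUnion(matching)); // 6
    System.out.println(countBySum(matching));   // 6
  }
}
```

Both counts are 6 here; only the second avoids the intermediate bitmap. The equality depends entirely on disjointness, which is why it does not carry over to MV columns.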

Functional Testing

  • Added a new unit test targeting the specific code paths, for 100% coverage.
  • All existing SV and MV query tests that exercise this code path pass. No new end-to-end query tests should be needed.
  • FastFilteredCountTest and FastFilteredCountMCTest together run 100 query cases that hit FastFilteredCountOperator → filterOperator.getNumMatchingDocs() on SV columns with inverted indexes; all pass.
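One way to check the SV invariant beyond fixed cases is a randomized comparison against a brute-force scan of the forward index. This is a hypothetical sketch, not the unit test added in this PR; java.util.BitSet stands in for the inverted-index bitmaps, and the class name and parameters are illustrative.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical randomized check: on an SV column, summing per-dictId posting-list
// cardinalities must match a brute-force scan of the forward index.
public class SvCountEquivalenceCheck {

  // Runs one random trial; returns true if both counts agree. Requires k <= cardinality.
  static boolean trial(long seed, int numDocs, int cardinality, int k) {
    Random random = new Random(seed);
    int[] forwardIndex = new int[numDocs];
    BitSet[] invertedIndex = new BitSet[cardinality];
    for (int d = 0; d < cardinality; d++) {
      invertedIndex[d] = new BitSet(numDocs);
    }
    // Each doc gets exactly one dictId (the SV property).
    for (int docId = 0; docId < numDocs; docId++) {
      int dictId = random.nextInt(cardinality);
      forwardIndex[docId] = dictId;
      invertedIndex[dictId].set(docId);
    }
    // Pick k distinct matching dictIds.
    Set<Integer> matching = new HashSet<>();
    while (matching.size() < k) {
      matching.add(random.nextInt(cardinality));
    }
    // Oracle: scan the forward index.
    int scanCount = 0;
    for (int dictId : forwardIndex) {
      if (matching.contains(dictId)) {
        scanCount++;
      }
    }
    // Optimized path: sum of per-dictId cardinalities, no union bitmap.
    int sumCount = 0;
    for (int dictId : matching) {
      sumCount += invertedIndex[dictId].cardinality();
    }
    return scanCount == sumCount;
  }

  public static void main(String[] args) {
    for (long seed = 0; seed < 100; seed++) {
      if (!trial(seed, 10_000, 1_024, 64)) {
        throw new AssertionError("mismatch for seed " + seed);
      }
    }
    System.out.println("all trials passed");
  }
}
```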

Performance Testing

See BenchmarkInvertedIndexGetNumMatchingDocs

  • JMH, numDocs=5M, run with the -prof gc profiler
  • JDK 21, macOS on Apple Silicon (re-ran on x86 to cross-check)
  • Dictionary cardinality ∈ {1024, 100K, 1M}
  • Number of matching dictIds (K) ∈ {4, 16, 64, 256, 1K, 10K, 100K}

SV Results

dict cardinality = 1K

| K | current | new | speedup | current alloc | new alloc |
|---|---|---|---|---|---|
| 4 | 44.71 µs | 90 ns | 499x | 137.4 KB | 0 B |
| 16 | 738 µs | 530 ns | 1392x | 565.5 KB | 0 B |
| 64 | 8.39 ms | 4.12 µs | 2035x | 3.36 MB | 0 B |
| 256 | 9.65 ms | 16.57 µs | 582x | 3.41 MB | 0 B |
| 1,000 | 14.10 ms | 66.77 µs | 211x | 3.41 MB | 0 B |

dict cardinality = 100K

| K | current | new | speedup | current alloc | new alloc |
|---|---|---|---|---|---|
| 4 | 3.51 µs | 63 ns | 56x | 13.1 KB | 0 B |
| 16 | 19.53 µs | 339 ns | 58x | 33.1 KB | 0 B |
| 64 | 103.73 µs | 3.05 µs | 34x | 63.4 KB | 0 B |
| 256 | 1.20 ms | 14.39 µs | 84x | 183.0 KB | 0 B |
| 1,000 | 11.77 ms | 64.41 µs | 183x | 547.9 KB | 0 B |
| 10,000 | 223.15 ms | 3.50 ms | 64x | 3.82 MB | 0 B |
| 100,000 | 428.49 ms | 36.62 ms | 12x | 3.85 MB | 0 B |

dict cardinality = 1M

| K | current | new | speedup | current alloc | new alloc |
|---|---|---|---|---|---|
| 4 | 1.39 µs | 25 ns | 55x | 5.3 KB | 0 B |
| 16 | 6.17 µs | 120 ns | 51x | 12.8 KB | 0 B |
| 64 | 23.08 µs | 618 ns | 37x | 30.7 KB | 0 B |
| 256 | 132.59 µs | 3.81 µs | 35x | 53.8 KB | 0 B |
| 1,000 | 996.64 µs | 24.31 µs | 41x | 137.8 KB | 0 B |
| 10,000 | 44.85 ms | 260.38 µs | 172x | 1016.4 KB | 0 B |
| 100,000 | 342.36 ms | 9.36 ms | 37x | 4.00 MB | 32 B |

Note

  • Allocation is reported per getNumMatchingDocs() call (one call per segment hit by the filter). A typical query fans out to multiple segments per server.
  • For example, 3.36 MB per call for a query touching 100 segments at 1,000 QPS amounts to 336 GB/s of young-gen allocation pressure.
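The young-gen figure in the example above works out as follows, taking the 3.36 MB per-call allocation from the K = 64, dict cardinality = 1K row and assuming one getNumMatchingDocs() call per segment:

```math
3.36\ \tfrac{\text{MB}}{\text{call}} \times 100\ \tfrac{\text{calls}}{\text{query}} \times 1{,}000\ \tfrac{\text{queries}}{\text{s}} = 336{,}000\ \tfrac{\text{MB}}{\text{s}} \approx 336\ \tfrac{\text{GB}}{\text{s}}
```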

MV Results (the MV implementation is not changed in this PR, but the benchmark includes MV proof-of-concept code)

  • Tried BufferFastAggregation.orCardinality.
  • It also allocates a scratch bitmap internally.
  • It does a two-pass scan over all input bitmaps: pass 1 collects the union's container keys; pass 2, per key, ORs the matching container from each input into a reused scratch container.
  • At low K with tiny per-bitmap cardinality (sparse columns), the fixed scratch cost is not amortized. At very high K with high cardinality, the per-key walk over the bitmaps dominates.
  • So the results were mixed: up to 7x faster, but also 2x slower in some cases.
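The two-pass approach described above can be modeled in a few lines, which also shows why the SV shortcut cannot be reused for MV: a doc holding several matching dictIds appears in several posting lists, so summing cardinalities overcounts. This is a minimal sketch, assuming sorted int[] docId lists in place of real Roaring containers (RoaringBitmap's actual container types and BufferFastAggregation internals differ); all names are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.TreeSet;

// Simplified model, not RoaringBitmap's code: each posting list is a sorted
// int[] of non-negative docIds; the high 16 bits of a docId pick a "container"
// key and the low 16 bits a slot inside that container.
public class MvOrCardinalitySketch {

  // SV-style shortcut: wrong for MV, where one doc can appear in several lists.
  static int sumOfCardinalities(List<int[]> postingLists) {
    int sum = 0;
    for (int[] postingList : postingLists) {
      sum += postingList.length;
    }
    return sum;
  }

  // Two-pass union cardinality with one reused scratch container.
  static int orCardinality(List<int[]> postingLists) {
    // Pass 1: collect the union of container keys across all inputs.
    TreeSet<Integer> keys = new TreeSet<>();
    for (int[] postingList : postingLists) {
      for (int docId : postingList) {
        keys.add(docId >>> 16);
      }
    }
    // Fixed cost: a 2^16-bit (8 KB) scratch, paid even when the inputs are tiny.
    BitSet scratch = new BitSet(1 << 16);
    int cardinality = 0;
    // Pass 2: per key, OR that key's container from every input into the scratch.
    for (int key : keys) {
      scratch.clear();
      for (int[] postingList : postingLists) {
        // Locate the first docId in this container (lists are sorted, no duplicates).
        int i = Arrays.binarySearch(postingList, key << 16);
        if (i < 0) {
          i = -i - 1;
        }
        for (; i < postingList.length && (postingList[i] >>> 16) == key; i++) {
          scratch.set(postingList[i] & 0xFFFF);
        }
      }
      cardinality += scratch.cardinality();
    }
    return cardinality;
  }

  public static void main(String[] args) {
    // MV column: doc 5 holds both matching dictIds, so it is in both lists.
    List<int[]> matching = List.of(new int[] {1, 5, 70_000}, new int[] {5, 9, 70_001});
    System.out.println(sumOfCardinalities(matching)); // 6: doc 5 counted twice
    System.out.println(orCardinality(matching));      // 5: docs 1, 5, 9, 70000, 70001
  }
}
```

The sketch mirrors the trade-off noted above: the scratch allocation is a fixed cost regardless of input size, and pass 2 touches every input once per key, so the per-key walk grows with K.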

Backwards Compatibility

  • Compatible

siddharthteotia added the performance (Related to performance optimization) label May 20, 2026

codecov-commenter commented May 20, 2026

Codecov Report

❌ Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.28%. Comparing base (2910445) to head (e1f00b8).
⚠️ Report is 15 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...e/operator/filter/InvertedIndexFilterOperator.java | 82.35% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff              @@
##             master   #18543      +/-   ##
============================================
+ Coverage     63.75%   64.28%   +0.53%     
+ Complexity     1932     1126     -806     
============================================
  Files          3292     3311      +19     
  Lines        201470   203788    +2318     
  Branches      31316    31721     +405     
============================================
+ Hits         128442   131001    +2559     
+ Misses        62735    62289     -446     
- Partials      10293    10498     +205     
```
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 64.28% <82.35%> (+0.53%) ⬆️ |
| temurin | 64.28% <82.35%> (+0.53%) ⬆️ |
| unittests | 64.28% <82.35%> (+0.53%) ⬆️ |
| unittests1 | 56.73% <82.35%> (+0.93%) ⬆️ |
| unittests2 | 35.49% <0.00%> (+0.24%) ⬆️ |

Flags with carried forward coverage won't be shown.
