Avoid unnecessary DisjunctionMaxBulkScorer overhead#15659
Avoid unnecessary DisjunctionMaxBulkScorer overhead#15659shimpeko wants to merge 6 commits intoapache:mainfrom
Conversation
56cf380 to
ecc05df
Compare
|
I realized that we cannot call ScorerSupplier.get() more than once. Need another solutions. |
a3b772a to
160a235
Compare
This change inspects clause bulk scorers up front and only uses DisjunctionMaxBulkScorer if at least one clause provides a non-default BulkScorer. Otherwise, we fall back to the scorer-based path. c88f933 made DisjunctionMaxQuery use DisjunctionMaxBulkScorer when tieBreakerMultiplier == 0 and scoreMode == TOP_SCORES. However, this bulk path primarily pays off when at least one clause implements a specialized BulkScorer. When all clauses return DefaultBulkScorer, the bulk windowing and replay logic adds overhead while preventing effective use of minCompetitiveScore and block-max optimizations that the scorer-based DisjunctionMaxScorer supports in TOP_SCORES mode. In such cases, falling back to the scorer-based path typically results in better performance and restores competitive-score-based skipping. Fixes: apache#15658 Related PR: apache#14040
160a235 to
6231d37
Compare
I updated the code not to call ScorerSupplier.get() more than once. |
|
Hi, it would be nice to not always force push, this makes reviewing hard. I have no idea if you have fixed the linter warnings. Squashing the commits is done on our side while merging, please don't do it yourself. Otherwise the change looks fine to me. but I leave it to @jpountz as he understand the code much better. |
|
Please fix the linter warning by running "./gradlew tidy" and commit, but don't sqash! |
It's true that the windowing+replay logic isn't free but I remember that it was still better than using a Let's benchmark this change with luceneutil and our dismax tasks (https://github.com/mikemccand/luceneutil/blob/main/tasks/wikinightly.tasks#L257-L276)? |
|
I tried to run wikinightly but it didn't finish after 10 hours on my laptop so just picked some dismax tasks. Looking at the result, this change (edit: This PR) basically has no effect on performance of those dismax tasks including I'll dig around deeper. Commands to run bench and task detail |
|
cont. #15659 (comment)
|
|
I added test case that are closer to my query which has dismax + constant_score as https://github.com/shimpeko/luceneutil/pull/1/changes. Looking at the following benchmark result, I think I can say that the changes on this PR has significant positive impact on the performance of specifc type of query.
This seems true, so I don't cleary understand why using DisjunctionBulkMaxScorer causing regression in this paticular case, yet. @jpountz do you have any idea. Edit: I'll look deeper into DisjunctionBulkMaxScorer but happy if I could get feedbacks/suggestions, for this PR after having the benchmark result. Result (50 warmups, 50 iter) Commands to run bench |
This reverts commit 68ada56.
This reverts commit ccf5143.
The previous DisjunctionMaxBulkScorer uses a fixed 4096-doc window. minCompetitiveScore is only propagated to sub-scorers at window boundaries, so up to 4096 docs may be scored before a new threshold reaches them. For specific types of queries that can eliminate sub-scorers earlier than 4096 evaluation, DisjunctionMaxBulkScorer is slower than the non-bulk DisjunctionMaxScorer. This change update DisjunctionMaxBulkScorer to start with window size 1 and double each iteration, capping at 4096. Small initial windows let minCompetitiveScore propagate frequently while the threshold is rising and sub-scorers are being eliminated. Expected to help most when TOP_SCORES has low N, the query has high match rate, and the disjunction has many clauses. The 13 sub-4096 windows add minor per-window overhead for queries where full-size windows would have been fine.
|
@jpountz I’ve dropped the idea of switching from Instead, this change (fe86ceb) now makes As you noted in #15659 (comment), “DisjunctionMaxBulkScorer tracks the min competitive score and passes it to its sub clauses”. However, propagating the min competitive score only at fixed 4096-doc boundaries is sometimes too slow for queries such as "constant_score + top-N (with small N) + high match rate". In these cases, the delayed propagation causes a regression compared to The new approach addresses that problem by exponentially increasing the window size from 1 up to 4096, ensuring that min competitive score propagation happens early enough, while still preserving the throughput advantages of bulk scoring. I believe this approach is more robust than relying on scorer-type checks such as Below are the benchmark result, which demonstrates the improvement: baseline: 16f83c9 As mentioned in #15659 (comment), I added test cases that exercise a dismax + constant_score structure in luceneutil: https://github.com/shimpeko/luceneutil/pull/1/changes. |
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
|
I really like the simplicity of the change, and it gives obvious improvements with very little detriment. @shimpeko do you mind moving your "CHANGES" entry to 10.5? Once you do, I think we can move towards merging this unless @uschindler or @jpountz objects. |
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
This change inspects clause bulk scorers up front and only uses DisjunctionMaxBulkScorer if at least one clause provides a non-default BulkScorer. Otherwise, we fall back to the scorer-based path.
c88f933 made DisjunctionMaxQuery use DisjunctionMaxBulkScorer when tieBreakerMultiplier == 0 and scoreMode == TOP_SCORES. However, this bulk path primarily pays off when at least one clause implements a specialized BulkScorer.
When all clauses return DefaultBulkScorer, the bulk windowing and replay logic adds overhead while preventing effective use of minCompetitiveScore and block-max optimizations that the scorer-based DisjunctionMaxScorer supports in TOP_SCORES mode.
In such cases, falling back to the scorer-based path typically results in better performance and restores competitive-score-based skipping.
Fixes: #15658
Related PR: #14040