Deferring lambda in TermStates.java according to prefetch#15627
Deferring lambda in TermStates.java according to prefetch#15627msokolov merged 11 commits intoapache:mainfrom
Conversation
…boolean and propagate results to TermStates to remove lambda for hot index optimization. Issue: apache#15515
|
+31% QPS - nice improvement for hot index scenario! I don't think we should see cold index improvements though - is the index cold enough 🥶? I'm curious if we want to add the task to luceneutil permanently, as it looks like no other task exercises needScores = false code path for boolean queries? |
# Conflicts: # lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/SegmentTermsEnum.java
Hmm I have tried with even lower memory (4 GB) with same results.
Makes sense! |
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
|
We merged a PR In luceneutil which adds the task to measure perf improvement from this PR. |
|
We have the first data point from running AndMissingHigh in nightlies here:
This would be a good baseline to judge when this change is merged! |
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
|
This looks ready to me -- do you have any concerns, @uschindler ? |
|
@shubhamsrkdev do you think this can be backported to 10.x? I'm not sure if the prefetch APIs there are compatible? |
|
Hi the merging to main branch caused build failures because there were new tests which did not adapt to new method signature. This can't be backported to 10.x because it has a significant API change in a public and often used API (prefetech). |
uschindler
left a comment
There was a problem hiding this comment.
There is also an API problem in the latest commits. The method should not be here. Please revert this change to IOBooleanSupplier.
| */ | ||
| boolean get() throws IOException; | ||
|
|
||
| /** |
There was a problem hiding this comment.
Why is this method in IOBooleanSupplier. It should not be here, that has nothing to do with this interface!
There was a problem hiding this comment.
If we need something like this, it should be a separate class and not reusing this interface for something totally unrelated.
There was a problem hiding this comment.
Thanks @uschindler let me create another PR to remove this.
There was a problem hiding this comment.
Thinking about this more, if we create a separate class we will end up replacing IOBooleanSupplier everywhere with this new class and IOBooleanSupplier wouldn't have any usage in lucene's code (I understand it's a public interface so we will still keep it)?
|
I filed #15909 to figure out why the checks passed but jenkins discovered a problem. |
|
The reason for the failure after merge is quite simple: the branch here was created long ago and was not merged up to main recently. The method failing was added in another PR two weeks ago. |
Hmm this is a frustrating blind spot. GH actions passed on old branch. Main branch keeps moving forward. Then this (old) branch is merge/squashed with a simple rebase onto main that didn't produce source-level conflicts so GH/git is happy, and it's pushed, causing a compilation error ... so it is just up to whoever merges to remember to first merge main if branch is old, wait for GH actions to be happy, then merge? |


Problem
Currently, for term lookup
TermStates.getdefers for every case (to the lambda in TermStates.java). This works great for cold indexes but might not be the best for hot indexes.Solution
We can depend upon prefetch for deferring. If prefetch is being done then index is still cold hence we should defer termstate lookup through lambda, if not we should short-circuit and look up termstate directly instead of deferring.
Testing
We added this new task to demonstrate that this changes are beneficial for cases when one term is missing(
jasdgiasgdiuygduasgd) and the other term is present in high frequency(ring) and enabled sort order (Thanks @epotyom !):AndMissingHigh: titledvsort//+jasdgiasgdiuygduasgd +ringThe task shows good improvements in QPS (+31.8%), rest of the tasks are stable:
I have also tried simulating a cold index by using https://github.com/mikemccand/luceneutil/blob/main/src/python/ramhog.c .
The results look fine :