Fall back to raw-value REGEXP_LIKE evaluator when no dict-consuming index is available#18599
Open
deepthi912 wants to merge 8 commits into
Open
Fall back to raw-value REGEXP_LIKE evaluator when no dict-consuming index is available#18599deepthi912 wants to merge 8 commits into
deepthi912 wants to merge 8 commits into
Conversation
…ndex is available
When FST/IFST exists but the column has no sorted/inverted index that can consume a
dict-id-based predicate evaluator, FilterPlanNode previously built the FST/IFST
evaluator unconditionally. With a RAW forward index, FilterOperatorUtils then fell
through to ScanBasedFilterOperator, which calls applySV(String) on the dict-id
evaluator — that throws UnsupportedOperationException
(BaseDictionaryBasedPredicateEvaluator), crashing queries such as
`regexp_like(col, 'pat', 'i')` and `LIKE 'pat'` on external/iceberg-backed tables
with `encodingType: RAW` + `dictionary: {}` + `ifst: { enabled: true }`.
Add canConsumeDictIdEvaluator() — only construct the FST/IFST dict-id evaluator
when a sorted or inverted index is available for this data source (matching the
operator-routing logic in FilterOperatorUtils#getLeafFilterOperator). Otherwise
fall through to PredicateEvaluatorProvider, which returns
RawValueBasedRegexpLikePredicateEvaluator — already implements applySV(String)
correctly. No changes to base classes or scan iterator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… selection - FilterPlanNode: hoist getDictionaryUsableForFiltering call into a local `dictUsable` variable so both case-insensitive/case-sensitive branches stay under the 120-char line limit (and the dictionary check runs once instead of twice per predicate). - FilterPlanNodeTest: add 5 tests covering the regex evaluator-selection logic: - IFST + dict + inverted (RAW forward) → dict-id evaluator (IFST) - IFST + dict + no inverted (RAW forward) → raw-value evaluator (the bug) - FST + dict + inverted (RAW forward) → dict-id evaluator (FST) - FST + dict + no inverted (RAW forward) → raw-value evaluator - IFST + dict + dict-encoded forward → dict-id evaluator (scan w/ DictIdMatcher) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mockTextIndexReader() internally calls Mockito.when(...).thenReturn(...). Invoking it as an argument inside an outer Mockito.when(...).thenReturn(...) chain confuses Mockito's pending-stubbing tracker and surfaces as UnfinishedStubbing failures on all 5 new tests. Build the inner mocks into locals first, then pass to the outer when().thenReturn() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidates all REGEXP_LIKE evaluator-selection logic in PredicateEvaluatorProvider so FilterPlanNode just calls the standard getPredicateEvaluator(predicate, dataSource, queryContext). The dict-based switch's REGEXP_LIKE case prefers the FST/IFST text index when present on the data source, otherwise falls back to the existing RegexpLikePredicateEvaluatorFactory.newDictionaryBasedEvaluator. No evaluator is built and discarded — the upgrade decision happens before any construction. - buildEvaluator gains a @nullable DataSource parameter; the Dictionary-based public overload passes null (no DataSource to read text indexes from). - FilterPlanNode REGEXP_LIKE case collapses from 25 lines to 5 and drops three imports (RegexpLikePredicate, FSTBasedRegexpPredicateEvaluatorFactory, IFSTBasedRegexpPredicateEvaluatorFactory). - getDictionaryUsableForFiltering reverts to package-private — no external caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18599 +/- ##
=============================================
- Coverage 64.28% 36.81% -27.47%
+ Complexity 1137 1136 -1
=============================================
Files 3335 3335
Lines 205898 206038 +140
Branches 32129 32142 +13
=============================================
- Hits 132355 75859 -56496
- Misses 62894 123315 +60421
+ Partials 10649 6864 -3785
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
RawValueBasedRegexpLikePredicateEvaluatorwithout throwing following exception: