Fall back to raw-value REGEXP_LIKE evaluator when no dict-consuming index is available by deepthi912 · Pull Request #18599 · apache/pinot

deepthi912 · 2026-05-27T19:53:30Z

When no inverted index, range index, or dictionary encoding is enabled, we need to fall back to RawValueBasedRegexpLikePredicateEvaluator instead of throwing the following exception:

2026/05/20 12:58:09.599 ERROR [BaseCombineOperator] [pqw-6] Caught exception while processing query: QueryContext{_tableName='parquet_datatype_logical_types_OFFLINE', _subquery=null, _selectExpressions=[count(*)], _distinct=false, _aliasList=[null], _filter=regexp_like(col_string,'abc','i'), _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={useMultistageEngine=false, serverReturnFinalResult=true, timeoutMs=60000}, _expressionOverrideHints={}, _explain=NONE}
org.apache.pinot.spi.exception.QueryException: Caught exception while doing operator: class org.apache.pinot.core.operator.AcquireReleaseColumnsSegmentOperator on segment 37aa1ac19d2979ca369ed42b42187063: null
	at org.apache.pinot.spi.exception.QueryErrorCode.asException(QueryErrorCode.java:171)
	at org.apache.pinot.core.operator.combine.BaseCombineOperator.wrapOperatorException(BaseCombineOperator.java:307)
	at org.apache.pinot.core.operator.combine.BaseSingleBlockCombineOperator.processSegments(BaseSingleBlockCombineOperator.java:84)
	at org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:218)
	at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)
	at org.apache.pinot.spi.query.QueryThreadContext$1.lambda$decorate$1(QueryThreadContext.java:273)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:128)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:80)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.UnsupportedOperationException
	at org.apache.pinot.core.operator.filter.predicate.BaseDictionaryBasedPredicateEvaluator.applySV(BaseDictionaryBasedPredicateEvaluator.java:133)
	at org.apache.pinot.core.operator.dociditerators.SVScanDocIdIterator$StringMatcher.doesValueMatch(SVScanDocIdIterator.java:308)
	at org.apache.pinot.core.operator.dociditerators.SVScanDocIdIterator$ValueMatcher.matchValues(SVScanDocIdIterator.java:208)
	at org.apache.pinot.core.operator.dociditerators.SVScanDocIdIterator.next(SVScanDocIdIterator.java:86)
	at org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:76)
	at org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:40)
	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42)
	at org.apache.pinot.core.operator.ProjectionOperator.getNextBlock(ProjectionOperator.java:87)
	at org.apache.pinot.core.operator.ProjectionOperator.getNextBlock(ProjectionOperator.java:39)
	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42)
	at org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:73)
	at org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:43)
	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42)
	at org.apache.pinot.core.operator.AcquireReleaseColumnsSegmentOperator.getNextBlock(AcquireReleaseColumnsSegmentOperator.java:74)
	at org.apache.pinot.core.operator.AcquireReleaseColumnsSegmentOperator.getNextBlock(AcquireReleaseColumnsSegmentOperator.java:43)
	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42)
	at org.apache.pinot.core.operator.combine.BaseSingleBlockCombineOperator.processSegments(BaseSingleBlockCombineOperator.java:82)
	... 12 more
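
The `Caused by` frames show where the query fails: the scan iterator's StringMatcher hands a raw String to applySV, and BaseDictionaryBasedPredicateEvaluator rejects it. The mismatch can be sketched in Java as below. All `Sketch*` types are hypothetical stand-ins, not Pinot's classes, and the partial-match `find()` semantics are an assumption made for illustration.

```java
// Hypothetical, simplified stand-ins for illustration; these are not Pinot's classes.
interface SketchPredicateEvaluator {
  boolean applySV(int dictId);

  boolean applySV(String value);
}

// Dict-id evaluator: matches on dictionary ids only and rejects raw values,
// mirroring the UnsupportedOperationException seen in the trace.
final class SketchDictIdEvaluator implements SketchPredicateEvaluator {
  private final int matchingDictId;

  SketchDictIdEvaluator(int matchingDictId) {
    this.matchingDictId = matchingDictId;
  }

  @Override
  public boolean applySV(int dictId) {
    return dictId == matchingDictId;
  }

  @Override
  public boolean applySV(String value) {
    throw new UnsupportedOperationException();
  }
}

// Raw-value evaluator: applies the regex directly to each String value.
final class SketchRawValueEvaluator implements SketchPredicateEvaluator {
  private final java.util.regex.Pattern pattern;

  SketchRawValueEvaluator(String regex, boolean caseInsensitive) {
    int flags = caseInsensitive ? java.util.regex.Pattern.CASE_INSENSITIVE : 0;
    this.pattern = java.util.regex.Pattern.compile(regex, flags);
  }

  @Override
  public boolean applySV(int dictId) {
    // In this sketch the raw-value evaluator has no dictionary to resolve ids against.
    throw new UnsupportedOperationException();
  }

  @Override
  public boolean applySV(String value) {
    // Assumption: partial-match (find) semantics.
    return pattern.matcher(value).find();
  }
}
```

A scan over a RAW forward index only ever has String values to offer, so in this sketch only the raw-value evaluator can serve it.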

…ndex is available

When an FST/IFST index exists but the column has no sorted or inverted index that can consume a dict-id-based predicate evaluator, FilterPlanNode previously built the FST/IFST evaluator unconditionally. With a RAW forward index, FilterOperatorUtils then fell through to ScanBasedFilterOperator, which calls applySV(String) on the dict-id evaluator. That call throws UnsupportedOperationException (from BaseDictionaryBasedPredicateEvaluator), crashing queries such as `regexp_like(col, 'pat', 'i')` and `LIKE 'pat'` on external/Iceberg-backed tables with `encodingType: RAW` + `dictionary: {}` + `ifst: { enabled: true }`.

Add canConsumeDictIdEvaluator(): only construct the FST/IFST dict-id evaluator when a sorted or inverted index is available for this data source (matching the operator-routing logic in FilterOperatorUtils#getLeafFilterOperator). Otherwise fall through to PredicateEvaluatorProvider, which returns RawValueBasedRegexpLikePredicateEvaluator; that evaluator already implements applySV(String) correctly. No changes to base classes or the scan iterator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
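
The gating rule described in this commit can be sketched roughly as follows. This is an illustrative model only: `RegexEvaluatorGateSketch`, its `Column` flags, and the `Kind` enum are hypothetical names, and the pairing of IFST with case-insensitive matching and FST with case-sensitive matching is an assumption. Only the name canConsumeDictIdEvaluator and the "sorted or inverted index" condition come from the commit message.

```java
// Hypothetical, simplified model of the evaluator choice; not Pinot's actual code.
final class RegexEvaluatorGateSketch {
  enum Kind { FST_DICT_ID, IFST_DICT_ID, RAW_VALUE }

  // Illustrative description of the indexes available on one column.
  static final class Column {
    final boolean hasFst;
    final boolean hasIfst;
    final boolean sorted;
    final boolean hasInvertedIndex;

    Column(boolean hasFst, boolean hasIfst, boolean sorted, boolean hasInvertedIndex) {
      this.hasFst = hasFst;
      this.hasIfst = hasIfst;
      this.sorted = sorted;
      this.hasInvertedIndex = hasInvertedIndex;
    }
  }

  // A dict-id evaluator is only useful when some index can consume dict ids.
  static boolean canConsumeDictIdEvaluator(Column column) {
    return column.sorted || column.hasInvertedIndex;
  }

  static Kind select(Column column, boolean caseInsensitive) {
    if (canConsumeDictIdEvaluator(column)) {
      // Assumption: IFST serves case-insensitive patterns, FST case-sensitive ones.
      if (caseInsensitive && column.hasIfst) {
        return Kind.IFST_DICT_ID;
      }
      if (!caseInsensitive && column.hasFst) {
        return Kind.FST_DICT_ID;
      }
    }
    // Simplification: every other case is modelled as the raw-value fallback.
    return Kind.RAW_VALUE;
  }
}
```

In this model, a column with IFST enabled but neither a sorted nor an inverted index selects `RAW_VALUE`, which is the configuration that previously crashed.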

… selection

- FilterPlanNode: hoist the getDictionaryUsableForFiltering call into a local `dictUsable` variable so both case-insensitive/case-sensitive branches stay under the 120-char line limit (and the dictionary check runs once instead of twice per predicate).
- FilterPlanNodeTest: add 5 tests covering the regex evaluator-selection logic:
  - IFST + dict + inverted (RAW forward) → dict-id evaluator (IFST)
  - IFST + dict + no inverted (RAW forward) → raw-value evaluator (the bug)
  - FST + dict + inverted (RAW forward) → dict-id evaluator (FST)
  - FST + dict + no inverted (RAW forward) → raw-value evaluator
  - IFST + dict + dict-encoded forward → dict-id evaluator (scan w/ DictIdMatcher)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mockTextIndexReader() internally calls Mockito.when(...).thenReturn(...). Invoking it as an argument inside an outer Mockito.when(...).thenReturn(...) chain confuses Mockito's pending-stubbing tracker and surfaces as UnfinishedStubbing failures on all 5 new tests. Build the inner mocks into locals first, then pass them to the outer when().thenReturn() calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Consolidates all REGEXP_LIKE evaluator-selection logic in PredicateEvaluatorProvider so FilterPlanNode just calls the standard getPredicateEvaluator(predicate, dataSource, queryContext). The dict-based switch's REGEXP_LIKE case prefers the FST/IFST text index when present on the data source, otherwise falls back to the existing RegexpLikePredicateEvaluatorFactory.newDictionaryBasedEvaluator. No evaluator is built and discarded; the upgrade decision happens before any construction.

- buildEvaluator gains a @Nullable DataSource parameter; the Dictionary-based public overload passes null (no DataSource to read text indexes from).
- FilterPlanNode REGEXP_LIKE case collapses from 25 lines to 5 and drops three imports (RegexpLikePredicate, FSTBasedRegexpPredicateEvaluatorFactory, IFSTBasedRegexpPredicateEvaluatorFactory).
- getDictionaryUsableForFiltering reverts to package-private since it has no external caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
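
One way to picture the consolidated decision is the sketch below. It is a simplified model under stated assumptions, not the code in PredicateEvaluatorProvider: `ProviderSketch`, `SketchDataSource`, and the `Choice` enum are hypothetical, the raw-value branch for a non-usable dictionary is inferred from the PR title, and the IFST/FST split by case sensitivity is assumed.

```java
// Hypothetical, simplified model of the consolidated selection; not Pinot's actual code.
final class ProviderSketch {
  enum Choice { IFST, FST, DICTIONARY_BASED, RAW_VALUE }

  // Illustrative stand-in for the text indexes a data source may expose.
  static final class SketchDataSource {
    final boolean hasFst;
    final boolean hasIfst;

    SketchDataSource(boolean hasFst, boolean hasIfst) {
      this.hasFst = hasFst;
      this.hasIfst = hasIfst;
    }
  }

  // dataSource may be null, modelling the Dictionary-based overload that has no
  // DataSource to read text indexes from.
  static Choice chooseRegexpLike(boolean dictionaryUsable, SketchDataSource dataSource,
      boolean caseInsensitive) {
    if (!dictionaryUsable) {
      // Assumption: without a usable dictionary the raw-value evaluator is chosen.
      return Choice.RAW_VALUE;
    }
    if (dataSource != null) {
      // Assumption: IFST serves case-insensitive patterns, FST case-sensitive ones.
      if (caseInsensitive && dataSource.hasIfst) {
        return Choice.IFST;
      }
      if (!caseInsensitive && dataSource.hasFst) {
        return Choice.FST;
      }
    }
    // No matching text index: plain dictionary-based evaluator.
    return Choice.DICTIONARY_BASED;
  }
}
```

The choice is made before any evaluator object exists, which matches the stated goal that no evaluator is built and then discarded.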

codecov-commenter · 2026-05-27T21:33:16Z

Codecov Report

❌ Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.81%. Comparing base (baccdcc) to head (0b30fd3).
⚠️ Report is 6 commits behind head on master.

Files with missing lines	Patch %	Lines
...r/filter/predicate/PredicateEvaluatorProvider.java	0.00%	11 Missing ⚠️
...ava/org/apache/pinot/core/plan/FilterPlanNode.java	0.00%	2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (baccdcc) and HEAD (0b30fd3).

HEAD has 4 uploads less than BASE

Flag	BASE (baccdcc)	HEAD (0b30fd3)
java-21	5	4
unittests1	1	0
unittests	2	1
temurin	5	4

Additional details and impacted files

@@              Coverage Diff              @@
##             master   #18599       +/-   ##
=============================================
- Coverage     64.28%   36.81%   -27.47%     
+ Complexity     1137     1136        -1     
=============================================
  Files          3335     3335               
  Lines        205898   206038      +140     
  Branches      32129    32142       +13     
=============================================
- Hits         132355    75859    -56496     
- Misses        62894   123315    +60421     
+ Partials      10649     6864     -3785

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (ø)`
integration	`100.00% <ø> (ø)`
integration1	`100.00% <ø> (ø)`
integration2	`0.00% <ø> (ø)`
java-21	`36.81% <0.00%> (-27.47%)`	⬇️
temurin	`36.81% <0.00%> (-27.47%)`	⬇️
unittests	`36.81% <0.00%> (-27.47%)`	⬇️
unittests1	`?`
unittests2	`36.81% <0.00%> (-0.03%)`	⬇️

deepthi912 and others added 7 commits May 27, 2026 12:51

Update the comment

a0bbdf4

Remove the comments

ddb668a

Reuse the existing method

62cf647

Remove comments from new REGEXP_LIKE evaluator-selection tests

d086077

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

deepthi912 added the index (Related to indexing (general)) and text-search (Related to text/Lucene indexing and search) labels May 27, 2026

deepthi912 wants to merge 8 commits into apache:master from deepthi912:deepthi/approach4-filter-plan-node
