Skip to content

Add pluggable filter predicate registry for custom query semantics#18129

Open
xiangfu0 wants to merge 6 commits intoapache:masterfrom
xiangfu0:pluggable-filter-registry
Open

Add pluggable filter predicate registry for custom query semantics#18129
xiangfu0 wants to merge 6 commits intoapache:masterfrom
xiangfu0:pluggable-filter-registry

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Apr 8, 2026

Summary

Introduces a plugin framework (FilterPredicatePlugin + CustomFilterOperatorFactory) that allows registering custom filter predicates (e.g., SEMANTIC_MATCH) via ServiceLoader without modifying Pinot core code.

Motivation

Adding a new filter predicate today requires touching 8 hardcoded points across 4 modules (enums, switch statements, if-else chains). This one-time change adds registry-based extension points so future predicates can be added purely via plugin JARs on the classpath.

New SPI Interfaces

FilterPredicatePlugin (pinot-common)

Handles the query parsing pipeline:

  • name() — predicate name (e.g., "SEMANTIC_MATCH")
  • validateFilterExpression() — SQL validation (CalciteSqlParser)
  • rewriteExpression() — predicate rewriting (PredicateComparisonRewriter)
  • getOperandTypes() / getOptionalOperandIndices() — Calcite operator registration
  • createPredicate() — convert parsed operands to a Predicate object

CustomFilterOperatorFactory (pinot-core)

Handles query execution:

  • predicateName() — matches the predicate name
  • createFilterOperator() — creates the filter operator for a segment
  • createFilterOperator(..., hasMetadataFilter) — metadata-filter-aware variant for AND queries

CustomPredicate (pinot-common)

Base class for plugin predicates. Returns Predicate.Type.CUSTOM and carries a customTypeName for factory lookup at execution time.

New Files

File Module Purpose
FilterPredicatePlugin.java pinot-common SPI: validation, rewriting, operand types, predicate creation
FilterPredicateRegistry.java pinot-common ServiceLoader + programmatic registration
CustomPredicate.java pinot-common Base class for plugin predicates
CustomFilterOperatorFactory.java pinot-core SPI: filter operator creation
CustomFilterOperatorRegistry.java pinot-core ServiceLoader + programmatic registration

Extension Points Modified

File Change
Predicate.Type Added CUSTOM sentinel
CalciteSqlParser.validateFilter() Registry check before default branch
PredicateComparisonRewriter Registry check before boolean-conversion fallback
RequestContextUtils.getFilterInner() Registry check before EnumUtils fallback
FilterPlanNode CUSTOM case + function-LHS handling
PinotOperatorTable registerCustomFilterPredicates() from registry
BaseBrokerStarter FilterPredicateRegistry.init() at startup
BaseServerStarter Both registries init at startup
BaseControllerStarter FilterPredicateRegistry.init() at startup

What a Plugin Looks Like

// 1. Implement FilterPredicatePlugin
public class SemanticMatchPlugin implements FilterPredicatePlugin {
    @Override public String name() { return "SEMANTIC_MATCH"; }
    @Override public List<OperandType> getOperandTypes() {
        return List.of(OperandType.STRING, OperandType.STRING, OperandType.INTEGER);
    }
    @Override public List<Integer> getOptionalOperandIndices() { return List.of(2); }
    @Override public void validateFilterExpression(List<Expression> operands) { /* ... */ }
    @Override public Predicate createPredicate(List<ExpressionContext> operands) {
        return new SemanticMatchPredicate(operands.get(0), /* ... */);
    }
}

// 2. Implement CustomFilterOperatorFactory
public class SemanticMatchOperatorFactory implements CustomFilterOperatorFactory {
    @Override public String predicateName() { return "SEMANTIC_MATCH"; }
    @Override public BaseFilterOperator createFilterOperator(
        IndexSegment segment, QueryContext ctx, Predicate pred, DataSource ds, int numDocs) {
        return new SemanticMatchFilterOperator(/* ... */);
    }
}

// 3. Register via META-INF/services (ServiceLoader auto-discovery)

Performance Impact

Zero for existing built-in predicates. Custom predicates add one ConcurrentHashMap.get() per query (planning) and per segment (execution). Per-row filter execution has zero overhead.

Test plan

  • All modified modules compile cleanly
  • Checkstyle and license checks pass
  • Integration test with a sample custom predicate plugin

Follow-up PR will extract VECTOR_SIMILARITY into pinot-plugins/pinot-vector as proof-of-concept.

🤖 Generated with Claude Code

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 33.76623% with 102 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.87%. Comparing base (34d3fd6) to head (582af6a).

Files with missing lines Patch % Lines
...ache/pinot/calcite/sql/fun/PinotOperatorTable.java 8.69% 20 Missing and 1 partial ⚠️
...org/apache/pinot/sql/parsers/CalciteSqlParser.java 47.05% 7 Missing and 11 partials ⚠️
...ava/org/apache/pinot/core/plan/FilterPlanNode.java 0.00% 16 Missing and 1 partial ⚠️
...or/filter/custom/CustomFilterOperatorRegistry.java 30.43% 15 Missing and 1 partial ⚠️
...e/pinot/common/filter/FilterPredicateRegistry.java 37.50% 14 Missing and 1 partial ⚠️
.../pinot/server/starter/helix/BaseServerStarter.java 0.00% 4 Missing ⚠️
...ot/common/request/context/RequestContextUtils.java 25.00% 2 Missing and 1 partial ⚠️
.../parsers/rewriter/PredicateComparisonRewriter.java 25.00% 2 Missing and 1 partial ⚠️
...e/pinot/broker/broker/helix/BaseBrokerStarter.java 0.00% 2 Missing ⚠️
...che/pinot/common/filter/FilterPredicatePlugin.java 77.77% 2 Missing ⚠️
... and 1 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18129      +/-   ##
============================================
+ Coverage     63.13%   63.87%   +0.73%     
+ Complexity     1610      789     -821     
============================================
  Files          3213     3218       +5     
  Lines        195730   195853     +123     
  Branches      30240    30252      +12     
============================================
+ Hits         123583   125094    +1511     
+ Misses        62281    60799    -1482     
- Partials       9866     9960      +94     
Flag Coverage Δ
custom-integration1 100.00% <ø> (?)
integration 100.00% <ø> (+100.00%) ⬆️
integration1 100.00% <ø> (?)
integration2 0.00% <ø> (ø)
java-11 63.85% <33.76%> (+0.74%) ⬆️
java-21 63.83% <33.76%> (+0.73%) ⬆️
temurin 63.87% <33.76%> (+0.73%) ⬆️
unittests 63.86% <33.76%> (+0.72%) ⬆️
unittests1 56.35% <18.49%> (+0.97%) ⬆️
unittests2 34.78% <24.67%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 added query Related to query processing extension-point Adds or modifies an extension/SPI point plugins Related to the plugin system feature New functionality labels Apr 8, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a ServiceLoader-backed plugin SPI to register custom filter predicates and their execution operators, enabling new query semantics without core modifications.

Changes:

  • Introduces registries for custom filter predicate plugins and custom filter operator factories.
  • Wires registry checks into parsing/rewriting/predicate construction and adds Predicate.Type.CUSTOM.
  • Initializes registries in broker/controller/server starters and registers custom predicates into Calcite’s operator table.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java Initializes predicate and operator registries on server startup
pinot-query-planner/src/main/java/org/apache/pinot/calcite/sql/fun/PinotOperatorTable.java Registers custom predicates as Calcite SQL operators
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java Executes CUSTOM predicates via custom operator factories
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorRegistry.java New ServiceLoader/programmatic registry for operator factories
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorFactory.java New SPI for creating filter operators for custom predicates
pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java Initializes predicate registry on controller startup
pinot-common/src/main/java/org/apache/pinot/sql/parsers/rewriter/PredicateComparisonRewriter.java Delegates rewriting to custom predicate plugins when applicable
pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java Delegates validation to custom predicate plugins when applicable
pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/Predicate.java Adds Predicate.Type.CUSTOM sentinel
pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/CustomPredicate.java Introduces base class for custom predicates
pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java Creates predicates via plugin when name matches registry
pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicateRegistry.java New ServiceLoader/programmatic registry for predicate plugins
pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicatePlugin.java New SPI for validation/rewriting/operator metadata/predicate creation
pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java Initializes predicate registry on broker startup

@@ -367,6 +369,7 @@ private PinotOperatorTable(boolean nullHandlingEnabled) {
registerAggregateFunctions(operatorMap);
registerTransformFunctions(operatorMap);
registerScalarFunctions(operatorMap);
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PinotOperatorTable snapshots operators into an immutable map/list during construction. If FilterPredicateRegistry.init() hasn't been called yet in the current JVM, getAllPlugins() will be empty and custom predicates will never be registered (even if init() is called later). Consider ensuring the registry is initialized before iterating (e.g., call FilterPredicateRegistry.init() at the start of registerCustomFilterPredicates, or make the registry lazily initialize on first access) so operator registration is deterministic.

Suggested change
registerScalarFunctions(operatorMap);
registerScalarFunctions(operatorMap);
FilterPredicateRegistry.init();

Copilot uses AI. Check for mistakes.
}
}

private void registerCustomFilterPredicates(Map<String, SqlOperator> operatorMap) {
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PinotOperatorTable snapshots operators into an immutable map/list during construction. If FilterPredicateRegistry.init() hasn't been called yet in the current JVM, getAllPlugins() will be empty and custom predicates will never be registered (even if init() is called later). Consider ensuring the registry is initialized before iterating (e.g., call FilterPredicateRegistry.init() at the start of registerCustomFilterPredicates, or make the registry lazily initialize on first access) so operator registration is deterministic.

Suggested change
private void registerCustomFilterPredicates(Map<String, SqlOperator> operatorMap) {
private void registerCustomFilterPredicates(Map<String, SqlOperator> operatorMap) {
FilterPredicateRegistry.init();

Copilot uses AI. Check for mistakes.
@xiangfu0 xiangfu0 force-pushed the pluggable-filter-registry branch 2 times, most recently from a83a1c0 to 45d13af Compare April 9, 2026 06:52
Copy link
Copy Markdown
Contributor Author

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a few high-signal issues; see inline comments.

if (!INITIALIZED.compareAndSet(false, true)) {
return;
}
ServiceLoader<FilterPredicatePlugin> loader = ServiceLoader.load(FilterPredicatePlugin.class);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ServiceLoader.load(...) only scans the thread-context/system classloader. Pinot plugins loaded from plugins.dir go through PluginManager and live in isolated plugin realms/classloaders; on Java 11+ the system loader is not a URLClassLoader, so those jars are not visible here. In practice that means a standard Pinot plugin jar will never be discovered on broker/controller, and custom predicates won't register unless the implementation is placed on the main application classpath. Can we integrate this with PluginManager (or otherwise enumerate plugin classloaders) instead of using the default ServiceLoader lookup?

if (!INITIALIZED.compareAndSet(false, true)) {
return;
}
ServiceLoader<CustomFilterOperatorFactory> loader = ServiceLoader.load(CustomFilterOperatorFactory.class);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same classloading problem as the predicate registry: this default ServiceLoader lookup will not see factories packaged as normal Pinot plugins in plugins.dir, because those jars are loaded into plugin-specific classloaders by PluginManager, not the thread-context/system loader. That leaves the server unable to find CustomFilterOperatorFactory implementations even when the broker successfully parses the predicate, so queries will fail at runtime with No CustomFilterOperatorFactory registered... in the standard deployment model.

String functionOperator = filterFunction.getFunctionName().toUpperCase();

// Check if this is a custom filter predicate registered via plugin
FilterPredicatePlugin customPlugin = FilterPredicateRegistry.get(functionOperator);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking up the custom registry before both the built-in FilterKind handling and the boolean-function fallback means a plugin named TEXT_MATCH, IN, or even startsWith will silently hijack existing query semantics in WHERE. That's a backward-compat break triggered just by dropping a jar on the classpath. The registry needs to reject collisions with built-in predicates/functions, or this lookup needs to happen only after built-ins and normal boolean functions have been ruled out.

xiangfu0 and others added 5 commits April 10, 2026 23:01
Introduce a plugin framework that allows registering custom filter predicates
(e.g., SEMANTIC_MATCH) without modifying Pinot core code. This is a one-time
surgery to add extension points at each layer of the query pipeline:

- FilterPredicatePlugin SPI: validation, rewriting, Calcite operator metadata,
  and predicate creation
- CustomFilterOperatorFactory SPI: filter operator construction at execution time
- ServiceLoader-based discovery with programmatic registration fallback
- CUSTOM sentinel in Predicate.Type for plugin-defined predicates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Example custom filter predicate plugin showing how to use the registry:
- LikeAnyPredicate: matches rows where column matches ANY LIKE pattern
- LikeAnyPlugin: implements FilterPredicatePlugin (parsing/validation)
- LikeAnyFilterOperatorFactory: implements CustomFilterOperatorFactory
- Dictionary and raw-value evaluators using combined regex
- Unit tests covering all plugin interfaces

Usage: SELECT * FROM t WHERE LIKE_ANY(col, 'foo%', '%bar', 'A_e')

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The META-INF/services files for FilterPredicatePlugin and
CustomFilterOperatorFactory cannot have license headers (ServiceLoader
format). Add them to both apache-rat-plugin and license-maven-plugin
exclusion lists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…single get()

- Use Locale.ROOT for toUpperCase() in both registries to avoid
  locale-sensitive behavior (e.g., Turkish locale)
- Add AtomicBoolean guard to init() in both registries to make
  repeated calls a cheap no-op
- Use single get() instead of containsKey()+get() in CalciteSqlParser
  to avoid potential race condition
- Add instanceof CustomPredicate check before cast in FilterPlanNode
  (both function-LHS and column-LHS paths) for clear error messages
- Add hasMetadataFilter overload to CustomFilterOperatorFactory with
  default delegation to the basic method

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix LikeAnyDictionaryEvaluator.applySV: O(1) boolean[] lookup instead
  of O(k) linear scan through matching dict IDs
- Fix fully qualified java.util.ArrayList in PinotOperatorTable
- Add @nullable annotation to dataSource param in CustomFilterOperatorFactory
  with javadoc explaining when it is null (function-based LHS predicates)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 force-pushed the pluggable-filter-registry branch from 32ebd2f to a848bdc Compare April 11, 2026 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extension-point Adds or modifies an extension/SPI point feature New functionality plugins Related to the plugin system query Related to query processing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants