Add pluggable filter predicate registry for custom query semantics#18129
Add pluggable filter predicate registry for custom query semantics#18129xiangfu0 wants to merge 6 commits intoapache:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18129 +/- ##
============================================
+ Coverage 63.13% 63.87% +0.73%
+ Complexity 1610 789 -821
============================================
Files 3213 3218 +5
Lines 195730 195853 +123
Branches 30240 30252 +12
============================================
+ Hits 123583 125094 +1511
+ Misses 62281 60799 -1482
- Partials 9866 9960 +94
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a ServiceLoader-backed plugin SPI to register custom filter predicates and their execution operators, enabling new query semantics without core modifications.
Changes:
- Introduces registries for custom filter predicate plugins and custom filter operator factories.
- Wires registry checks into parsing/rewriting/predicate construction and adds
Predicate.Type.CUSTOM. - Initializes registries in broker/controller/server starters and registers custom predicates into Calcite’s operator table.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java | Initializes predicate and operator registries on server startup |
| pinot-query-planner/src/main/java/org/apache/pinot/calcite/sql/fun/PinotOperatorTable.java | Registers custom predicates as Calcite SQL operators |
| pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java | Executes CUSTOM predicates via custom operator factories |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorRegistry.java | New ServiceLoader/programmatic registry for operator factories |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorFactory.java | New SPI for creating filter operators for custom predicates |
| pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java | Initializes predicate registry on controller startup |
| pinot-common/src/main/java/org/apache/pinot/sql/parsers/rewriter/PredicateComparisonRewriter.java | Delegates rewriting to custom predicate plugins when applicable |
| pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java | Delegates validation to custom predicate plugins when applicable |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/Predicate.java | Adds Predicate.Type.CUSTOM sentinel |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/CustomPredicate.java | Introduces base class for custom predicates |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java | Creates predicates via plugin when name matches registry |
| pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicateRegistry.java | New ServiceLoader/programmatic registry for predicate plugins |
| pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicatePlugin.java | New SPI for validation/rewriting/operator metadata/predicate creation |
| pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java | Initializes predicate registry on broker startup |
| @@ -367,6 +369,7 @@ private PinotOperatorTable(boolean nullHandlingEnabled) { | |||
| registerAggregateFunctions(operatorMap); | |||
| registerTransformFunctions(operatorMap); | |||
| registerScalarFunctions(operatorMap); | |||
There was a problem hiding this comment.
PinotOperatorTable snapshots operators into an immutable map/list during construction. If FilterPredicateRegistry.init() hasn't been called yet in the current JVM, getAllPlugins() will be empty and custom predicates will never be registered (even if init() is called later). Consider ensuring the registry is initialized before iterating (e.g., call FilterPredicateRegistry.init() at the start of registerCustomFilterPredicates, or make the registry lazily initialize on first access) so operator registration is deterministic.
| registerScalarFunctions(operatorMap); | |
| registerScalarFunctions(operatorMap); | |
| FilterPredicateRegistry.init(); |
| } | ||
| } | ||
|
|
||
| private void registerCustomFilterPredicates(Map<String, SqlOperator> operatorMap) { |
There was a problem hiding this comment.
PinotOperatorTable snapshots operators into an immutable map/list during construction. If FilterPredicateRegistry.init() hasn't been called yet in the current JVM, getAllPlugins() will be empty and custom predicates will never be registered (even if init() is called later). Consider ensuring the registry is initialized before iterating (e.g., call FilterPredicateRegistry.init() at the start of registerCustomFilterPredicates, or make the registry lazily initialize on first access) so operator registration is deterministic.
| private void registerCustomFilterPredicates(Map<String, SqlOperator> operatorMap) { | |
| private void registerCustomFilterPredicates(Map<String, SqlOperator> operatorMap) { | |
| FilterPredicateRegistry.init(); |
pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicateRegistry.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorRegistry.java
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicateRegistry.java
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorFactory.java
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorFactory.java
Outdated
Show resolved
Hide resolved
a83a1c0 to
45d13af
Compare
xiangfu0
left a comment
There was a problem hiding this comment.
Found a few high-signal issues; see inline comments.
| if (!INITIALIZED.compareAndSet(false, true)) { | ||
| return; | ||
| } | ||
| ServiceLoader<FilterPredicatePlugin> loader = ServiceLoader.load(FilterPredicatePlugin.class); |
There was a problem hiding this comment.
This ServiceLoader.load(...) only scans the thread-context/system classloader. Pinot plugins loaded from plugins.dir go through PluginManager and live in isolated plugin realms/classloaders; on Java 11+ the system loader is not a URLClassLoader, so those jars are not visible here. In practice that means a standard Pinot plugin jar will never be discovered on broker/controller, and custom predicates won't register unless the implementation is placed on the main application classpath. Can we integrate this with PluginManager (or otherwise enumerate plugin classloaders) instead of using the default ServiceLoader lookup?
| if (!INITIALIZED.compareAndSet(false, true)) { | ||
| return; | ||
| } | ||
| ServiceLoader<CustomFilterOperatorFactory> loader = ServiceLoader.load(CustomFilterOperatorFactory.class); |
There was a problem hiding this comment.
Same classloading problem as the predicate registry: this default ServiceLoader lookup will not see factories packaged as normal Pinot plugins in plugins.dir, because those jars are loaded into plugin-specific classloaders by PluginManager, not the thread-context/system loader. That leaves the server unable to find CustomFilterOperatorFactory implementations even when the broker successfully parses the predicate, so queries will fail at runtime with No CustomFilterOperatorFactory registered... in the standard deployment model.
| String functionOperator = filterFunction.getFunctionName().toUpperCase(); | ||
|
|
||
| // Check if this is a custom filter predicate registered via plugin | ||
| FilterPredicatePlugin customPlugin = FilterPredicateRegistry.get(functionOperator); |
There was a problem hiding this comment.
Looking up the custom registry before both the built-in FilterKind handling and the boolean-function fallback means a plugin named TEXT_MATCH, IN, or even startsWith will silently hijack existing query semantics in WHERE. That's a backward-compat break triggered just by dropping a jar on the classpath. The registry needs to reject collisions with built-in predicates/functions, or this lookup needs to happen only after built-ins and normal boolean functions have been ruled out.
Introduce a plugin framework that allows registering custom filter predicates (e.g., SEMANTIC_MATCH) without modifying Pinot core code. This is a one-time surgery to add extension points at each layer of the query pipeline: - FilterPredicatePlugin SPI: validation, rewriting, Calcite operator metadata, and predicate creation - CustomFilterOperatorFactory SPI: filter operator construction at execution time - ServiceLoader-based discovery with programmatic registration fallback - CUSTOM sentinel in Predicate.Type for plugin-defined predicates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Example custom filter predicate plugin showing how to use the registry: - LikeAnyPredicate: matches rows where column matches ANY LIKE pattern - LikeAnyPlugin: implements FilterPredicatePlugin (parsing/validation) - LikeAnyFilterOperatorFactory: implements CustomFilterOperatorFactory - Dictionary and raw-value evaluators using combined regex - Unit tests covering all plugin interfaces Usage: SELECT * FROM t WHERE LIKE_ANY(col, 'foo%', '%bar', 'A_e') Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The META-INF/services files for FilterPredicatePlugin and CustomFilterOperatorFactory cannot have license headers (ServiceLoader format). Add them to both apache-rat-plugin and license-maven-plugin exclusion lists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…single get() - Use Locale.ROOT for toUpperCase() in both registries to avoid locale-sensitive behavior (e.g., Turkish locale) - Add AtomicBoolean guard to init() in both registries to make repeated calls a cheap no-op - Use single get() instead of containsKey()+get() in CalciteSqlParser to avoid potential race condition - Add instanceof CustomPredicate check before cast in FilterPlanNode (both function-LHS and column-LHS paths) for clear error messages - Add hasMetadataFilter overload to CustomFilterOperatorFactory with default delegation to the basic method Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix LikeAnyDictionaryEvaluator.applySV: O(1) boolean[] lookup instead of O(k) linear scan through matching dict IDs - Fix fully qualified java.util.ArrayList in PinotOperatorTable - Add @nullable annotation to dataSource param in CustomFilterOperatorFactory with javadoc explaining when it is null (function-based LHS predicates) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
32ebd2f to
a848bdc
Compare
Summary
Introduces a plugin framework (
FilterPredicatePlugin+CustomFilterOperatorFactory) that allows registering custom filter predicates (e.g.,SEMANTIC_MATCH) via ServiceLoader without modifying Pinot core code.Motivation
Adding a new filter predicate today requires touching 8 hardcoded points across 4 modules (enums, switch statements, if-else chains). This one-time change adds registry-based extension points so future predicates can be added purely via plugin JARs on the classpath.
New SPI Interfaces
FilterPredicatePlugin (pinot-common)
Handles the query parsing pipeline:
name()— predicate name (e.g., "SEMANTIC_MATCH")validateFilterExpression()— SQL validation (CalciteSqlParser)rewriteExpression()— predicate rewriting (PredicateComparisonRewriter)getOperandTypes()/getOptionalOperandIndices()— Calcite operator registrationcreatePredicate()— convert parsed operands to a Predicate objectCustomFilterOperatorFactory (pinot-core)
Handles query execution:
predicateName()— matches the predicate namecreateFilterOperator()— creates the filter operator for a segmentcreateFilterOperator(..., hasMetadataFilter)— metadata-filter-aware variant for AND queriesCustomPredicate (pinot-common)
Base class for plugin predicates. Returns
Predicate.Type.CUSTOMand carries acustomTypeNamefor factory lookup at execution time.New Files
FilterPredicatePlugin.javaFilterPredicateRegistry.javaCustomPredicate.javaCustomFilterOperatorFactory.javaCustomFilterOperatorRegistry.javaExtension Points Modified
Predicate.TypeCUSTOMsentinelCalciteSqlParser.validateFilter()PredicateComparisonRewriterRequestContextUtils.getFilterInner()FilterPlanNodeCUSTOMcase + function-LHS handlingPinotOperatorTableregisterCustomFilterPredicates()from registryBaseBrokerStarterFilterPredicateRegistry.init()at startupBaseServerStarterBaseControllerStarterFilterPredicateRegistry.init()at startupWhat a Plugin Looks Like
Performance Impact
Zero for existing built-in predicates. Custom predicates add one
ConcurrentHashMap.get()per query (planning) and per segment (execution). Per-row filter execution has zero overhead.Test plan
Follow-up PR will extract
VECTOR_SIMILARITYintopinot-plugins/pinot-vectoras proof-of-concept.🤖 Generated with Claude Code