Land analytics-engine PPL integration into main #5431
Closed
ahkcs wants to merge 22 commits into
…5267)

* [Mustang] Add query routing and execution handoff for Parquet-backed indices (#5247)

  Implements the query routing and AnalyticsExecutionEngine for Project Mustang's unified query pipeline. PPL queries targeting parquet_-prefixed indices are routed through UnifiedQueryPlanner and executed via a stub QueryPlanExecutor, with results formatted through the existing JDBC response pipeline.

  New files:
  - QueryPlanExecutor: @FunctionalInterface contract for the analytics engine
  - AnalyticsExecutionEngine: converts Iterable<Object[]> to QueryResponse with type mapping and query size limit enforcement
  - RestUnifiedQueryAction: orchestrates schema building, planning, and execution on the sql-worker thread pool, with client/server error classification and metrics
  - StubQueryPlanExecutor: canned data for parquet_logs and parquet_metrics tables for development and testing

  Modified files:
  - RestPPLQueryAction: routing branch for parquet_ indices
  - SQLPlugin: passes ClusterService and NodeClient to RestPPLQueryAction
  - plugin/build.gradle: adds :api dependency for UnifiedQueryPlanner

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Align isClientError with RestPPLQueryAction classification

  Add missing exception types to isClientError(): IndexNotFoundException, ExpressionEvaluationException, QueryEngineException, DataSourceClientException, IllegalAccessException. Matches the full list in RestPPLQueryAction.isClientError().

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move stub code into analytics/stub sub-package

  Extract StubSchemaProvider, StubQueryPlanExecutor, and StubIndexDetector into the plugin/.../rest/analytics/stub/ package to clearly separate temporary stub code from production code. RestUnifiedQueryAction now delegates to these stub classes.
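The QueryPlanExecutor contract listed above is described but not shown in this PR description. A minimal stdlib-only sketch of the idea, assuming hypothetical names (the real contract takes a Calcite RelNode, not a String):

```java
import java.util.List;

public class ExecutorSketch {
  /** Single abstract method, so an engine can be supplied as a lambda. */
  @FunctionalInterface
  interface QueryPlanExecutor {
    // The real signature accepts a RelNode plan; String is a stand-in here.
    Iterable<Object[]> execute(String plan);
  }

  /** Canned rows, mirroring the StubQueryPlanExecutor idea for dev/testing. */
  static final QueryPlanExecutor STUB =
      plan -> List.of(new Object[] {"2024-01-01", "info"},
                      new Object[] {"2024-01-02", "warn"});

  static int countRows(QueryPlanExecutor exec, String plan) {
    int n = 0;
    for (Object[] row : exec.execute(plan)) n++;
    return n;
  }

  public static void main(String[] args) {
    System.out.println(countRows(STUB, "scan parquet_logs")); // prints 2
  }
}
```

Keeping the contract to one method is what lets the stub executor and the real analytics engine be swapped without the caller changing.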
  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use ErrorMessageFactory for error responses

  Replace the hand-crafted JSON error response with ErrorMessageFactory.createErrorMessage(), matching the standard error format used in RestPPLQueryAction.reportError().

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Make metrics and response formatting query-type-aware

  Use QueryType to select the correct metrics (PPL_REQ_TOTAL vs REQ_TOTAL, PPL_FAILED_REQ_COUNT_* vs FAILED_REQ_COUNT_*) and LangSpec (PPL_SPEC vs SQL_SPEC) so this class can serve both PPL and SQL queries when unified SQL support is added.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move analytics routing from REST layer to transport layer

  Move the analytics index routing check from RestPPLQueryAction into TransportPPLQueryAction.doExecute(). This ensures the analytics path gets the same PPL-enabled check, metrics, request ID, and inter-plugin transport support as the existing Lucene path. RestPPLQueryAction and SQLPlugin are reverted to their original state. Added executeViaTransport() to RestUnifiedQueryAction, which returns results via ActionListener<TransportPPLQueryResponse> instead of RestChannel, integrating properly with the transport action pattern.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add query size limit to RelNode plan instead of post-processing

  Add LogicalSystemLimit to the RelNode plan before passing it to the analytics engine, consistent with PPL V3 (QueryService.convertToCalcitePlan). This ensures the analytics engine enforces the limit during execution rather than returning all rows for post-processing truncation. Remove post-processing querySizeLimit truncation from AnalyticsExecutionEngine; the limit is now part of the plan the executor receives.
  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Propagate security context to sql-worker thread pool

  Wrap scheduled lambdas in both execute() and executeViaTransport() with withCurrentContext() to capture and restore ThreadContext (user identity, permissions, audit trail) on the worker thread. Follows the same pattern as OpenSearchQueryManager.withCurrentContext().

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move analytics routing after profiling setup

  Move the analytics index routing check after QueryContext.setProfile() and wrapWithProfilingClear(listener). Use clearingListener instead of the raw listener so profiling thread-local state is properly cleaned up after analytics path execution.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove NPE from isClientError classification

  An NPE in the analytics path is a server bug (null schema field, missing row), not bad user input. Removed from the client error list. Will sync this classification with RestPPLQueryAction updates in #5266.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove duplicate REST execution path from RestUnifiedQueryAction

  Remove execute(query, queryType, channel), doExecute(), createQueryListener(channel), recordSuccessMetric(), recordFailureMetric(), reportError(), and related REST imports. Since routing now goes through TransportPPLQueryAction, the REST-specific path was unused. Renamed executeViaTransport() to execute() as the sole entry point.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
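The withCurrentContext() capture/restore pattern described above can be sketched with a plain ThreadLocal standing in for OpenSearch's ThreadContext. All names here are illustrative, not the plugin's actual code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextPropagation {
  // Stand-in for OpenSearch's ThreadContext: request-scoped user identity.
  static final ThreadLocal<String> USER = new ThreadLocal<>();

  /** Capture the caller's context now; restore it around the task on the worker thread. */
  static Runnable withCurrentContext(Runnable task) {
    final String captured = USER.get(); // captured on the submitting thread
    return () -> {
      String previous = USER.get();
      USER.set(captured);
      try {
        task.run();
      } finally {
        USER.set(previous); // never leak context into the pooled thread
      }
    };
  }

  public static void main(String[] args) throws Exception {
    USER.set("alice");
    StringBuilder seen = new StringBuilder();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(withCurrentContext(() -> seen.append(USER.get()))).get();
    pool.shutdown();
    System.out.println(seen); // alice
  }
}
```

Without the wrapper, the worker thread would observe whatever stale context the pooled thread last held, which is exactly the identity/audit bug this commit guards against.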
* [Mustang] Add explain support and integration tests for analytics path (#5247)

  Add explain support for the analytics engine path:
  - AnalyticsExecutionEngine.explain(RelNode, ExplainMode, ...): returns the logical plan via RelOptUtil.toString() with mode-aware SqlExplainLevel (SIMPLE/STANDARD/COST). Physical and extended plans are null since the analytics engine is not yet available.
  - RestUnifiedQueryAction.explain(): new entry point for explain requests; delegates to AnalyticsExecutionEngine.explain() with ExplainMode.STANDARD. Response formatted via JsonResponseFormatter with normalizeLf().
  - TransportPPLQueryAction: routes explain requests for analytics indices to unifiedQueryHandler.explain() instead of unifiedQueryHandler.execute().

  Integration tests (AnalyticsPPLIT):
  - testExplainResponseStructure: verifies JSON shape (calcite.logical, calcite.physical=null, calcite.extended=null)
  - testExplainProjectAndScan: LogicalProject + LogicalTableScan + table name
  - testExplainFilterPlan: LogicalFilter with condition value
  - testExplainAggregationPlan: LogicalAggregate with COUNT()
  - testExplainSortPlan: LogicalSort

  Unit tests (AnalyticsExecutionEngineTest):
  - explainRelNode_returnsLogicalPlan
  - explainRelNode_errorPropagation

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add AnalyticsExplainIT with expected output files

  Add a dedicated explain integration test class following the CalciteExplainIT pattern: each test compares actual explain JSON against expected output files in resources/expectedOutput/analytics/. Tests cover simple scan, project, filter+project, aggregation, sort (with collation propagation to LogicalSystemLimit), and eval.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix explain unit test to verify actual logical plan content

  Use a real Calcite LogicalValues node instead of a mock so RelOptUtil.toString() produces a deterministic plan.
  Assert the exact expected logical plan string instead of just checking non-null.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove explain unit tests in favor of AnalyticsExplainIT

  Explain is thoroughly covered by AnalyticsExplainIT with expected-output file comparison (6 tests). Remove redundant explain unit tests and the unused captureExplainListener helper. Execute unit tests unchanged.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove explain tests from AnalyticsPPLIT in favor of AnalyticsExplainIT

  Explain is covered by AnalyticsExplainIT with expected-output file comparison (6 tests). Remove redundant explain tests and the extractLogicalPlan helper from AnalyticsPPLIT.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Switch explain tests to YAML format with format-aware listener

  - Add YAML format support to createExplainListener() by checking pplRequest.getFormat(), matching the TransportPPLQueryAction pattern
  - Switch AnalyticsExplainIT from explainQueryToString (JSON) to explainQueryYaml (YAML) with assertYamlEqualsIgnoreId
  - Replace JSON expected files with YAML expected files
  - The YAML serializer omits null fields (physical/extended), so expected files only contain the logical plan

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use request explain mode instead of hardcoded STANDARD

  Pass pplRequest.mode() to analyticsEngine.explain() so the user's ?mode= parameter (simple, standard, cost, extended) is respected.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix normalizeLf inconsistency with existing PPL explain path

  Only apply normalizeLf() for YAML format; return the response directly for JSON format. Matches TransportPPLQueryAction.createExplainResponseListener(), which uses normalizeLf for YAML but returns the response as-is for JSON.
  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add toExplainLevel() to ExplainMode enum

  Encapsulate the ExplainMode-to-SqlExplainLevel mapping in the enum itself instead of repeating ternary logic in each execution engine. AnalyticsExecutionEngine now uses mode.toExplainLevel() directly.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove logging from AnalyticsExplainIT

  Remove LOG.info calls and the Logger field. The YAML comparison assertion already provides clear output on failure.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove explain coverage comments

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Reuse TransportPPLQueryAction.createExplainResponseListener for explain formatting

  Remove the duplicate createExplainListener from RestUnifiedQueryAction. explain() now returns ExplainResponse via ResponseListener, and TransportPPLQueryAction wraps it with its existing createExplainResponseListener for format-aware (JSON/YAML) formatting. This avoids duplicating the format selection logic.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
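The toExplainLevel() refactor above moves the mapping onto the enum itself. A hedged sketch with a stand-in for Calcite's SqlExplainLevel (the exact level names and the mapping are assumptions, not taken from the PR):

```java
public class ExplainModes {
  // Stand-in for Calcite's SqlExplainLevel; constant names assumed for illustration.
  enum Level { NO_ATTRIBUTES, EXPPLAN_ATTRIBUTES, ALL_ATTRIBUTES }

  /** The mapping lives on the mode enum, not in ternaries at each execution engine. */
  enum ExplainMode {
    SIMPLE(Level.NO_ATTRIBUTES),
    STANDARD(Level.EXPPLAN_ATTRIBUTES),
    COST(Level.ALL_ATTRIBUTES);

    private final Level level;

    ExplainMode(Level level) {
      this.level = level;
    }

    Level toExplainLevel() {
      return level;
    }
  }

  public static void main(String[] args) {
    // Callers write mode.toExplainLevel() instead of repeating the switch/ternary.
    System.out.println(ExplainMode.COST.toExplainLevel()); // ALL_ATTRIBUTES
  }
}
```

Attaching the mapping to each constant means adding a new mode cannot silently miss a call site: the constructor forces a level.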
* Init CLAUDE.md (#5259)

  Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add label to exempt specific PRs from stalled labeling (#5263)

* Implement `reverse` performance optimization (#4775)

  Co-authored-by: Jialiang Liang <jiallian@amazon.com>

* Add songkant-aws as maintainer (#5244)

* Move some maintainers from active to Emeritus (#5260)

  * Move inactive current maintainers to Emeritus
    Signed-off-by: Lantao Jin <ltjin@amazon.com>
  * Remove affiliation column for emeritus maintainers
    Signed-off-by: Lantao Jin <ltjin@amazon.com>
  * formatted
    Signed-off-by: Lantao Jin <ltjin@amazon.com>
  * Fix formatting in MAINTAINERS.md
    Signed-off-by: Simeon Widdis <sawiddis@gmail.com>

  ---------
  Signed-off-by: Lantao Jin <ltjin@amazon.com>
  Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
  Co-authored-by: Simeon Widdis <sawiddis@gmail.com>

* Add query cancellation support via _tasks/_cancel API for PPL queries (#5254)

  * Add query cancellation support via _tasks/_cancel API for PPL queries
    Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
  * Refactor PPL query cancellation to cooperative model and other PR suggestions.
    Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

  ---------
  Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>

* Add Calcite native SQL planning in UnifiedQueryPlanner (#5257)

  * feat(api): Add Calcite native SQL planning path in UnifiedQueryPlanner

    Add SQL support to the unified query API using Calcite's native parser pipeline (SqlParser → SqlValidator → SqlToRelConverter → RelNode), bypassing the ANTLR parser used by PPL.
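The language dispatch this commit introduces (Calcite-native pipeline for SQL, ANTLR visitor pipeline for PPL) can be sketched as a plain strategy selection. Stdlib-only; the strategy bodies are placeholder strings standing in for the real Calcite and ANTLR pipelines:

```java
public class PlannerDispatch {
  enum QueryType { PPL, SQL }

  /** Strategy contract: each language-specific pipeline produces a plan. */
  interface PlanningStrategy {
    String plan(String query);
  }

  // Calcite-native pipeline for ANSI SQL: parse -> validate -> convert to RelNode.
  static final PlanningStrategy CALCITE_NATIVE =
      q -> "SqlParser->SqlValidator->SqlToRelConverter(" + q + ")";

  // ANTLR-based pipeline for PPL (and a future SQL V2).
  static final PlanningStrategy CUSTOM_VISITOR =
      q -> "ANTLR->AstBuilder->RelNodeVisitor(" + q + ")";

  /** The planner picks a strategy per query type instead of branching inline. */
  static PlanningStrategy strategyFor(QueryType type) {
    return type == QueryType.SQL ? CALCITE_NATIVE : CUSTOM_VISITOR;
  }

  public static void main(String[] args) {
    System.out.println(strategyFor(QueryType.SQL).plan("SELECT 1"));
  }
}
```

The payoff noted in the commit is that both strategies end at the same RelNode representation, so everything downstream of planning is shared.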
    Changes:
    - UnifiedQueryPlanner: use PlanningStrategy to dispatch CalciteNativeStrategy vs CustomVisitorStrategy
    - CalciteNativeStrategy: Calcite Planner with try-with-resources for ANSI SQL
    - CustomVisitorStrategy: ANTLR-based path for PPL (and future SQL V2)
    - UnifiedQueryContext: SqlParser.Config with Casing.UNCHANGED to preserve lowercase OpenSearch index names

    Signed-off-by: Chen Dai <daichen@amazon.com>

  * test(api): Add SQL planner tests and refactor test base for multi-language support

    - Refactor UnifiedQueryTestBase with a queryType() hook for subclass override
    - Add UnifiedSqlQueryPlannerTest covering SELECT, WHERE, GROUP BY, JOIN, ORDER BY, subquery, case sensitivity, namespaces, and error handling
    - Update UnifiedQueryContextTest to verify SQL context creation

    Signed-off-by: Chen Dai <daichen@amazon.com>

  * perf(benchmarks): Add SQL queries to UnifiedQueryBenchmark

    Add language (PPL/SQL) and queryPattern param dimensions for side-by-side comparison of equivalent queries across both languages. Remove the separate UnifiedSqlQueryBenchmark in favor of the unified class.

    Signed-off-by: Chen Dai <daichen@amazon.com>

  * docs(api): Update README to reflect SQL support in UnifiedQueryPlanner

    Signed-off-by: Chen Dai <daichen@amazon.com>

  * fix(api): Normalize trailing whitespace in assertPlan comparison

    RelOptUtil.toString() appends a trailing newline after the last plan node, which doesn't match Java text block expectations. Also add \r\n normalization for Windows CI compatibility, consistent with the existing pattern in core module tests.

    Signed-off-by: Chen Dai <daichen@amazon.com>

  ---------
  Signed-off-by: Chen Dai <daichen@amazon.com>

* [Feature] Support graphLookup with literal value as its start (#5253)

  * [Feature] Support graphLookup as top-level PPL command (#5243)

    Add support for graphLookup as the first command in a PPL query with literal start values, instead of requiring piped input from source=.
    Syntax:

      graphLookup table start="value" edge=from-->to as output
      graphLookup table start=("v1", "v2") edge=from-->to as output

    Signed-off-by: Heng Qian <qianheng@amazon.com>

  * Spotless check
    Signed-off-by: Heng Qian <qianheng@amazon.com>
  * Ignore child pipe if using start value
    Signed-off-by: Heng Qian <qianheng@amazon.com>
  * Add graphLookup integration tests per PPL command checklist
    - Add explain plan tests in CalciteExplainIT with YAML assertions
    - Add v2-unsupported tests in NewAddedCommandsIT
    - Add CalcitePPLGraphLookupIT to CalciteNoPushdownIT suite
    - Skip graphLookup tests when pushdown is disabled (required by impl)
    - Add expected plan YAML files for piped and top-level graphLookup
    Signed-off-by: Heng Qian <qianheng@amazon.com>
  * Remove brace of start value list
    Signed-off-by: Heng Qian <qianheng@amazon.com>

  ---------
  Signed-off-by: Heng Qian <qianheng@amazon.com>

* Apply docs website feedback to ppl functions (#5207)

  * apply doc website feedback to ppl functions
    Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
  * take out comments
    Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
  * fix json_append example
    Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
  * fix json_append example
    Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
  * fix links
    Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

  ---------
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
  Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

* feat(api): Add profiling support to unified query API (#5268)

  Add query profiling infrastructure that measures time spent in each query phase (analyze, optimize, execute, format). Profiling is opt-in via UnifiedQueryContext.builder().profiling(true) and uses thread-local context to avoid passing profiling state through every method.
  Key changes:
  - QueryProfiling/ProfileContext for thread-local profiling lifecycle
  - UnifiedQueryContext.measure() API for timing arbitrary phases
  - Auto-profiling in UnifiedQueryPlanner (analyze) and compiler (optimize)
  - UnifiedQueryTestBase shared test fixture for unified query tests
  - Comprehensive profiling tests with non-flaky >= 0 timing assertions

  Signed-off-by: Chen Dai <daichen@amazon.com>

* Add UnifiedQueryParser with language-specific implementations (#5274)

  Extract parsing logic from UnifiedQueryPlanner into a UnifiedQueryParser interface with language-specific implementations: PPLQueryParser (returns UnresolvedPlan) and CalciteSqlQueryParser (returns SqlNode). UnifiedQueryContext owns the parser instance, created eagerly by the builder, which has direct access to the query type and future SQL config. Each implementation receives only its required dependencies: PPLQueryParser takes Settings; CalciteSqlQueryParser takes CalcitePlanContext. UnifiedQueryPlanner.CustomVisitorStrategy now obtains the parser from the context via the interface type.
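The thread-local measure() idea from the profiling commit above can be sketched in a few lines. Names are hypothetical; the real QueryProfiling/ProfileContext API is richer:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ProfilingSketch {
  // Thread-local phase timings, so profiling state isn't threaded through every method.
  static final ThreadLocal<Map<String, Long>> PHASES =
      ThreadLocal.withInitial(LinkedHashMap::new);

  /** Time an arbitrary phase and record elapsed milliseconds under the given name. */
  static <T> T measure(String phase, Supplier<T> body) {
    long start = System.nanoTime();
    try {
      return body.get();
    } finally {
      PHASES.get().merge(phase, (System.nanoTime() - start) / 1_000_000, Long::sum);
    }
  }

  public static void main(String[] args) {
    int rows = measure("execute", () -> 42);
    // The PR's non-flaky test style: assert timings are >= 0, not an exact value.
    System.out.println(rows + " " + (PHASES.get().get("execute") >= 0));
  }
}
```

Recording in a finally block means a phase is still accounted for when the body throws, but note a later Mustang commit in this PR found a real ordering pitfall with exactly this pattern: if a listener snapshots the metrics synchronously inside the body, the finally block writes too late.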
  Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix flaky TPC-H Q1 test due to bugs in `MatcherUtils.closeTo()` (#5283)

  * Fix the flaky tpch Q1
    Signed-off-by: Lantao Jin <ltjin@amazon.com>
  * Change to ULP-aware comparison to handle floating-point precision differences
    Signed-off-by: Lantao Jin <ltjin@amazon.com>

  ---------
  Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: qianheng <qianheng@amazon.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Jialiang Liang <jiallian@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
Co-authored-by: Sunil Ramchandra Pawar <pawar_sr@apple.com>
Co-authored-by: Chen Dai <daichen@amazon.com>
Co-authored-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
* [Mustang] Enable profiling and migrate to UnifiedQueryParser (#5247)

  Task 1: Enable profiling (#5268)
  - Add .profiling(pplRequest.profile()) to UnifiedQueryContext.builder() in both doExecute and doExplain

  Task 2: Migrate to UnifiedQueryParser for index extraction (#5274)
  - Replace the StubIndexDetector regex with PPLQueryParser AST-based extraction: parse the query, walk the AST to find the Relation node, extract the table name via getTableQualifiedName()
  - Delete StubIndexDetector
  - isAnalyticsIndex() is now an instance method (needs PPLQueryParser)
  - Constructor takes Settings for PPLQueryParser

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Switch to SimpleJsonResponseFormatter for profiling support

  Switch from JdbcResponseFormatter to SimpleJsonResponseFormatter so profiling data is included in the response when profile=true. SimpleJsonResponseFormatter calls QueryProfiling.current().finish() to populate the profile field. Update test assertions to match SimpleJsonResponseFormatter type names (PPL_SPEC: INTEGER -> "int", STRING -> "string") and remove the status field check (not included by SimpleJsonResponseFormatter). Add an integration test verifying the profile field appears in the response.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use context parser for index extraction instead of standalone PPLQueryParser

  Create UnifiedQueryContext upfront in isAnalyticsIndex() and use context.getParser() for index name extraction. This reuses the context-owned parser, which supports both PPL and SQL, making it ready for unified SQL support without code changes. Remove the standalone PPLQueryParser field and Settings constructor param. isAnalyticsIndex() now takes QueryType to create the right context. extractIndexName() handles UnresolvedPlan (PPL) with a TODO for SqlNode (SQL) when unified SQL is enabled.
  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use AST visitor for index name extraction

  Replace manual tree walking (findRelation) with an IndexNameExtractor visitor extending AbstractNodeVisitor. The visitor's visitRelation() is called automatically by the AST accept/visitChildren pattern, which handles tree traversal.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Wrap execute and explain with context.measure() for profiling

  Wrap analyticsEngine.execute() and analyticsEngine.explain() calls with context.measure(MetricName.EXECUTE, ...) so execution time is captured in the profiling metrics. Planning is auto-profiled by UnifiedQueryPlanner.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix EXECUTE profiling metric by recording inside AnalyticsExecutionEngine

  Move EXECUTE metric recording into AnalyticsExecutionEngine.execute(), between the actual execution (planExecutor + row conversion) and the listener.onResponse() call. This ensures the metric is written before SimpleJsonResponseFormatter calls QueryProfiling.finish() to snapshot. Previously context.measure() was used in RestUnifiedQueryAction, but finish() was called inside the listener callback (synchronously) before measure()'s finally block could write the metric, resulting in 0ms. Add an IT assertion that execute phase time_ms > 0 to catch this bug.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
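The visitor-based index extraction described above (IndexNameExtractor replacing manual findRelation tree walking) might look roughly like this. The AST node and visitor types are simplified stand-ins for the PPL core's UnresolvedPlan/AbstractNodeVisitor, not the actual classes:

```java
import java.util.List;
import java.util.Optional;

public class IndexNameVisitorSketch {
  // Minimal AST stand-ins; the real nodes live in the PPL core module.
  abstract static class Node {
    List<Node> children = List.of();

    Optional<String> accept(Visitor v) {
      return v.visit(this);
    }
  }

  static class Relation extends Node {
    final String table;
    Relation(String table) { this.table = table; }
  }

  static class Filter extends Node {
    Filter(Node child) { children = List.of(child); }
  }

  /** Finds the Relation; traversal is generic rather than hand-rolled per caller. */
  static class Visitor {
    Optional<String> visit(Node node) {
      if (node instanceof Relation r) return Optional.of(r.table);
      for (Node child : node.children) {
        Optional<String> found = visit(child);
        if (found.isPresent()) return found;
      }
      return Optional.empty();
    }
  }

  /** Routing check: does the plan's source table use the analytics prefix? */
  static boolean isAnalyticsIndex(Node plan) {
    return plan.accept(new Visitor()).filter(n -> n.startsWith("parquet_")).isPresent();
  }

  public static void main(String[] args) {
    System.out.println(isAnalyticsIndex(new Filter(new Relation("parquet_logs")))); // true
  }
}
```

Parsing to an AST and visiting it is what replaced the earlier regex-based StubIndexDetector: the table name comes from the plan's Relation node, so quoting and qualification are handled by the parser rather than a pattern.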
* [Mustang] Wire analytics-engine as extendedPlugins dependency (#5247)

  Step 1: Plugin wiring and dependency integration.
  - Add analytics-engine as extendedPlugins in plugin/build.gradle
  - Vendor the analytics-framework JAR (interfaces) and analytics-engine ZIP (plugin) built from the OpenSearch sandbox/3.6 branch
  - Delete the local QueryPlanExecutor interface; use the upstream org.opensearch.analytics.exec.QueryPlanExecutor from the JAR
  - Replace StubSchemaProvider with OpenSearchSchemaBuilder, which reads real index mappings from ClusterState
  - Delete StubSchemaProvider (no longer needed)
  - Exclude shared JARs (Calcite, Guava, commons-*, etc.) from the SQL plugin bundle to avoid jar hell with the analytics-engine classloader
  - Load the analytics-engine plugin in integTest and remoteCluster test clusters before opensearch-sql-plugin
  - Create parquet_logs and parquet_metrics indices in ITs so OpenSearchSchemaBuilder can resolve the schema
  - Update explain expected files for alphabetical field ordering

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add analytics-engine plugin to all test clusters

  Every test cluster that loads opensearch-sql-plugin needs the analytics-engine plugin because SQL declares it as extendedPlugins. Added to the yamlRestTest, integTestWithSecurity, remoteIntegTestWithSecurity, and integJdbcTest clusters.

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add analytics-engine plugin to doctest test cluster

  Signed-off-by: Kai Huang <kaihuang@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add commons-text to analytics-engine ZIP and fix isAnalyticsIndex NPE

  commons-text is needed by Calcite (parent classloader) for fuzzy matching but was only in the SQL plugin (child classloader). Also use a lightweight parsing context in isAnalyticsIndex to avoid requiring cluster state.
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix Janino classloader issue when analytics-engine is parent plugin

  Calcite's EnumerableInterpretable.getBindable() hardcodes EnumerableInterpretable.class.getClassLoader() for Janino compilation. When analytics-engine is the parent classloader via extendedPlugins, this returns the parent classloader, which cannot see SQL plugin classes, causing a CompileException for any Enumerable code generation. Override implement() in OpenSearchCalcitePreparingStmt to use our own compileWithPluginClassLoader(), which does the same code generation but uses CalciteToolsHelper.class.getClassLoader() (the SQL plugin's child classloader) so Janino can resolve both parent and child classes.

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Reverse classloader: make SQL plugin parent of analytics-engine

  The previous approach (analytics-engine as parent) caused Janino classloader issues in 4 Calcite code paths. Reversing the relationship makes the SQL plugin the parent, so Janino uses the SQL plugin classloader, which can see both Calcite and SQL plugin classes.

  Changes:
  - Remove extendedPlugins=['analytics-engine'] from the SQL plugin
  - Add the ExtensiblePlugin interface to SQLPlugin
  - Rebuild the analytics-engine ZIP with extended.plugins=opensearch-sql and without Calcite JARs (inherited from parent)
  - Move OpenSearchSchemaBuilder to the analytics-framework JAR
  - Change analytics-framework from compileOnly to api in core
  - Reverse test cluster plugin load order (SQL first)
  - Revert all Janino classloader fixes (no longer needed)
  - Add classloader architecture options doc

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove classloader options doc from PR

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove unnecessary Janino classloader workarounds

  With the SQL plugin as parent (Option C), EnumerableInterpretable and RexExecutable use the SQL plugin classloader, which sees all classes.
  The implementEnumerable(), compileWithPluginClassLoader(), and PluginClassLoaderRexExecutor workarounds are no longer needed.

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert "Remove unnecessary Janino classloader workarounds"
  This reverts commit 23e0d13.
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert "Remove classloader options doc from PR"
  This reverts commit 0ad18a8.
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert "Reverse classloader: make SQL plugin parent of analytics-engine"
  This reverts commit be27381.
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert "Fix Janino classloader issue when analytics-engine is parent plugin"
  This reverts commit 2c4c401.
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Patch Calcite CALCITE-3745: use thread context classloader for Janino

  Calcite hardcodes EnumerableInterpretable.class.getClassLoader() and RexExecutable.class.getClassLoader() for Janino compilation. When analytics-engine is the parent classloader, this returns the parent, which cannot see SQL plugin classes, causing a CompileException. Patch calcite-core to prefer Thread.currentThread().getContextClassLoader() (CALCITE-3745), and set the context classloader to the SQL plugin's classloader before Calcite execution in two places:
  - CalciteToolsHelper.OpenSearchRelRunners.run()
  - CalciteScriptEngine.compile()

  This keeps analytics-engine as the natural parent while fixing all Janino classloader issues with a minimal, well-understood patch.

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix CalciteExplainIT boolean string literal explain plan difference

  With the patched Calcite classloader, SAFE_CAST('TRUE') is no longer silently folded to a boolean literal in the logical plan. Split the expected YAML so boolean literal tests (male = true) and string literal tests (male = 'TRUE') have separate expected outputs. The physical plan (term query pushdown) is identical in both cases.
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add setContextClassLoader wrapper to QueryService Calcite paths

  The analyze() and convertToCalcitePlan() steps in executeWithCalcite() and explainWithCalcite() trigger RexSimplify, which uses Janino for constant folding. Without the context classloader set, UDF expressions like SAFE_CAST('TRUE') can't be folded, causing explain plan differences vs the main branch. Adding the wrapper here covers the entire Calcite lifecycle for both execute and explain paths. Revert the separate boolean string literal YAML since all three boolean tests now produce the same plan (SAFE_CAST folds correctly).

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Refactor context classloader wrapping into CalciteClassLoaderHelper

  Extract the CALCITE-3745 context classloader fix into a dedicated CalciteClassLoaderHelper utility class. Remove the redundant wrapper from OpenSearchRelRunners.run() (already covered by QueryService callers). Three call sites remain:
  - QueryService.executeWithCalcite(): covers planning + execution
  - QueryService.explainWithCalcite(): covers planning + explain
  - CalciteScriptEngine.compile(): covers pushdown scripts (shard thread)

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert CalciteToolsHelper comment to sync with main

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
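The context-classloader wrapper pattern referenced above (set before Calcite runs, restore in finally) can be sketched as a small helper. Illustrative only; the real CalciteClassLoaderHelper may differ:

```java
import java.util.concurrent.Callable;

public class ClassLoaderHelperSketch {
  /**
   * Run a body with the given classloader installed as the thread context
   * classloader, restoring the previous one afterwards. This is the
   * CALCITE-3745 pattern: patched Calcite prefers the context classloader,
   * so setting it to the SQL plugin's loader lets Janino see plugin classes.
   */
  static <T> T withClassLoader(ClassLoader loader, Callable<T> body) throws Exception {
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();
    current.setContextClassLoader(loader);
    try {
      return body.call();
    } finally {
      current.setContextClassLoader(previous); // never leak into pooled threads
    }
  }

  public static void main(String[] args) throws Exception {
    ClassLoader plugin = ClassLoaderHelperSketch.class.getClassLoader();
    String result = withClassLoader(plugin,
        () -> Thread.currentThread().getContextClassLoader() == plugin ? "set" : "unset");
    System.out.println(result); // set
  }
}
```

The try/finally restore matters on shard threads and worker pools: those threads are reused, so a leaked context classloader would pin the plugin loader and prevent it from being unloaded.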
* Integrate SQL REST endpoint with analytics engine path

  Add SQL query routing through the analytics engine for Parquet-backed indices. SQL queries targeting "parquet_*" indices are routed to RestUnifiedQueryAction via the unified query pipeline (Calcite SQL parser -> UnifiedQueryPlanner -> AnalyticsExecutionEngine).

  Changes:
  - Add SqlNode table extraction in RestUnifiedQueryAction.extractIndexName() to support SQL query routing (handles SqlSelect -> SqlIdentifier)
  - Add executeSql() and explainSql() methods in RestUnifiedQueryAction for SQL queries (parallel to the existing PPL execute/explain)
  - Add analytics routing in RestSqlAction via an optional BiFunction router that checks isAnalyticsIndex before delegating to SQLService
  - Wire the router in SQLPlugin.createSqlAnalyticsRouter()
  - Non-analytics SQL queries fall through to the existing V2 engine
  - Add AnalyticsSQLIT integration tests: SELECT *, column projection, explain, non-parquet fallback, syntax error handling

  Resolves: #5248

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use queryType to branch index extraction instead of instanceof
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use Optional for extractIndexName return type
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Unify execute and explain methods for both PPL and SQL paths
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Reuse base class explainQuery() in AnalyticsSQLExplainIT
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Reuse base class executeQuery() in AnalyticsSQLIT
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove no-arg constructor and null check for analyticsRouter

  analyticsRouter is always provided; only one caller exists (SQLPlugin.getRestHandlers). Remove the backward-compatible no-arg constructor and the null check.
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Extract V2 engine delegation into delegateToV2Engine method

  Restore the original logging (query anonymization, explain fallback) that was lost when the fallback logic was inlined into the analytics router lambda.

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Refactor SQL table extraction to use SqlBasicVisitor pattern

  Replace instanceof checks with a SqlTableNameExtractor visitor, consistent with how PPL uses AbstractNodeVisitor for index name extraction.

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move SqlBasicVisitor to import header

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Version bump to OpenSearch 3.7 (Jackson 2 → 3 parser API + _source excludes serialization) (#5361)

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Update vendored analytics JARs from 3.6 to 3.7

  Replace libs/ JARs with 3.7.0-SNAPSHOT versions built from the OpenSearch sandbox. Update build files to reference the 3.7 file names. Keep files() references instead of Maven coordinates so CI can resolve dependencies without publishToMavenLocal.

  Signed-off-by: Ahmed Khatib <ahkcs@amazon.com>
  Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Async QueryPlanExecutor + drop stub and stub-only ITs

  The analytics-framework 3.7 QueryPlanExecutor interface is async (void execute(plan, ctx, ActionListener<Stream>)). Update AnalyticsExecutionEngine to dispatch via the listener instead of the old synchronous return-value form, and rework the unit test mocks likewise. Drop StubQueryPlanExecutor and its callers (TransportPPLQueryAction, SQLPlugin, RestUnifiedQueryActionTest); the SQL plugin now expects the real analytics-engine executor to be supplied via Guice cross-plugin injection (a small AnalyticsExecutorHolder bridges the binding to the SQLPlugin REST handler that runs before Node injection). Drop AnalyticsPPLIT and AnalyticsExplainIT (and their expectedOutput/analytics fixtures): these tested stub canned data, which no longer exists.

  Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Ahmed Khatib <ahkcs@amazon.com>
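The sync-to-async interface change described above can be sketched with a minimal ActionListener stand-in. All types here are simplified; the real 3.7 contract also takes an execution context parameter:

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;

public class AsyncExecutorSketch {
  /** Minimal stand-in for OpenSearch's ActionListener. */
  interface ActionListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
  }

  /** The async contract: results are delivered through the listener, not returned. */
  interface QueryPlanExecutor {
    void execute(String plan, ActionListener<Stream<Object[]>> listener);
  }

  // An executor that completes off-thread, as a real engine would.
  static final QueryPlanExecutor ASYNC = (plan, listener) ->
      CompletableFuture.runAsync(
          () -> listener.onResponse(Stream.<Object[]>of(new Object[] {"row1"})));

  public static void main(String[] args) throws Exception {
    CompletableFuture<Long> count = new CompletableFuture<>();
    ASYNC.execute("scan", new ActionListener<>() {
      public void onResponse(Stream<Object[]> rows) { count.complete(rows.count()); }
      public void onFailure(Exception e) { count.completeExceptionally(e); }
    });
    System.out.println(count.get()); // 1
  }
}
```

The caller-side change is the important part: code that previously consumed a returned Iterable now has to move row conversion and response formatting into the onResponse callback, which is why the unit test mocks had to be reworked.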
… to fix jar hell (#5400)
opensearch-build-tools requires Gradle 9.4.1+, which currently breaks the analytics-engine integration build that depends on this branch. Mirrors opensearch-project/geospatial#854. Signed-off-by: Kai Huang <huangkaics@gmail.com>
…5409) The arrow-flight-rpc plugin (a transitive parent of analytics-engine via extendedPlugins) bundles jsr305-3.0.2.jar, and the SQL plugin pulls in the same artifact via Calcite's transitive findbugs:jsr305 dependency. With both plugins shipping the same jar, OpenSearch's PluginsService.checkBundleJarHell fails at install time with:
IllegalStateException: jar hell!
class: javax.annotation.CheckForNull
jar1: .../installing-.../jsr305-3.0.2.jar
jar2: .../plugins/arrow-flight-rpc/jsr305-3.0.2.jar
Same pattern as #5400 — exclude the jar from the SQL bundle and rely on the copy already provided by arrow-flight-rpc through the shared classloader.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Carry CalciteEvalCommandIT through the helper-managed index path
The IT's init() previously created the in-test test_eval index via direct
`PUT /test_eval/_doc/N` requests, relying on dynamic mapping. Two
problems with that:
1. The doc PUTs auto-create the index with whatever settings the
cluster defaults to. The analytics-engine compatibility path
(force-routing on; tests.analytics.parquet_indices=true) needs
parquet-backed indices, which TestUtils.createIndexByRestClient
applies via TestUtils.makeParquetBacked when the system property
is set. Direct PUTs sidestep that helper, so test_eval lands as
Lucene-backed and the analytics planner rejects it with
"No backend can scan all requested fields on index [test_eval]".
Three of the four tests fail at execution; the fourth uses BANK
(loadIndex'd) and passes.
2. init() runs before every @Test method. The doc PUTs are doc-level
idempotent, so re-running was wasteful but not failing. Once we
switch to createIndexByRestClient, the index-level PUT is no longer
idempotent and re-running throws "resource_already_exists_exception".
Both addressed in one change:
- test_eval is created via TestUtils.createIndexByRestClient with an
explicit mapping (name/title=keyword, age=long). The helper honours
tests.analytics.parquet_indices=true and produces a parquet-backed
index for the analytics-engine sweep; on the v2 path the helper
is a no-op around the index PUT, so behaviour is unchanged.
- The whole init body is guarded by TestUtils.isIndexExist — same
idempotency idiom that loadIndex uses for predefined fixtures. First
@Test method provisions; subsequent methods skip.
Also pins the projection order on testEvalStringConcatenation. The
original query (`source=test_eval | eval greeting = 'Hello ' + name`)
had no `| fields` clause and relied on the implicit projection's column
order — v2 returns Lucene-source insertion order, analytics returns
parquet-storage order (alphabetical), so the assertion only matched on
v2 by coincidence. Adding `| fields name, title, age, greeting` makes
the assertion deterministic across paths; the existing expected rows
(`rows("Alice", "Engineer", 25, "Hello Alice")`) already match this
order, so v2 behaviour is preserved.
No semantic change for the v2 path: the explicit mapping types
(keyword, long, keyword) resolve to the same PPL types ("string",
"bigint", "string") that dynamic mapping inferred, and eval reads from
_source either way. Analytics-route compatibility goes from 1/4 to 4/4
once the corresponding analytics-engine wiring lands in core.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
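The exists-guard idiom the commit describes can be sketched in isolation. This is a minimal, self-contained illustration — `FakeCluster` is a hypothetical in-memory stand-in for the cluster, and the real helpers (`TestUtils.isIndexExist`, `TestUtils.createIndexByRestClient`) are only mirrored in spirit:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the init() idempotency idiom: guard index creation with an
// exists-check so re-entry before every test method is a no-op.
public class IdempotentInitSketch {
  static class FakeCluster {
    final Set<String> indices = new HashSet<>();

    // Index-level PUT is NOT idempotent — mirrors resource_already_exists_exception.
    void createIndex(String name, String mapping) {
      if (!indices.add(name)) {
        throw new IllegalStateException("resource_already_exists_exception: " + name);
      }
    }

    boolean isIndexExist(String name) { return indices.contains(name); }
  }

  // Runs before every test method; the guard makes subsequent runs skip,
  // the same way loadIndex guards predefined fixtures.
  static void init(FakeCluster cluster) {
    if (cluster.isIndexExist("test_eval")) return;
    // Explicit mapping instead of dynamic mapping, so a parquet-aware helper
    // could apply its settings before any doc is indexed.
    cluster.createIndex("test_eval",
        "{\"properties\":{\"name\":{\"type\":\"keyword\"},"
            + "\"title\":{\"type\":\"keyword\"},\"age\":{\"type\":\"long\"}}}");
  }

  public static void main(String[] args) {
    FakeCluster cluster = new FakeCluster();
    init(cluster); // first test method provisions
    init(cluster); // second test method: guard skips, no exception
    System.out.println(cluster.isIndexExist("test_eval")); // prints "true"
  }
}
```

Without the guard, the second `init` call would throw; with it, provisioning happens exactly once per suite run.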
* Exclude @Ignore'd test classes from integTest as a Gradle 9.4.1 workaround
Gradle 9.4.1 has a ClassCastException in TestEventReporterAsListener.started
(line 58 of org/gradle/api/internal/tasks/testing/junit/result/TestEventReporterAsListener.class
inside plugins/gradle-testing-base-9.4.1.jar):
((GroupTestEventReporterInternal) reportersById.get(parent.getId()))
.reportTestDirectly(...)
The bridge stores test descriptor reporters keyed by id and casts the
parent's stored reporter to GroupTestEventReporterInternal when a child
event arrives. A class-level @Ignore produces a non-composite parent
descriptor with a leaf TestEventReporter (not a group reporter), so the
cast fails. The first failure in CI was on
CalciteInformationSchemaCommandIT.testTablesFromPrometheusCatalog —
that IT is @Ignore'd at the class level, pointing at issue #3455.
Once any @Ignore'd class is loaded by the test runner, the cast aborts
the whole integTest task with "Test process encountered an unexpected
problem", even though no actual test assertion failed and the tests
themselves would have been skipped at the JUnit layer.
The bridge is instantiated by Gradle's own AbstractTestTask (verified
via javap of `gradle-testing-base-9.4.1.jar` — `new TestEventReporterAsListener`
at AbstractTestTask:263), so we cannot prevent its registration from
user code. The only available remedy until Gradle ships a fix is to
keep @Ignore'd classes off the test runner's input set entirely.
Net behaviour for CI:
- @Ignore'd tests were already skipped at the JUnit layer; skipping
them at the Gradle layer instead changes "skipped" to "not run".
For metrics/dashboards that count skipped tests, this is a one-time
minor delta. For pass/fail status, identical.
- Pre-existing exclude block (`OrderIT`, `ExplainIT`, etc.) already
excludes some of the same classes; this commit covers the remaining
@Ignore'd ones. OrderIT is not duplicated.
- integTest task no longer aborts mid-run; per-test FAILED messages
(testLogging.events "failed") are preserved.
Remove this block once the Gradle 9.4.x bug is fixed upstream. The list
mirrors `grep -rln '^@Ignore' integ-test/src/test/java` minus
already-excluded paths.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
---------
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Default plugins.calcite.enabled=true on the unified query path
The unified query handler (`RestUnifiedQueryAction` → `TransportPPLQueryAction`
→ analytics-engine) builds its `UnifiedQueryContext` with an empty `Settings`
map; nothing wires the cluster setting through. PPL parsing is delegated to
the same `AstBuilder` the v2 path uses, and that builder gates `table` (and
other Calcite-only commands) on `Settings.Key.CALCITE_ENGINE_ENABLED`. With
no setting propagated, the gate sees `null`, fails the `Boolean.TRUE.equals`
check, and throws
UnsupportedOperationException: Table command is supported only when
plugins.calcite.enabled=true
even when the cluster setting is true, blocking every `table` query routed
through the analytics path under `tests.analytics.force_routing=true`.
The unified path is by definition Calcite-based — every query reaching
`UnifiedQueryContext` flows through Calcite's planner. Default
`CALCITE_ENGINE_ENABLED=true` in `buildSettings()` when the underlying map
doesn't have the key. This unblocks `table` and any other AstBuilder gate
that defends the same toggle, without changing v2 behavior (v2 constructs
`AstBuilder` with cluster `Settings`, not the unified context).
Also refresh the PPL analytics-engine routing dev doc to document the
unified-context dependency and switch the per-command verification recipe
from the bundle-branch `analyticsCompatibilityTest` task to the standard
`integTestRemote -Dtests.analytics.{force_routing,parquet_indices}=true` —
the bundle-branch task is purely for the daily coverage-report sweep.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Drop dev docs from this PR — track separately
These local routing-and-coverage notes are useful but belong in their own
review thread (or stay as untracked working notes), not in the
unified-context fix. Keeping them on disk via .gitignore-style untrack so
the PR stays focused on the single api/ change.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Address @dai-chen: add CALCITE_ENGINE_ENABLED to default settings map
@dai-chen suggested either adding the setting to the default list at
UnifiedQueryContext.java:123 or passing it via the .setting() API on the
builder, since only the unified PPL path requires it.
Going with option 1 (default-list entry) — it composes the same way every
other planning-required default does, and avoids forcing every caller (the
production REST handler and every IT) to remember a single magic .setting()
call. The IT at UnifiedQueryOpenSearchIT.java:51 was the precedent for the
.setting() approach but only one IT needs it; the production caller
(RestUnifiedQueryAction) wouldn't have a natural place to wire this without
either repeating it everywhere or routing through a helper.
Removes the conditional in buildSettings().getSettingValue() — the default
map now carries the value the same way QUERY_SIZE_LIMIT, PPL_SUBSEARCH_MAXOUT,
and PPL_JOIN_SUBSEARCH_MAXOUT do.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
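The "default-list entry" approach can be sketched as a defaults map consulted on a miss. This is an illustrative stand-in, not the real `org.opensearch.sql.common.setting.Settings` API; the numeric default values below are made up for the sketch — only the `plugins.calcite.enabled=true` default reflects the change described above:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: caller-supplied settings win; otherwise fall back to the default
// map, so gates like Boolean.TRUE.equals(...) never see null on the unified path.
public class DefaultSettingsSketch {
  static final Map<String, Object> DEFAULTS = Map.of(
      "plugins.calcite.enabled", Boolean.TRUE, // unified path is Calcite-based
      "plugins.query.size_limit", 10000,       // hypothetical default value
      "plugins.ppl.subsearch.maxout", 10000);  // hypothetical default value

  static Object getSettingValue(Map<String, Object> supplied, String key) {
    return supplied.containsKey(key) ? supplied.get(key) : DEFAULTS.get(key);
  }

  public static void main(String[] args) {
    Map<String, Object> empty = new HashMap<>(); // unified context starts empty
    System.out.println(getSettingValue(empty, "plugins.calcite.enabled")); // true
  }
}
```

The point of the default-map form over a conditional in the getter is exactly what the commit says: the value composes the same way every other planning-required default does, and no caller has to remember a one-off `.setting()` call.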
* CalcitePPLRenameIT: switch to column-name-aware row matcher
Previously, {@code verifyStandardDataRows} (and a handful of bespoke
{@code verifyDataRows} calls in this file) compared rows positionally —
they assumed the engine emits columns in the {@code _source} iteration
order the v2 / Lucene path produces. Under the analytics-engine route the
parquet-backed reader returns columns in storage order, so the same 4
canonical state_country rows came back as {@code [70,"USA",4,"Jake",
"California",2023]} instead of {@code [Jake,USA,California,4,2023,70]}
and 12 of 22 tests failed despite the data being identical.
Replace with a column-name-keyed expected-row map. The helper reads the
actual schema from the response, looks up each canonical value by column
name, places it at the corresponding schema position, then defers to
{@code verifyDataRows} as before. The contract is identical to the
existing {@code verifySchema} matcher — both are set-equality on column
names — so the test no longer leaks the engine's emission order into
the assertion.
Each call site passes the canonical column-name list (with rename
substitutions where applicable). Tests that don't rename age keep
calling the no-arg form. Both paths now pass:
* Analytics-engine route (`tests.analytics.force_routing=true`): 20 / 22
(the remaining 2 are `testRenameInAgg` /
`testRenameWithBackticksInAgg`, blocked on the Substrait isthmus
AVG-binding follow-up tracked in
opensearch-project/OpenSearch#21521).
* v2 / Calcite route (default routing): 20 / 22 (same two tests fail with
`NoClassDefFoundError: LevenshteinDistance` only when running against
the OS-core `:run` cluster — that bundle is missing commons-text.
CI's own integ-test cluster bundles commons-text via the SQL plugin's
classloader and isn't affected.)
Signed-off-by: Kai Huang <ahkcs@amazon.com>
---------
Signed-off-by: Kai Huang <ahkcs@amazon.com>
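The column-name-keyed matcher described above can be sketched as a reorder step in front of a positional compare. `rowOf` / `verifyDataRowsByColumn` are the helper names the PR uses; everything else here is an illustrative stand-in for the real test harness:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: expected values keyed by column name, then reordered into the
// actual response schema's column order before any positional comparison.
public class ColumnKeyedMatchSketch {
  // rowOf("name","Jake","age",70) -> {name=Jake, age=70}
  static Map<String, Object> rowOf(Object... kv) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) row.put((String) kv[i], kv[i + 1]);
    return row;
  }

  // Reorder one expected row into the schema's column order; the result can
  // then be handed to an existing positional verifyDataRows-style matcher.
  static List<Object> reorderToSchema(Map<String, Object> expected, List<String> schema) {
    List<Object> positional = new ArrayList<>();
    for (String col : schema) positional.add(expected.get(col));
    return positional;
  }

  public static void main(String[] args) {
    // Analytics route emits storage (alphabetical) order; v2 emits _source order.
    List<String> analyticsSchema = List.of("age", "country", "name");
    Map<String, Object> canonical = rowOf("name", "Jake", "country", "USA", "age", 70);
    System.out.println(reorderToSchema(canonical, analyticsSchema)); // [70, USA, Jake]
  }
}
```

Because the expected row is looked up by name, the same canonical row matches both the Lucene `_source` order and the parquet storage order — the engine's emission order no longer leaks into the assertion.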
Mirrors the trailing portion of #5414 onto feature/mustang-ppl-integration. The Gradle 9.4.1 wrapper bump (#5406) and the integTest exclude block for @Ignore'd classes (#5407) already landed on this branch, but the matching cleanup of `CalciteNoPushdownIT` did not — the @Suite still listed four classes that are now build-time excluded from integTest:
- CalciteInformationSchemaCommandIT
- CalciteJsonFunctionsIT
- CalcitePrometheusDataSourceCommandsIT
- CalciteShowDataSourcesCommandIT
Leaving them in the suite means the JUnit runner attempts to load each class, finds it referenced from the suite, and registers a descriptor — which (combined with the class-level @Ignore) reproduces the same TestEventReporterAsListener.started cast that the integTest exclude was meant to avoid. Dropping the entries here keeps the no-pushdown suite consistent with the exclude block: a class excluded at the build layer must not also appear in any @Suite.SuiteClasses list.
No new tests are introduced or removed — the four classes are already @Ignore'd and have been since they were written. This change is purely about keeping the no-pushdown suite metadata aligned with the exclusion pattern carried over from main.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ne route (#5415) The analytics-engine route and the v2 / Lucene path return columns in different orders when there is no explicit `| fields ...` clause: parquet preserves the storage order chosen by the on-disk format, while the Lucene path preserves `_source` iteration order. Both are valid given the contract `verifySchema` declares (set equality on column names), so positional `verifyDataRows` assertions over-constrain the test and fail under `-Dtests.analytics.force_routing=true` even when the data is correct.
Apply the same column-name-keyed match pattern Kai introduced for `CalcitePPLRenameIT` in 59c728b (#5413):
* Add `rowOf(key1, val1, ...)` to build column-keyed expected rows.
* Add `verifyDataRowsByColumn(...)` to look up each cell value by column name and reorder to match the response schema before delegating to the existing positional `verifyDataRows` matcher.
* Convert the four order-sensitive tests (`testMultipleReplace`, `testEmptyStringReplacement`, `testMultipleFieldsInClause`, `testMultiplePairsInSingleCommand`) to the new helpers.
* Make `testReplaceNonExistentField` order-agnostic on the `input fields are: [...]` field list — assert that the prefix and every expected field name appear in the message, but not in a fixed order.
Test results against the analytics-engine route via `-Dtests.analytics.{force_routing,parquet_indices}=true`: 21/21 pass in both the direct `CalciteReplaceCommandIT` suite and the `CalciteNoPushdownIT > CalciteReplaceCommandIT` re-run. The v2 path remains green.
Companion to the OpenSearch PR onboarding the PPL `replace` command + `replace()` / `regexp_replace()` functions on the analytics-engine route via DataFusion `replace` / `regexp_replace` UDFs.
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
…th (#5417) Same shape as #5407, for CalciteFieldFormatCommandIT. The IT's init() previously created the in-test test_eval index via direct `PUT /test_eval/_doc/N` requests, relying on dynamic mapping. Two problems:
1. The doc PUTs auto-create the index with whatever settings the cluster defaults to. The analytics-engine compatibility path (force-routing on; tests.analytics.parquet_indices=true) needs parquet-backed indices, which TestUtils.createIndexByRestClient applies via TestUtils.makeParquetBacked when the system property is set. Direct PUTs sidestep that helper, so test_eval lands as Lucene-backed and the analytics planner rejects it with "No backend can scan all requested fields on index [test_eval]". All four working tests fail at execution.
2. init() runs before every @Test method. The doc PUTs are doc-level idempotent, so re-running was wasteful but not failing. Once we switch to createIndexByRestClient, the index-level PUT is no longer idempotent and re-running throws "resource_already_exists_exception".
Both addressed in one change:
- test_eval is created via TestUtils.createIndexByRestClient with an explicit mapping (name/title=keyword, age=long). The helper honours tests.analytics.parquet_indices=true and produces a parquet-backed index for the analytics-engine sweep; on the v2 path the helper is a no-op around the index PUT, so behaviour is unchanged.
- The whole init body is guarded by TestUtils.isIndexExist — the same idempotency idiom that loadIndex uses for predefined fixtures. The first @Test method provisions; subsequent methods skip.
Also pins the projection order on testFieldFormatStringConcatenation. The original query (`source=test_eval | fieldformat greeting = 'Hello ' + name`) had no `| fields` clause and relied on the implicit projection's column order — v2 returns Lucene-source insertion order, analytics returns parquet-storage order (alphabetical), so the assertion only matched on v2 by coincidence.
Adding `| fields name, title, age, greeting` makes the assertion deterministic across paths; the existing expected rows (`rows("Alice", "Engineer", 25, "Hello Alice")`) already match this order, so v2 behaviour is preserved. The other four tests already had explicit `| fields ...` clauses, so no change there. No semantic change for the v2 path: the explicit mapping types (keyword, long, keyword) resolve to the same PPL types ("string", "bigint", "string") that dynamic mapping inferred, and fieldformat reads from _source either way. Analytics-route compatibility goes from 1/5 to 4/5 (verified locally against a runTask cluster with analytics-engine + opensearch-sql-plugin). The remaining `testFieldFormatStringConcatenationWithNullFieldToString` needs a `tostring()` UDF on the analytics path — a multi-mode UDF (binary / hex / commas / duration) tracked separately as out of scope. Test plan: - ./gradlew :integ-test:integTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT' -> 5/5 green (v2 path, no regression). - ./gradlew :integ-test:analyticsCompatibilityTest --tests 'org.opensearch.sql.calcite.remote.CalciteFieldFormatCommandIT' -> 4/5 pass; the 5th fails on `tostring`'s missing capability registration, which is the documented out-of-scope category. Signed-off-by: Kai Huang <ahkcs@amazon.com>
sql plugin previously declared analytics-engine as a hard dependency:
extendedPlugins = ['opensearch-job-scheduler', 'analytics-engine']
Installing opensearch-sql on a distribution that doesn't ship
analytics-engine failed with:
Missing plugin [analytics-engine], dependency of [opensearch-sql]
Marking the dep ;optional=true alone is not enough — TransportPPLQueryAction
Guice-injects QueryPlanExecutor on its constructor, and Guice's OpenSearch
fork rejects a required constructor parameter whose binding is missing at
injector-build time ("constructors cannot be optional").
Move QueryPlanExecutor from a required constructor parameter to an
@Inject(optional=true) setter. Guice invokes the setter only when a binding
for QueryPlanExecutor<RelNode, Iterable<Object[]>> exists — i.e. when
analytics-engine's createGuiceModules has run and bound DefaultPlanExecutor.
Absent analytics-engine, the setter is silently skipped,
unifiedQueryHandler stays null, and all PPL queries route to the v2
Calcite-to-OpenSearch path already in the sql plugin.
Drop the bundlePlugin exclude list. OpenSearch's jar-hell check skips the
extended-plugin cross-check when the dep is marked optional
(PluginsService.java:763), so sql can bundle every jar it needs to run
self-sufficiently. When analytics-engine is installed, parent-first
classloader delegation still lets analytics-engine's copies win for any
shared class; sql's bundled copies sit idle.
Promote analytics-framework.jar from compileOnly to api so
QueryPlanExecutor is reachable from sql's own classloader when the plugin
is absent. analytics-engine.jar stays compileOnly (required only for
OpenSearchSchemaBuilder, which lives in the engine plugin and is reached
only through RestUnifiedQueryAction — a class that never loads when the
setter is skipped).
Validated on a live 2-node cluster in both configurations:
- With analytics-engine installed: legacy and analytics PPL both return
expected rows; routing to the analytics path still fires for
parquet_-prefixed indices.
- Without analytics-engine (only opensearch-job-scheduler + opensearch-sql
installed): cluster starts cleanly, PPL and SQL queries against Lucene
indices return expected rows, parquet_-prefixed lookups return a clean
IndexNotFoundException instead of a NullPointerException or
NoClassDefFoundError.
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
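The optional-setter pattern described above can be sketched without Guice: an executor field that may never be set, with a null-check deciding the route. In the real plugin the setter carries `@Inject(optional = true)` and Guice invokes it only when a `QueryPlanExecutor` binding exists; the class and method names below are illustrative stand-ins:

```java
// Sketch: when the analytics executor binding is absent, the field stays null
// and every query routes to the v2 path instead of failing.
public class OptionalExecutorSketch {
  interface QueryPlanExecutor { String execute(String plan); }

  static class TransportAction {
    private QueryPlanExecutor analyticsExecutor; // stays null without analytics-engine

    // In the plugin this is the @Inject(optional = true) setter; Guice only
    // calls it when analytics-engine has bound a QueryPlanExecutor.
    void setExecutor(QueryPlanExecutor executor) { this.analyticsExecutor = executor; }

    String route(String index, String plan) {
      boolean analyticsIndex = index.startsWith("parquet_");
      if (analyticsIndex && analyticsExecutor != null) {
        return analyticsExecutor.execute(plan); // analytics path
      }
      return "v2:" + plan; // existing Calcite-to-OpenSearch path
    }
  }

  public static void main(String[] args) {
    TransportAction action = new TransportAction();
    // Without analytics-engine: even parquet_ indices fall through to v2 cleanly.
    System.out.println(action.route("parquet_logs", "scan")); // v2:scan
    action.setExecutor(plan -> "analytics:" + plan);
    System.out.println(action.route("parquet_logs", "scan")); // analytics:scan
  }
}
```

This is why a required constructor parameter fails where the setter succeeds: Guice's injector build rejects a missing constructor binding outright, while an optional setter is simply skipped.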
…5397) Single squashed commit on top of feature/mustang-ppl-integration that absorbs upstream/main's commits not yet on the feature branch. Replaces the prior catch-up squash (#5396 base + the original af831d3 rebase commit) so this PR is a fast-forward into feature/mustang-ppl-integration. Squashed (rather than a merge commit) because upstream main commits were authored by many contributors with inconsistent or missing Signed-off-by trailers; DCO would otherwise reject those commits.
Main commits absorbed (54 since divergence; 4 since the original catch-up squash was made on 2026-04-30):
- #5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- #5408 (datetime type normalization)
- #5414 (Gradle wrapper bump + @Ignore exclusion)
- #5399 (FGAC-scoped SQL cursor continuation)
- #5394 (SQL Vector Search), #5361 (OpenSearch 3.7), #5360 (unified SQL language spec), #5240 (PPL Union), and 46 others.
Conflict resolutions:
- api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec, UnifiedSqlSpec): took main. Main is a strict superset — adds postAnalysisRules and preCompilationRules extension points, the new FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension wiring on SQL spec. RELEVANCE category is preserved unchanged.
- api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java: took main. Both adopt the new postAnalysisRules / preCompilationRules hooks introduced in #5408 / #5419.
- core/executor/QueryService.java: composed both sides — kept feature's CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's StageErrorHandler stage tracking; both improvements are orthogonal.
- legacy/plugin/RestSqlAction.java: took feature. The 3-way merge produced a duplicated handleException/getRawErrorCode block; feature already contained both the delegateToV2Engine refactor and the ErrorReport unwrap from main, so feature is the correct superset.
- CLAUDE.md, docs/user/ppl/functions/condition.md: took main.
- explain_streamstats_global{,_null_bucket}.yaml: took main (post-#5359 shape).
- core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into PlanUtils.findInputCollation).
- integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT.
- ppl/antlr/OpenSearchPPLParser.g4: added unionCommand.
- ppl/calcite/CalcitePPLStreamstatsTest.java: added testMultipleStreamstatsWithWindow.
- integ-test/build.gradle: took feature. Both sides added the same @Ignore exclusion block; feature has alphabetical ordering and a more detailed comment explaining the Gradle 9.4.1 cast bug.
- integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took feature's helper-managed test_eval provisioning (createIndexByRestClient + isIndexExist guard, from #5407) so analytics-engine compatibility runs get a parquet-backed index. Added the test_eval_agent setup (needed by the dotted-path eval tests for #5351) wrapped in its own isIndexExist guard for the same parquet-aware idempotency.
- plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java: took feature. PR #5403 made analytics-engine an optional dependency by moving QueryPlanExecutor from a required constructor parameter to an @Inject(optional=true) setter, and removed the loadExtensions / EngineExtensionsHolder / executionEngineExtensions plumbing. Feature retains the createSqlAnalyticsRouter method this catch-up introduced.
- plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced post-#5403; not present on feature).
Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava + :integ-test compileTestJava all pass; unit tests pass; spotlessCheck clean.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Default empty array() return type to ARRAY<VARCHAR>
PPL's `array()` no-arg form delegates to Calcite's `SqlLibraryOperators.ARRAY` return-type inference, which returns ARRAY<NULL> for an empty operand list and ARRAY<UNKNOWN> when all operands are typeless nulls. Both markers are fine for the v2 engine — `ArrayImplementor.internalCast` only consumes the element type when there are elements to cast, so an empty result Object list flows straight through to ExprCollectionValue regardless of declared element type.
The analytics-engine route is stricter. When isthmus walks a RexCall like `mvjoin(array(), '-')`, it reaches its first operand's type and feeds it to `io.substrait.isthmus.TypeConverter.toSubstrait`, which throws `UnsupportedOperationException: Unable to convert the type UNKNOWN`. Substrait has no on-wire encoding for NULL/UNKNOWN element types, so the planner can't serialize the call at all. Two PPL ITs hit this directly:
* `CalciteArrayFunctionIT.testMvjoinWithEmptyArray`
* `CalciteArrayFunctionIT.testMvdedupWithEmptyArray`
Substituting VARCHAR when the inferred element type is NULL or UNKNOWN gives the call a substrait-serializable type without affecting any value computation: the result list is empty either way.
# Test plan
* Unit tests: `:core:test --tests "*ArrayFunction*"` — passes locally (no existing tests asserted on the empty-array element type).
* IT: `CalciteArrayFunctionIT` force-routed through the analytics-engine path via opensearch-project/OpenSearch#21554's plugin set — testMvjoinWithEmptyArray and testMvdedupWithEmptyArray now pass (were UNKNOWN type errors); pass-rate moved 26/60 → 28/60.
Companion to opensearch-project/OpenSearch#21554.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Add unit tests for empty/UNKNOWN ARRAY → VARCHAR fallback
Cover the four shapes that exercise the return-type inference path introduced in 666dc0e:
* array() — 0 operands, fallback fires → ARRAY<VARCHAR>
* array(NULL) — typeless-null operand, fallback fires → ARRAY<VARCHAR>
* array(1) — INTEGER operand, fallback does NOT fire → ARRAY<INTEGER>
* array('a', 'b') — VARCHAR operands, fallback does NOT fire → ARRAY<VARCHAR>
The third case is the regression guard requested by review — it confirms concrete element types pass through unchanged and the fallback is scoped strictly to the {@code NULL}/{@code UNKNOWN} markers. The harness uses Calcite's {@link ExplicitOperatorBinding} bound to {@link SqlLibraryOperators#ARRAY} so the inference's internal {@code SqlLibraryOperators.ARRAY.getReturnTypeInference().inferReturnType(...)} call resolves the same operator the production code delegates to — mocking {@code SqlOperatorBinding} directly hits NPEs deep inside Calcite's least-restrictive-type computation.
Addresses review feedback on #5421.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Trim inline comment per review feedback
The "why" lives in the PR description; the inline comment now points there instead of duplicating it. Addresses dai-chen's review note on #5421.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
---------
Signed-off-by: Kai Huang <ahkcs@amazon.com>
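The fallback logic the two commits above describe can be sketched without Calcite by modeling type names as strings. This is a simplified stand-in for `RelDataType`/`SqlTypeName` inference (the real least-restrictive-type computation is more involved), shown only to make the NULL/UNKNOWN → VARCHAR scoping concrete:

```java
import java.util.List;

// Sketch: substitute VARCHAR only when the inferred element type is the
// NULL/UNKNOWN marker; concrete element types pass through unchanged.
public class ArrayTypeFallbackSketch {
  // Crude stand-in for SqlLibraryOperators.ARRAY's return-type inference.
  static String inferElementType(List<String> operandTypes) {
    if (operandTypes.isEmpty()) return "NULL";                             // array()
    if (operandTypes.stream().allMatch("NULL"::equals)) return "UNKNOWN";  // array(NULL)
    return operandTypes.get(0); // simplified: real code takes the least-restrictive type
  }

  static String arrayReturnType(List<String> operandTypes) {
    String elem = inferElementType(operandTypes);
    // Fallback scoped strictly to the NULL/UNKNOWN markers.
    if ("NULL".equals(elem) || "UNKNOWN".equals(elem)) elem = "VARCHAR";
    return "ARRAY<" + elem + ">";
  }

  public static void main(String[] args) {
    System.out.println(arrayReturnType(List.of()));          // ARRAY<VARCHAR>
    System.out.println(arrayReturnType(List.of("NULL")));    // ARRAY<VARCHAR>
    System.out.println(arrayReturnType(List.of("INTEGER"))); // ARRAY<INTEGER>
  }
}
```

The four shapes in the unit-test commit map directly onto these branches: the first two fire the fallback, the last two are the regression guards that confirm it never touches concrete types.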
…5423) * [Calcite] Make containsNestedAggregator tolerate flat-leaf schemas
The check splits each aggregator argument on '.' and looks up the leading token as a top-level ARRAY column (the OpenSearch `nested` marker). It used relBuilder.field(root), which throws when the column doesn't exist. That works for the classic path, which always exposes a top-level column for object/nested parents. It breaks the analytics-engine path, which emits only flat leaves ("city.name", "city.location.latitude") — parent placeholders can't round-trip through Substrait. Any aggregation against an object-field leaf (e.g. stats max(city.location.latitude)) crashed with "Field [city] not found".
Use RelDataType.getField, which returns null on miss. A null lookup is the correct "not nested" answer. Behavior unchanged for relations that do have a top-level ARRAY column.
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
* fix spotless
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
---------
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
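The null-on-miss lookup can be sketched with a plain map standing in for `RelDataType`. This is illustrative only — the real check walks Calcite row types, and `ARRAY` here marks the OpenSearch `nested` parent column:

```java
import java.util.Map;

// Sketch: a getField-style lookup returns null on miss, and null is read as
// "not nested" — unlike a relBuilder.field-style strict lookup, which throws.
public class FlatLeafLookupSketch {
  static boolean containsNestedAggregator(Map<String, String> rowType, String aggArg) {
    String leading = aggArg.split("\\.")[0]; // "city.location.latitude" -> "city"
    String fieldType = rowType.get(leading); // null on miss, like RelDataType.getField
    return "ARRAY".equals(fieldType);        // null => correct "not nested" answer
  }

  public static void main(String[] args) {
    // Analytics-engine path: only flat leaves, no "city" parent column.
    Map<String, String> flatLeaves = Map.of(
        "city.name", "VARCHAR", "city.location.latitude", "DOUBLE");
    System.out.println(
        containsNestedAggregator(flatLeaves, "city.location.latitude")); // false

    // Classic path with a real top-level nested (ARRAY) parent column.
    Map<String, String> withNested = Map.of("city", "ARRAY", "age", "BIGINT");
    System.out.println(containsNestedAggregator(withNested, "city.name")); // true
  }
}
```

On the flat-leaf schema the old strict lookup would throw "Field [city] not found"; the null-tolerant lookup returns the correct negative answer and leaves genuinely nested relations unchanged.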
PR Code Analyzer: the AI-powered 'Code-Diff-Analyzer' found issues on commit f006e29 — 'Diff too large, requires skip by maintainers after manual review'. Authors: please update the pull request according to the report; maintainers may skip the check after manual review.
Summary
Merges the long-running `feature/mustang-ppl-integration` branch into `main`, delivering the analytics-engine PPL integration end-to-end. This is the consolidation PR for 22 feature commits accumulated since the branch's last sync.
Note
This PR currently shows merge conflicts against `main`. The catch-up PR #5430 brings `feature/mustang-ppl-integration` up to current `main` and resolves all seven conflicts; once #5430 merges, this PR should be conflict-free and ready for review/merge.
What this delivers
Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces:
- `EXPLAIN` covers the analytics-engine path so users see a consistent plan view across both engines
- `extendedPlugins` wiring (Wire analytics-engine as extendedPlugins dependency #5302) — analytics-engine attaches as an OpenSearch extension via SPI
- `delegateToV2Engine` extraction in `RestSqlAction`
- Async `QueryPlanExecutor` (Version bump to OpenSearch 3.7 with async QueryPlanExecutor #5396) — async execution for analytics-engine plans + version bump to OpenSearch 3.7

Plus supporting infrastructure:
- Jar-hell fixes around `arrow-flight-rpc` / `httpcore5-h2` / `httpcore5-reactive` / `httpclient5` (Exclude httpcore5-h2, httpcore5-reactive, httpclient5 from SQL bundle to fix jar hell #5400, Exclude jsr305 from SQL bundle to fix jar hell with arrow-flight-rpc #5409)
- `CalciteEvalCommandIT` / `CalciteFieldFormatCommandIT` carried through the helper-managed index path (Carry CalciteEvalCommandIT through the helper-managed index path #5407, Carry CalciteFieldFormatCommandIT through the helper-managed index path #5417); `CalciteReplaceCommandIT` column-order-agnostic (Make CalciteReplaceCommandIT column-order-agnostic for analytics-engine route #5415); `@Ignore`'d Calcite ITs dropped from `CalciteNoPushdownIT` (Drop @Ignore'd Calcite ITs from CalciteNoPushdownIT @Suite #5416)
- `plugins.calcite.enabled=true` defaulted on the unified query path (Default plugins.calcite.enabled=true on the unified query path #5413)
- `PPL_REX_MAX_MATCH_LIMIT` bridged into `UnifiedQueryContext` (Bridge PPL_REX_MAX_MATCH_LIMIT into UnifiedQueryContext on the unified query path #5418)
- `array()` default type (Default empty array() return type to ARRAY<VARCHAR> #5421), `containsNestedAggregator` flat-leaf schemas ([Calcite] Make containsNestedAggregator tolerate flat-leaf schemas #5423)
- `analytics-api` JDK 21 surface (Switch sandbox deps to analytics-api (JDK 21 surface) #5426)

Full commit list (22)
- Switch sandbox deps to `analytics-api` (JDK 21 surface)
- [Calcite] Make `containsNestedAggregator` tolerate flat-leaf schemas
- Default empty `array()` return type to `ARRAY<VARCHAR>`
- Bridge `PPL_REX_MAX_MATCH_LIMIT` into `UnifiedQueryContext` on the unified query path
- Carry `CalciteFieldFormatCommandIT` through the helper-managed index path
- Make `CalciteReplaceCommandIT` column-order-agnostic for analytics-engine route
- Drop `@Ignore`'d Calcite ITs from `CalciteNoPushdownIT` `@Suite`
- Default `plugins.calcite.enabled=true` on the unified query path
- Carry `CalciteEvalCommandIT` through the helper-managed index path
- Exclude `jsr305` from SQL bundle to fix jar hell with `arrow-flight-rpc`
- Exclude `httpcore5-h2`, `httpcore5-reactive`, `httpclient5` from SQL bundle to fix jar hell
- Version bump to OpenSearch 3.7 with async `QueryPlanExecutor`
- Wire analytics-engine as `extendedPlugins` dependency
- … `UnifiedQueryParser`
- Catch up `feature/mustang-ppl-integration` with current `upstream/main`
- … `main` into `feature/mustang-ppl-integration`

Compatibility / opt-in
The analytics-engine path is gated by the `extendedPlugins` extension being installed (#5403 makes the dep optional). For clusters without analytics-engine installed, all PPL/SQL queries continue through the existing v2 / Calcite paths — no behavior change. For clusters with analytics-engine installed, only Parquet-backed indices route through the new path (#5429 — by index setting).
Test plan
- `:api` / `:core` / `:legacy` / `:opensearch-sql-plugin` / `:integ-test:compileTestJava` pass locally after merging `main`
- `tests.analytics.parquet_indices=true` shows no regressions
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.