Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers, which fail the DCO check
at this scope. This is the same issue documented for the catch-up
squashes (opensearch-project#5397).
The feature branch (f006e29) is retained for individual-commit lineage.
### What this delivers
Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.
Headline pieces:
- Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices
hand off to the analytics-engine execution path; Lucene-backed indices
continue through the legacy path
- Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the
unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an
OpenSearch extension via SPI
- SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork
applied to the SQL transport, plus delegateToV2Engine extraction in
RestSqlAction
- Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine
plans + version bump to OpenSearch 3.7
- Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional
runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (opensearch-project#5429) — replaces the earlier
table-name-prefix heuristic with an authoritative index-setting check
Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (opensearch-project#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417);
CalciteReplaceCommandIT made column-order-agnostic (opensearch-project#5415); @Ignore'd
Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
(opensearch-project#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418)
- Calcite tolerance fixes: array() default type (opensearch-project#5421),
containsNestedAggregator flat-leaf schemas (opensearch-project#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426)
### Feature-branch commits squashed (22)
opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413,
opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267,
opensearch-project#5397, opensearch-project#5286
### Main commits absorbed via the merge (54)
Brings the branch up to current upstream/main (54 commits since the
last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419,
opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44
others (bugfixes, doc updates, infra).
### Conflict resolutions (7)
Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.
- api/.../UnifiedQueryContext.java
Blank-line-only conflict; took main's tighter formatting.
- core/.../executor/QueryService.java
Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
wrapping (required for analytics-engine classloader isolation) and
the matching import.
- integ-test/build.gradle
Kept feature's detailed root-cause comment on the Gradle 9.4.1
TestEventReporterAsListener workaround; kept ASCII ordering of
JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
entries.
- integ-test/.../CalciteEvalCommandIT.java
Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
on test_eval and test_eval_agent setup (needed for the helper-managed
index analytics-engine compatibility run).
- legacy/.../RestSqlAction.java
Kept feature's delegateToV2Engine(...) (extracted from the
analytics-engine routing path). Both sides added handleException /
getRestStatus / getRawErrorCode; removed the duplicate set git
produced.
- plugin/.../SQLPlugin.java
Took the union of imports: ExecutionEngine +
ExecutionEngine.ExplainResponse + QueryType.
- plugin/.../transport/TransportPPLQueryAction.java
Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
feature's local pluginSettings / pluginSettingsRef wiring.
EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved
as-is.
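The idempotency guards kept in CalciteEvalCommandIT can be illustrated with a minimal, self-contained sketch. The in-memory index set and the `indexExists` / `createIndex` / `ensureIndex` names are hypothetical stand-ins, not the project's test utilities; only the "create only if absent" shape is taken from the resolution above.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a test fixture; not the project's TestUtils. */
final class IdempotentSetupSketch {
    // In-memory substitute for the set of indices that exist in a cluster.
    private final Set<String> indices = new HashSet<>();
    private int createCalls = 0;

    boolean indexExists(String name) {
        return indices.contains(name);
    }

    /** Unguarded create: fails if the index is already there. */
    void createIndex(String name) {
        if (!indices.add(name)) {
            throw new IllegalStateException("index already exists: " + name);
        }
        createCalls++;
    }

    /** Guarded setup: a no-op when something else already created the index. */
    void ensureIndex(String name) {
        if (!indexExists(name)) {
            createIndex(name);
        }
    }

    int createCalls() {
        return createCalls;
    }

    public static void main(String[] args) {
        IdempotentSetupSketch cluster = new IdempotentSetupSketch();
        cluster.createIndex("test_eval");   // simulates a helper creating the index first
        cluster.ensureIndex("test_eval");   // guarded setup does not try to create it again
        System.out.println("create calls: " + cluster.createCalls());
    }
}
```

Without the guard, the second setup call would attempt to create an index that the helper-managed path had already provisioned.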
### Compatibility / opt-in
The analytics-engine path is gated by the extendedPlugins extension
being installed (opensearch-project#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (opensearch-project#5429 — by index setting).
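The opt-in rule above amounts to a two-input decision. The sketch below writes it out as a plain function; the `Route` enum and the `route` method are hypothetical names chosen for illustration, and how the plugin actually detects the installed extension is not shown here.

```java
/** Hypothetical decision table for the opt-in rule; not the plugin's code. */
final class EngineGateSketch {
    enum Route { ANALYTICS_ENGINE, EXISTING_PATH }

    /**
     * @param analyticsEngineInstalled whether the optional analytics-engine is present
     * @param parquetBacked            whether the target index is Parquet-backed
     */
    static Route route(boolean analyticsEngineInstalled, boolean parquetBacked) {
        // Without the engine, nothing changes regardless of the index type.
        if (!analyticsEngineInstalled) {
            return Route.EXISTING_PATH;
        }
        // With the engine, only Parquet-backed indices take the new path.
        return parquetBacked ? Route.ANALYTICS_ENGINE : Route.EXISTING_PATH;
    }

    public static void main(String[] args) {
        for (boolean installed : new boolean[] {false, true}) {
            for (boolean parquet : new boolean[] {false, true}) {
                System.out.println("installed=" + installed + " parquetBacked=" + parquet
                        + " -> " + route(installed, parquet));
            }
        }
    }
}
```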
### Verification
- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
:opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
locally
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Description
Today `RestUnifiedQueryAction.isAnalyticsIndex` routes queries to the analytics engine when the source index name starts with `parquet_`. That is brittle because it conflates naming convention with storage type: an index created without the prefix but with pluggable dataformat enabled is silently sent to the Lucene path, and an index named `parquet_foo` without the setting is mis-dispatched to the analytics engine.
Switch to the authoritative signal: the `index.pluggable.dataformat.enabled` index setting, read from cluster state. This is the same flag that the integration tests (`CoordinatorReduceIT`, `CompositeCommitDeletionIT`) use to create analytics-backed indices, and the one `FieldStorageResolver` reads to resolve field storage.
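The difference between the two checks can be shown with a small, self-contained sketch. It is not the actual `RestUnifiedQueryAction` code: the class and method names are hypothetical, index settings are modeled as a plain `Map<String, String>` rather than OpenSearch's settings type, and treating a missing key as `false` is an assumption.

```java
import java.util.Map;

/** Hypothetical stand-in; not the plugin's RestUnifiedQueryAction. */
final class AnalyticsRoutingSketch {
    // Setting key quoted in the description above.
    static final String DATAFORMAT_ENABLED = "index.pluggable.dataformat.enabled";

    /** Earlier heuristic: trust the index naming convention. */
    static boolean isAnalyticsIndexByPrefix(String indexName) {
        return indexName.startsWith("parquet_");
    }

    /** Setting-based check; a missing key is treated as false (assumed default). */
    static boolean isAnalyticsIndexBySetting(Map<String, String> indexSettings) {
        return Boolean.parseBoolean(indexSettings.get(DATAFORMAT_ENABLED));
    }

    public static void main(String[] args) {
        Map<String, String> enabled = Map.of(DATAFORMAT_ENABLED, "true");
        Map<String, String> unset = Map.of();

        // Index "events": setting enabled, but no prefix.
        System.out.println("events      prefix=" + isAnalyticsIndexByPrefix("events")
                + " setting=" + isAnalyticsIndexBySetting(enabled));
        // Index "parquet_foo": prefix present, but setting absent.
        System.out.println("parquet_foo prefix=" + isAnalyticsIndexByPrefix("parquet_foo")
                + " setting=" + isAnalyticsIndexBySetting(unset));
    }
}
```

In both cases the prefix heuristic and the setting disagree, which are the two mis-routing scenarios the description calls out.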
Routing behavior
Issues Resolved
Mustang rollout pre-work — aligns PPL/SQL routing with the index-setting-based model already used elsewhere in the engine.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.