Skip to content

Land analytics-engine PPL integration into main#5430

Merged
ahkcs merged 2 commits into
opensearch-project:mainfrom
ahkcs:pr/catch-up-mustang-ppl-integration
May 11, 2026
Merged

Land analytics-engine PPL integration into main#5430
ahkcs merged 2 commits into
opensearch-project:mainfrom
ahkcs:pr/catch-up-mustang-ppl-integration

Conversation

@ahkcs
Copy link
Copy Markdown
Collaborator

@ahkcs ahkcs commented May 11, 2026

Summary

Single squashed delivery of the long-running feature/mustang-ppl-integration branch into main, consolidating 22 feature-branch PRs plus the conflict-resolved merge of current main.

Why squashed (DCO)

Squashed because the feature branch's history includes commits with missing or mismatched Signed-off-by trailers that block DCO at this scope — the equivalent issue documented for the catch-up squashes in #5397. Specifically:

Tried a real two-parent merge commit first; DCO flagged 29 commits. Squash is the only practical fix given the upstream history. Feature-branch commit lineage is retained on feature/mustang-ppl-integration (f006e29cd).

What this delivers

Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:

Supporting infrastructure:

Feature-branch commits squashed (22)

#5429, #5426, #5423, #5421, #5418, #5403, #5417, #5415, #5416, #5413, #5407, #5409, #5406, #5400, #5396, #5317, #5302, #5285, #5275, #5267, #5397, #5286

Main commits absorbed via merge (54)

Brings the branch up to current upstream/main (54 commits since the last catch-up at #5397, divergence point 513e1b220). Highlights: #5419, #5408, #5414, #5399, #5394, #5361, #5360, #5240, #5266, #5278, plus 44 others.

Conflict resolutions (7)

Resolved during the merge of main into the feature branch. Resolution kept the feature branch's analytics-engine-path semantics where main's changes would have regressed them.

File Resolution
api/.../UnifiedQueryContext.java Blank-line-only conflict; took main's formatting (no extra blank).
core/.../executor/QueryService.java Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...) wrapping (required for analytics-engine classloader isolation) and the matching import.
integ-test/build.gradle Kept feature's detailed root-cause comment on the Gradle 9.4.1 TestEventReporterAsListener workaround; kept ASCII ordering of JSONRequestIT/JoinIT and SQLFunctionsIT/ShowIT/SourceFieldIT entries.
integ-test/.../CalciteEvalCommandIT.java Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards on test_eval and test_eval_agent setup.
legacy/.../RestSqlAction.java Kept feature's delegateToV2Engine(...). Both sides added handleException/getRestStatus/getRawErrorCode; removed the duplicate set git produced.
plugin/.../SQLPlugin.java Took the union of imports: ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType.
plugin/.../transport/TransportPPLQueryAction.java Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and feature's local pluginSettings/pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (#5298) preserved as-is.

Compatibility / opt-in

The analytics-engine path is gated by the extendedPlugins extension being installed (#5403 makes the dep optional). Clusters without analytics-engine installed see no behavior change. Clusters with analytics-engine installed route only Parquet-backed indices through the new path (#5429 — by index setting).

Merge mode

Either "Create a merge commit" or "Rebase and merge" works — the PR is one commit so the outcome is equivalent.

Test plan

  • :api / :core / :legacy / :opensearch-sql-plugin compileJava pass locally
  • :integ-test:compileTestJava passes locally
  • DCO sign-off present on the single squashed commit
  • CI build passes
  • Integration tests on the analytics-engine path still pass
  • Compatibility run on tests.analytics.parquet_indices=true shows no regressions

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

PR Reviewer Guide 🔍

(Review updated until commit f1bc872)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Unguarded Field Access

executionEngineExtensions is initialized as an empty list at line 154 but reassigned in loadExtensions (line 176). If loadExtensions is never called or called after createComponents, the field may be in an inconsistent state. The plugin lifecycle does not guarantee loadExtensions runs before createComponents, so createComponents at line 371 may pass an empty list to OpenSearchPluginModule even when extensions exist.

private List<ExecutionEngine> executionEngineExtensions = List.of();
private ClusterService clusterService;

/** Settings should be inited when bootstrap the plugin. */
private org.opensearch.sql.common.setting.Settings pluginSettings;

private NodeClient client;
private DataSourceServiceImpl dataSourceService;
private OpenSearchAsyncQueryScheduler asyncQueryScheduler;
private Injector injector;

public String name() {
  return "sql";
}

public String description() {
  return "Use sql to query OpenSearch.";
}

@Override
public void loadExtensions(ExtensionLoader loader) {
  List<ExecutionEngine> loaded = loader.loadExtensions(ExecutionEngine.class);
  this.executionEngineExtensions = loaded != null ? List.copyOf(loaded) : List.of();

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

PR Code Suggestions ✨

Latest suggestions up to f1bc872
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Filter null extensions from loaded list

The loadExtensions method may return a list containing null elements. Filter out
null entries before copying to prevent NullPointerException when iterating over
executionEngineExtensions later.

plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java [175-176]

 List<ExecutionEngine> loaded = loader.loadExtensions(ExecutionEngine.class);
-this.executionEngineExtensions = loaded != null ? List.copyOf(loaded) : List.of();
+this.executionEngineExtensions = loaded != null 
+    ? loaded.stream().filter(Objects::nonNull).toList() 
+    : List.of();
Suggestion importance[1-10]: 7

__

Why: Valid defensive programming suggestion. Filtering null elements prevents potential NullPointerException when iterating over executionEngineExtensions. The concern about loadExtensions returning nulls is reasonable for external plugin loading mechanisms.

Medium
Filter null engines in record constructor

The compact constructor should filter out null elements from the input list to
prevent potential NullPointerException when consumers iterate over the engines. This
ensures defensive copying with validation.

plugin/src/main/java/org/opensearch/sql/plugin/config/EngineExtensionsHolder.java [17-18]

 public record EngineExtensionsHolder(List<ExecutionEngine> engines) {
   public EngineExtensionsHolder(List<ExecutionEngine> engines) {
-    this.engines = engines != null ? List.copyOf(engines) : List.of();
+    this.engines = engines != null 
+        ? engines.stream().filter(Objects::nonNull).toList() 
+        : List.of();
   }
 }
Suggestion importance[1-10]: 7

__

Why: Valid defensive programming suggestion. Filtering null elements in the constructor prevents potential issues when consumers iterate over the engines list. This is consistent with the filtering suggested for SQLPlugin.loadExtensions.

Medium
General
Make extensions field immutable and final

The field executionEngineExtensions should be declared as final and initialized in
the constructor or via loadExtensions. Currently, it's mutable and initialized to an
empty list, which could lead to race conditions if accessed before loadExtensions is
called.

plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java [154]

-private List<ExecutionEngine> executionEngineExtensions = List.of();
+private final List<ExecutionEngine> executionEngineExtensions;
Suggestion importance[1-10]: 3

__

Why: While making the field final would improve immutability, the current initialization to List.of() provides a safe default. The suggestion about race conditions is overstated since loadExtensions is called during plugin initialization before concurrent access occurs.

Low

Previous suggestions

Suggestions up to commit 6076c1b
CategorySuggestion                                                                                                                                    Impact
Possible issue
Filter null elements from loaded extensions

The loadExtensions method may return a list containing null elements. Filter out
null entries before copying to prevent potential NullPointerException when iterating
over executionEngineExtensions later.

plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java [175-176]

 List<ExecutionEngine> loaded = loader.loadExtensions(ExecutionEngine.class);
-this.executionEngineExtensions = loaded != null ? List.copyOf(loaded) : List.of();
+this.executionEngineExtensions = loaded != null 
+    ? loaded.stream().filter(Objects::nonNull).toList() 
+    : List.of();
Suggestion importance[1-10]: 7

__

Why: Valid concern about potential null elements in the loaded list. Filtering nulls prevents NullPointerException when iterating over executionEngineExtensions, improving robustness.

Medium
Ensure thread-safe visibility for field

Mark executionEngineExtensions as volatile to ensure thread-safe visibility when
accessed from multiple threads, since it's modified in loadExtensions and later read
in createComponents.

plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java [154]

-private List<ExecutionEngine> executionEngineExtensions = List.of();
+private volatile List<ExecutionEngine> executionEngineExtensions = List.of();
Suggestion importance[1-10]: 6

__

Why: The executionEngineExtensions field is written in loadExtensions and read in createComponents. Adding volatile ensures proper visibility across threads in a multi-threaded plugin environment, though the actual risk depends on OpenSearch's plugin lifecycle guarantees.

Low

@ahkcs ahkcs force-pushed the pr/catch-up-mustang-ppl-integration branch from 6076c1b to f1bc872 Compare May 11, 2026 16:59
@ahkcs ahkcs changed the title Catch up feature/mustang-ppl-integration with upstream/main Merge upstream/main into feature/mustang-ppl-integration May 11, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit f1bc872

@ahkcs ahkcs changed the title Merge upstream/main into feature/mustang-ppl-integration Land analytics-engine PPL integration into main May 11, 2026
@ahkcs ahkcs changed the base branch from feature/mustang-ppl-integration to main May 11, 2026 17:12
@ahkcs ahkcs added maintenance Improves code quality, but not the product infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. labels May 11, 2026
@ahkcs ahkcs closed this May 11, 2026
@ahkcs ahkcs reopened this May 11, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 940798c.

PathLineSeverityDescription
core/build.gradle36highmavenLocal() added as a dependency repository. This resolves artifacts from the local Maven cache (~/.m2) before mavenCentral, enabling dependency-confusion and poisoned-cache attacks where a locally injected artifact shadows the legitimate one.
core/build.gradle67highNew dependency on 'org.opensearch.sandbox:analytics-api:3.7.0-SNAPSHOT' (both compileOnly and testImplementation). SNAPSHOT artifacts are mutable and can be silently replaced between builds. The 'sandbox' group namespace has no verified ownership in the public registry and is susceptible to namespace hijacking.
plugin/build.gradle57highextendedPlugins list modified to include 'analytics-engine;optional=true'. This declares a dependency on an externally-supplied plugin whose classloader has full access to the SQL plugin's classes. Mandatory flag: build plugin / compiler extension change.
plugin/build.gradle162highAdded runtime dependency 'org.opensearch.sandbox:analytics-api' resolved at ${opensearch_version}. The 'sandbox' group is unverified in public registries and the version is dynamically interpolated, making artifact pinning impossible without an accompanying hash check.
integ-test/build.gradle274highIntegration test cluster downloads binary plugin ZIPs from 'https://ci.opensearch.org/ci/dbc/feature-build-opensearch/feature-datafusion/latest/linux/x64/...' using a floating 'latest' path with no checksum verification. An attacker who controls that CI path could serve a backdoored plugin that is then installed into the test OpenSearch cluster.
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteClassLoaderHelper.java38mediumThread context classloader is swapped to the caller's classloader around every Calcite execution. While documented as a CALCITE-3745 workaround, this pattern permanently widens what classes Janino (a code-generation runtime) can resolve and deserves review to confirm no unintended class-load paths are opened.
plugin/src/main/java/org/opensearch/sql/plugin/rest/AnalyticsExecutorHolder.java27mediumGlobal static volatile field holds a QueryPlanExecutor set by whichever plugin satisfies the Guice binding. If analytics-engine is replaced with a malicious artifact (see supply-chain issues above), this holder silently routes all matching queries through attacker-controlled execution logic without any integrity check.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java160lowQuery routing to the analytics engine is decided by reading mutable index settings (PLUGGABLE_DATAFORMAT_ENABLED + value == 'composite'). If an attacker can write cluster/index settings they could redirect arbitrary SQL/PPL queries to the analytics execution path, bypassing the normal SQL engine's authorization and auditing stack.

The table above displays the top 10 most important findings.

Total: 8 | Critical: 0 | High: 5 | Medium: 2 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@ahkcs ahkcs force-pushed the pr/catch-up-mustang-ppl-integration branch from f1bc872 to b0b6c12 Compare May 11, 2026 17:18
Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers that block DCO at this
scope — the equivalent issue documented for the catch-up squashes
(opensearch-project#5397).

The feature branch f006e29 is retained for individual-commit lineage.

### What this delivers

Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:
- Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices
  hand off to the analytics-engine execution path; Lucene-backed indices
  continue through the legacy path
- Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the
  unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an
  OpenSearch extension via SPI
- SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork
  applied to the SQL transport, plus delegateToV2Engine extraction in
  RestSqlAction
- Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine
  plans + version bump to OpenSearch 3.7
- Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional
  runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (opensearch-project#5429, opensearch-project#5432) — replaces the earlier
  table-name-prefix heuristic with an authoritative index-setting check

Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (opensearch-project#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
  httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
  carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417);
  CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd
  Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
  (opensearch-project#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418)
- Calcite tolerance fixes: array() default type (opensearch-project#5421),
  containsNestedAggregator flat-leaf schemas (opensearch-project#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426)

### Feature-branch commits squashed (22)

opensearch-project#5432, opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413,
opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267,
opensearch-project#5397, opensearch-project#5286

### Main commits absorbed via the merge (54)

Brings the branch up to current upstream/main (54 commits since the
last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419,
opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44
others (bugfixes, doc updates, infra).

### Conflict resolutions (7)

Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict; took main's tighter formatting.

- core/.../executor/QueryService.java
  Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
  wrapping (required for analytics-engine classloader isolation) and
  the matching import.

- integ-test/build.gradle
  Kept feature's detailed root-cause comment on the Gradle 9.4.1
  TestEventReporterAsListener workaround; kept ASCII ordering of
  JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
  on test_eval and test_eval_agent setup (needed for the helper-managed
  index analytics-engine compatibility run).

- legacy/.../RestSqlAction.java
  Kept feature's delegateToV2Engine(...) (extracted from the
  analytics-engine routing path). Both sides added handleException /
  getRestStatus / getRawErrorCode; removed the duplicate set git
  produced.

- plugin/.../SQLPlugin.java
  Took the union of imports: ExecutionEngine +
  ExecutionEngine.ExplainResponse + QueryType.

- plugin/.../transport/TransportPPLQueryAction.java
  Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
  feature's local pluginSettings / pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved
as-is.

### Compatibility / opt-in

The analytics-engine path is gated by the extendedPlugins extension
being installed (opensearch-project#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (opensearch-project#5429 — by index setting).

### Verification

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
  locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
@ahkcs ahkcs force-pushed the pr/catch-up-mustang-ppl-integration branch from 2f1f0ec to 50eb98a Compare May 11, 2026 18:11
penghuo
penghuo previously approved these changes May 11, 2026
Comment thread doctest/build.gradle Outdated

apply plugin: 'opensearch.testclusters'


Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. revert it

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

RyanL1997
RyanL1997 previously approved these changes May 11, 2026
After 'apply plugin: opensearch.testclusters', one blank line is enough
— restoring the single-blank spacing to match upstream/main.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs dismissed stale reviews from RyanL1997 and penghuo via 940798c May 11, 2026 18:33
@ahkcs ahkcs added skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. labels May 11, 2026
@bowenlan-amzn
Copy link
Copy Markdown
Member

When using the analytics engine, sql will use a patched version of Calcite published from opensearch core side.
Leaving this here for broader awareness.
opensearch-project/OpenSearch#21501

@ahkcs ahkcs merged commit f1f39ef into opensearch-project:main May 11, 2026
79 of 84 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. maintenance Improves code quality, but not the product skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants