Skip to content

Switch sandbox deps to analytics-api (JDK 21 surface)#5426

Merged
ahkcs merged 3 commits into
opensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:fix/sandbox-variant-and-transitive
May 11, 2026
Merged

Switch sandbox deps to analytics-api (JDK 21 surface)#5426
ahkcs merged 3 commits into
opensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:fix/sandbox-variant-and-transitive

Conversation

@bowenlan-amzn
Copy link
Copy Markdown
Member

@bowenlan-amzn bowenlan-amzn commented May 8, 2026

Summary

sql depends on two interfaces from analytics-engine: QueryPlanExecutor and OpenSearchSchemaBuilder. Previously these were vendored as local JARs. This PR:

  1. Replaces vendored JARs with analytics-api — a new JDK-21-targeted Maven module that exposes only those two interfaces. Depends on Extract analytics-api module for JDK 21 downstream consumers OpenSearch#21574.
  2. Removes analytics-engine from standard test clusters — it was never needed for sql's own integration tests. Standard integTest runs without it.
  3. Adds analyticsEngineCompatIT — a dedicated smoke test that boots sql alongside analytics-engine + arrow-flight-rpc to verify the three plugins load together without classloader errors.

Root cause of the original JDK mismatch

analytics-framework and analytics-engine target JDK 25 (they use java.lang.foreign). sql targets JDK 21. Consuming their JARs directly caused class file has wrong version 69.0, should be 65.0 on the JDK-21 CI matrix. The fix is analytics-api: a thin module containing only the interfaces sql needs, compiled at JDK 21.

Why analytics-api is implementation (not compileOnly) in plugin

OpenSearch scans plugin classes at startup by calling getMethods(). sql's handler classes reference QueryPlanExecutor in their method signatures. Without analytics-api bundled in the plugin zip, OpenSearch throws NoClassDefFoundError: QueryPlanExecutor at startup even when no analytics-engine code path runs. When analytics-engine IS installed, its classloader provides the class via extendedPlugins parent-first delegation — the bundled copy is unused but harmless.

Verification

Tested locally with analytics-api published to mavenLocal from the OpenSearch PR branch:

  • ./gradlew :core:compileJava — BUILD SUCCESSFUL
  • ./gradlew :opensearch-sql-plugin:bundlePlugin — BUILD SUCCESSFUL
  • ./gradlew :integ-test:analyticsEngineCompatIT — BUILD SUCCESSFUL (cluster boots, all three plugins load)
  • CI will turn green once OpenSearch#21574 merges and analytics-api-3.7.0-SNAPSHOT is published to the snapshot Maven repo.

Depends on

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit de41ce1.

PathLineSeverityDescription
core/build.gradle67highDependency change: local JAR files replaced by external Maven artifact 'org.opensearch.sandbox:analytics-api:3.7.0-SNAPSHOT'. The 'sandbox' namespace is less controlled than the primary 'org.opensearch' namespace and artifact authenticity cannot be verified without reviewing the registry configuration. Mandatory flag for supply chain review.
plugin/build.gradle163highDependency change: local JARs replaced by external Maven artifact 'org.opensearch.sandbox:analytics-api:${opensearch_version}'. Uses a dynamic version variable rather than a pinned version string, making the resolved artifact non-deterministic. Mandatory flag for supply chain review.
integ-test/build.gradle279highNew Download tasks fetch binary plugin ZIPs from an external CI URL containing '/latest/' in the path (https://ci.opensearch.org/ci/dbc/feature-build-opensearch/feature-datafusion/latest/...). No checksum or integrity verification is performed. The 'latest' segment means content changes silently over time, allowing a compromised upstream build to inject malicious binaries into test clusters.
.github/workflows/analytics-engine-compat.yml17highExternal reusable workflow referenced at @main (not a pinned SHA): 'opensearch-project/opensearch-build/.github/workflows/get-ci-image-tag.yml@main'. Outputs from this workflow (ci-image-version-linux, ci-image-start-options, ci-image-start-command) are injected directly into the container image selector and a run: step (line 31), meaning a compromise of that external repo's main branch could execute arbitrary commands in CI.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 4 | Medium: 0 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Comment thread build.gradle Outdated
Comment thread build.gradle Outdated
@bowenlan-amzn bowenlan-amzn force-pushed the fix/sandbox-variant-and-transitive branch from 4deb644 to 64ffed9 Compare May 8, 2026 22:34
@bowenlan-amzn bowenlan-amzn changed the title Unblock sandbox Maven consumption: variant override + transitive=false Switch sandbox deps to analytics-api (JDK 21 surface) May 9, 2026
@bowenlan-amzn bowenlan-amzn force-pushed the fix/sandbox-variant-and-transitive branch 3 times, most recently from e2f6796 to 60cb52c Compare May 9, 2026 03:35
@bowenlan-amzn bowenlan-amzn marked this pull request as ready for review May 9, 2026 03:39
@bowenlan-amzn bowenlan-amzn force-pushed the fix/sandbox-variant-and-transitive branch from c8248a4 to d769c12 Compare May 9, 2026 03:54
Replace the three vendored binaries under libs/ with build-time downloads:

- analytics-framework.jar / analytics-engine.jar: maven coordinates against
  https://ci.opensearch.org/ci/dbc/snapshots/maven (already declared as a
  subproject repo) → resolved as `org.opensearch.sandbox:analytics-{framework,
  engine}:3.7.0-SNAPSHOT`.
- analytics-engine.zip: the snapshot maven repo doesn't publish the plugin
  distribution zip, so we fetch it from the feature-build URL via a new root
  task `:downloadAnalyticsEngineZip` (de.undercouch.download — same plugin
  the integ-test and doctest projects already use). Output lands in
  ${rootProject.buildDir}/distributions/analytics-engine-3.7.0-SNAPSHOT.zip
  and the existing `analyticsEngineZip` ext property points at it. The
  -PanalyticsEngineZip=/path override still works and skips the download.

Wiring in the three subprojects that consume the zip (plugin, integ-test,
doctest) ensures every test-cluster-bearing task (RestIntegTestTask,
StandaloneRestIntegTestTask, RunTask) dependsOn the download task before
the cluster boots.

mavenLocal() now excludes the org.opensearch.sandbox group: a locally-
published OpenSearch core build can install these artifacts with Gradle
Module Metadata declaring a JVM version different from the published
snapshot, which can shadow the remote and break resolution. Always pull
sandbox artifacts from the CI snapshot repo.

Why now:
- libs/ adds 17 MB of binaries to every clone.
- The vendored zip drifts every time analytics-engine bumps; remote URL
  always points at the latest feature-build artifact.
- Aligns with the OpenSearch convention of consuming sandbox artifacts
  via the published snapshot repo (same as opensearch-core, opensearch-
  job-scheduler-spi, etc., already wired this way).

NOT YET COMPILABLE: the currently-published analytics-framework snapshot
(3.7.0-20260409.054801-56) is older than what was vendored in libs/ and
exposes a sync `Stream execute(LogicalPlan, Object)` instead of the async
`void execute(LogicalPlan, Object, ActionListener<Stream>)` that SQL's
AnalyticsExecutionEngine targets. Wait for upstream to publish a fresh
sandbox snapshot before merging.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@bowenlan-amzn bowenlan-amzn force-pushed the fix/sandbox-variant-and-transitive branch from d769c12 to ef92d85 Compare May 9, 2026 03:59
Without this, RestIntegTestTask subclasses in integ-test don't have
opensearch-agent.jar on their JVM classpath, causing AgentPolicy class
initialization to fail at test startup.

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@bowenlan-amzn bowenlan-amzn force-pushed the fix/sandbox-variant-and-transitive branch from ef92d85 to de41ce1 Compare May 9, 2026 04:09
@bowenlan-amzn bowenlan-amzn added the enhancement New feature or request label May 9, 2026
@ahkcs ahkcs merged commit 0d5e553 into opensearch-project:feature/mustang-ppl-integration May 11, 2026
37 of 41 checks passed
ahkcs added a commit to ahkcs/sql that referenced this pull request May 11, 2026
Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers that block DCO at this
scope — the equivalent issue documented for the catch-up squashes
(opensearch-project#5397).

The feature branch f006e29 is retained for individual-commit lineage.

### What this delivers

Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:
- Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices
  hand off to the analytics-engine execution path; Lucene-backed indices
  continue through the legacy path
- Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the
  unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an
  OpenSearch extension via SPI
- SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork
  applied to the SQL transport, plus delegateToV2Engine extraction in
  RestSqlAction
- Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine
  plans + version bump to OpenSearch 3.7
- Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional
  runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (opensearch-project#5429) — replaces the earlier
  table-name-prefix heuristic with an authoritative index-setting check

Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (opensearch-project#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
  httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
  carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417);
  CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd
  Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
  (opensearch-project#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418)
- Calcite tolerance fixes: array() default type (opensearch-project#5421),
  containsNestedAggregator flat-leaf schemas (opensearch-project#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426)

### Feature-branch commits squashed (22)

opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413,
opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267,
opensearch-project#5397, opensearch-project#5286

### Main commits absorbed via the merge (54)

Brings the branch up to current upstream/main (54 commits since the
last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419,
opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44
others (bugfixes, doc updates, infra).

### Conflict resolutions (7)

Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict; took main's tighter formatting.

- core/.../executor/QueryService.java
  Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
  wrapping (required for analytics-engine classloader isolation) and
  the matching import.

- integ-test/build.gradle
  Kept feature's detailed root-cause comment on the Gradle 9.4.1
  TestEventReporterAsListener workaround; kept ASCII ordering of
  JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
  on test_eval and test_eval_agent setup (needed for the helper-managed
  index analytics-engine compatibility run).

- legacy/.../RestSqlAction.java
  Kept feature's delegateToV2Engine(...) (extracted from the
  analytics-engine routing path). Both sides added handleException /
  getRestStatus / getRawErrorCode; removed the duplicate set git
  produced.

- plugin/.../SQLPlugin.java
  Took the union of imports: ExecutionEngine +
  ExecutionEngine.ExplainResponse + QueryType.

- plugin/.../transport/TransportPPLQueryAction.java
  Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
  feature's local pluginSettings / pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved
as-is.

### Compatibility / opt-in

The analytics-engine path is gated by the extendedPlugins extension
being installed (opensearch-project#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (opensearch-project#5429 — by index setting).

### Verification

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
  locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants