Resolve analytics-engine/framework from remote, drop vendored libs/ #5425

Open: ahkcs wants to merge 1 commit into opensearch-project:feature/mustang-ppl-integration from ahkcs:chore/libs-to-remote-deps
ahkcs (Collaborator) commented May 8, 2026

Summary

Replaces three vendored binaries under libs/ with build-time downloads from the OpenSearch CI snapshot repo (jars) and feature-build artifact URL (zips). No source compatibility changes; surgical Gradle features handle the JVM-version and Calcite-version skew between sql (JVM 21 / vanilla Calcite) and the sandbox publish (JVM 25 / patched Calcite).

| Vendored file (removed) | Replacement |
|---|---|
| libs/analytics-framework-3.7.0-SNAPSHOT.jar | maven coord org.opensearch.sandbox:analytics-framework:3.7.0-SNAPSHOT (CI snapshot repo, already declared) |
| libs/analytics-engine-3.7.0-SNAPSHOT.jar | maven coord org.opensearch.sandbox:analytics-engine:3.7.0-SNAPSHOT |
| libs/analytics-engine-3.7.0-SNAPSHOT.zip | new root task :downloadAnalyticsEngineZip (de.undercouch.download) → ${rootProject.buildDir}/distributions/analytics-engine-3.7.0-SNAPSHOT.zip |
| (newly required) arrow-flight-rpc.zip | new root task :downloadArrowFlightRpcZip → ${rootProject.buildDir}/distributions/arrow-flight-rpc-3.7.0-SNAPSHOT.zip |

Existing analyticsEngineZip and new arrowFlightRpcZip ext properties default to the downloaded paths; -PanalyticsEngineZip=/path / -ParrowFlightRpcZip=/path overrides still work and skip the download. All RestIntegTestTask / StandaloneRestIntegTestTask / RunTask consumers dependsOn both download tasks.
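The download-plus-override behavior described above could be wired roughly as follows. This is a minimal sketch, not the PR's actual build script: the `analyticsEngineZipUrl` property is a hypothetical stand-in for the feature-build artifact URL (which is not spelled out in full here), and using `onlyIf` to skip the download is an assumption about how the override is implemented. The plugin version is the one the analyzer report names.

```groovy
import de.undercouch.gradle.tasks.download.Download

plugins {
    id 'de.undercouch.download' version '5.3.0'
}

// Default location of the downloaded zip, matching the path in the table above.
def defaultZip = "${rootProject.buildDir}/distributions/analytics-engine-3.7.0-SNAPSHOT.zip"

tasks.register('downloadAnalyticsEngineZip', Download) {
    // Assumption: skip the download whenever the caller supplies -PanalyticsEngineZip=/path.
    onlyIf { !project.hasProperty('analyticsEngineZip') }
    // Hypothetical property holding the artifact URL; resolved lazily at execution time.
    src { project.property('analyticsEngineZipUrl') }
    dest defaultZip
    overwrite false
}

// The ext property falls back to the downloaded path when no override is given.
ext.analyticsEngineZip = project.findProperty('analyticsEngineZip') ?: defaultZip
```

The `:downloadArrowFlightRpcZip` task and `arrowFlightRpcZip` property would follow the same shape with the arrow-flight-rpc file name.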

Two Gradle blockers solved without a JDK bump

1. JVM variant mismatch — componentMetadataRules

AF/AE publish Gradle Module Metadata declaring org.gradle.jvm.version=25 (the sandbox uses the FFM API, finalized in JDK 22). sql targets JVM 21, so Gradle's variant matcher would refuse them. A componentMetadataRules block in subprojects rewrites TARGET_JVM_VERSION to 21 for the two org.opensearch.sandbox modules (analytics-framework and analytics-engine) only. The analytics-framework rule is shown here:

withModule('org.opensearch.sandbox:analytics-framework') { details ->
    details.allVariants {
        attributes {
            attribute(TargetJvmVersion.TARGET_JVM_VERSION_ATTRIBUTE, 21)
        }
    }
}

Safe because: sql code references only non-FFM interfaces of AF (QueryPlanExecutor, SchemaProvider, etc.); production runtime is JVM 25 nodes where the actual JDK 25 bytecode runs fine; rule is scoped to a two-module group — no effect on any other dependency.
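For context, the fragment above would normally sit inside a `dependencies { components { ... } }` block. The following is a sketch of how a rule covering both sandbox modules might look when applied from the root build script; the loop over module names is an illustrative choice, not necessarily how the PR writes it.

```groovy
import org.gradle.api.attributes.java.TargetJvmVersion

subprojects {
    dependencies {
        components {
            // Only these two modules are rewritten; every other dependency keeps
            // its published org.gradle.jvm.version attribute.
            ['analytics-framework', 'analytics-engine'].each { moduleName ->
                withModule('org.opensearch.sandbox:' + moduleName) { details ->
                    details.allVariants {
                        attributes {
                            // Present the variants as JVM 21 compatible to the resolver.
                            attribute(TargetJvmVersion.TARGET_JVM_VERSION_ATTRIBUTE, 21)
                        }
                    }
                }
            }
        }
    }
}
```

The rule changes only what the resolver sees in the metadata. The class files inside the jars are still compiled for JDK 25.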

2. Calcite version conflict — transitive = false

AF transitively pulls calcite-core:1.41.0-opensearch-1 (patched fork). sql declares vanilla calcite-core:1.41.0. transitive = false on the AF/AE dependency declarations cuts the patched calcite out of sql's compile classpath. At runtime, sql's classloader delegates to AE's via extendedPlugins = ['analytics-engine;optional=true'] parent-first delegation, so AE's bundled patched calcite wins; sql's vanilla copy sits idle.
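A sketch of what the non-transitive declarations might look like in plugin/build.gradle. The `api` and `compileOnly` scopes are the ones the analyzer report lists for that file; the scope of the vanilla calcite-core declaration is an assumption made for illustration.

```groovy
plugins {
    id 'java-library'
}

dependencies {
    // transitive = false keeps AF/AE's own dependencies, including the patched
    // calcite-core:1.41.0-opensearch-1, off this project's classpath.
    api('org.opensearch.sandbox:analytics-framework:3.7.0-SNAPSHOT') {
        transitive = false
    }
    compileOnly('org.opensearch.sandbox:analytics-engine:3.7.0-SNAPSHOT') {
        transitive = false
    }

    // Assumed scope: sql continues to compile against vanilla Calcite.
    api 'org.apache.calcite:calcite-core:1.41.0'
}
```

One consequence of this design is that any other transitive dependency of AF/AE that sql needs at compile time would have to be declared explicitly.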

Test cluster wiring: install arrow-flight-rpc first

The remote analytics-engine.zip declares arrow-flight-rpc as an extendedPlugins parent. Without this PR's wiring, the cluster install fails with:

java.lang.IllegalArgumentException: Missing plugin [arrow-flight-rpc], dependency of [analytics-engine]
    at org.opensearch.plugins.PluginsService.addSortedBundle(PluginsService.java:632)

plugin/build.gradle, integ-test/build.gradle and doctest/build.gradle now install arrow-flight-rpc immediately before analytics-engine in every test-cluster definition.
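The install ordering might be expressed along these lines. This is a sketch under several assumptions: the OpenSearch testclusters Gradle plugin is applied, its cluster DSL accepts a `Provider<RegularFile>` via `plugin(...)`, plugins are installed in declaration order, and both ext properties are set on the root project. The `zipProvider` helper is hypothetical.

```groovy
// Hypothetical helper: wraps a zip path as a Provider<RegularFile>.
def zipProvider = { String path ->
    provider { layout.projectDirectory.file(path) }
}

def arrowFlightRpc = zipProvider(rootProject.ext.arrowFlightRpcZip.toString())
def analyticsEngine = zipProvider(rootProject.ext.analyticsEngineZip.toString())

testClusters.integTest {
    // Parent plugin first: analytics-engine declares arrow-flight-rpc in extendedPlugins.
    plugin(arrowFlightRpc)
    plugin(analyticsEngine)
}
```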

mavenLocal sandbox exclusion

mavenLocal {
    content { excludeGroup 'org.opensearch.sandbox' }
}

A locally-published OpenSearch core build can install these artifacts into ~/.m2/repository with Gradle Module Metadata declaring different attributes than the published snapshot, which can shadow the remote and break resolution. Sandbox artifacts always come from the CI snapshot repo.

Verified locally (JDK 25 / Temurin 25.0.1+8)

  • ./gradlew :core:compileJava
  • ./gradlew :opensearch-sql-plugin:compileJava
  • ./gradlew --continue build -x integTest -x yamlRestTest -x doctest — full unit-test build
  • ./gradlew :downloadAnalyticsEngineZip --rerun-tasks → 16.7 MB zip
  • ./gradlew :downloadArrowFlightRpcZip --rerun-tasks → 13.3 MB zip
  • -PanalyticsEngineZip=/local/path correctly skips the download
  • Full integration tests (gated on CI; arrow-flight-rpc install is the change)

Known limitation: JDK 21 CI legs will fail

The variant override gets past Gradle's resolver, but javac 21 still can't read JDK 25 bytecode (class file major version 69 is Java 25; javac 21 accepts at most 65, which is Java 21):

class file has wrong version 69.0, should be 65.0

This is a hard JDK limit; no Gradle trick fixes it. The JDK 21 leg of CI will fail on :core:compileJava. Same limitation as #5426 (similar approach by @bowenlan-amzn). Resolves naturally when either:

  1. analytics-engine graduates from the sandbox and resumes publishing JDK 21 bytecode, or
  2. the SQL plugin's minimum compatibility moves to JDK 25 ecosystem-wide (per Peter Zhu's Slack thread, that's the deferred conversation).

Iteration over perfection — landing the libs→remote migration now unblocks the day-to-day "vendored zip drifts every snapshot" pain even while the JDK 21 CI gap remains.

Diff scope

8 files, +140 / -11. No source/target compat bump, no Lombok bump, no calcite version forcing, no CI matrix changes.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions (bot, Contributor) commented May 8, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 17d5014.

| Path | Line | Severity | Description |
|---|---|---|---|
| build.gradle | 84 | high | New Gradle build plugin added: 'de.undercouch.download' version '5.3.0'. Mandatory supply chain flag — plugin executes at build time with full filesystem and network access; artifact authenticity cannot be verified from the diff alone. |
| build.gradle | 121 | high | Analytics-engine plugin ZIP is fetched from a mutable 'latest' URL path (ci.opensearch.org/.../feature-datafusion/latest/...) with no hash or checksum verification. The resolved artifact can change silently between builds without any code change, enabling a compromised CI server or MITM to inject malicious binary plugins into the build environment. |
| build.gradle | 123 | high | Arrow-flight-rpc plugin ZIP is fetched from the same mutable 'latest' URL path with no integrity verification. Same risk as the analytics-engine ZIP: content is not pinned and can be silently swapped. |
| core/build.gradle | 67 | high | New Maven dependency introduced: 'org.opensearch.sandbox:analytics-framework:3.7.0-SNAPSHOT'. Mandatory supply chain flag — switches from a vendored local JAR to a remote artifact fetched from the CI snapshot repository. The 'sandbox' namespace and SNAPSHOT qualifier mean the artifact is mutable and not release-pinned. |
| plugin/build.gradle | 168 | high | New Maven dependencies introduced: 'org.opensearch.sandbox:analytics-framework:3.7.0-SNAPSHOT' (api scope) and 'org.opensearch.sandbox:analytics-engine:3.7.0-SNAPSHOT' (compileOnly scope). Mandatory supply chain flag — both are remote SNAPSHOT artifacts in the sandbox namespace, replacing locally vendored binaries. SNAPSHOT artifacts are mutable by definition. |
| build.gradle | 233 | medium | A ComponentMetadataRule overrides the published TargetJvmVersion attribute for 'org.opensearch.sandbox:analytics-framework' and 'org.opensearch.sandbox:analytics-engine' from 25 to 21. This bypasses Gradle's built-in JVM compatibility guard for these two artifacts, suppressing any warning if a future artifact incompatible with JVM 21 is published under the same coordinates. |

The table above displays the top 10 most important findings.

Total: 6 | Critical: 0 | High: 5 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@ahkcs ahkcs changed the title Resolve analytics-engine/framework from remote, drop vendored libs/ Migrate analytics-engine deps to remote + bump SQL plugin to JDK 25 May 8, 2026
@ahkcs ahkcs marked this pull request as ready for review May 8, 2026 21:50
@ahkcs ahkcs added the infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. label May 8, 2026
strategy:
  fail-fast: false
  matrix:
    java: [21, 25]

do we have other bwc tests then?

RyanL1997 (Collaborator) left a comment

Replaces three vendored binaries under libs/ with build-time downloads:

- analytics-framework / analytics-engine jars: maven coordinates against the
  OpenSearch CI snapshot repo (org.opensearch.sandbox:*). Already declared
  as a subproject repository.
- analytics-engine + arrow-flight-rpc plugin zips: the snapshot maven repo
  doesn't publish plugin distribution zips, so two new root tasks
  (:downloadAnalyticsEngineZip, :downloadArrowFlightRpcZip) fetch them from
  the feature-build artifact URL via de.undercouch.download (the same plugin
  integ-test/doctest already use). Output lands in
  ${rootProject.buildDir}/distributions/. The existing analyticsEngineZip
  ext property points at the downloaded path; -PanalyticsEngineZip=/path
  and -ParrowFlightRpcZip=/path overrides still work and skip the download.

Test cluster setup in plugin/, integ-test/, doctest/ now installs
arrow-flight-rpc *before* analytics-engine — the latter declares the former
as an extendedPlugins parent, so OpenSearch's plugin install fails with
"Missing plugin [arrow-flight-rpc], dependency of [analytics-engine]"
otherwise. All RestIntegTestTask / StandaloneRestIntegTestTask / RunTask
consumers dependsOn both download tasks.

Two Gradle-side blockers handled to make remote consumption work without a
JDK source/target bump:

1. JVM variant mismatch
   AF/AE publish Gradle Module Metadata declaring jvm.version=25 because
   the sandbox uses the FFM API (finalized in JDK 22). sql targets JVM 21,
   so Gradle's variant matcher would refuse them. componentMetadataRules
   in subprojects rewrite TARGET_JVM_VERSION to 21 for these two modules
   only. Safe because: (a) sql code references only non-FFM interfaces of
   AF, (b) production runtime is JVM 25 nodes where the actual JDK 25
   bytecode runs fine, (c) rule is scoped to org.opensearch.sandbox:*.

2. Calcite version conflict
   AF transitively pulls calcite-core:1.41.0-opensearch-1 (patched fork);
   sql declares vanilla calcite-core:1.41.0. transitive = false on the
   AF/AE dependency declarations cuts the patched calcite out of sql's
   classpath. At runtime, sql's classloader delegates to AE's via
   `extendedPlugins = ['analytics-engine;optional=true']` parent-first,
   so AE's bundled patched-calcite wins; sql's vanilla copy sits idle.

mavenLocal { excludeGroup 'org.opensearch.sandbox' }: a locally-published
OpenSearch core build can install these artifacts with Gradle Module
Metadata declaring different attributes than the published snapshot, which
can shadow the remote and break resolution. Sandbox artifacts always come
from the CI snapshot repo.

Verified locally under JDK 25 (Temurin 25.0.1+8):
- ./gradlew :core:compileJava                                          ✓
- ./gradlew :opensearch-sql-plugin:compileJava                         ✓
- ./gradlew --continue build -x integTest -x yamlRestTest -x doctest   ✓
- ./gradlew :downloadAnalyticsEngineZip                                ✓
- ./gradlew :downloadArrowFlightRpcZip                                 ✓
- :downloadAnalyticsEngineZip -PanalyticsEngineZip=/path skips          ✓

Known: JDK 21 CI legs will fail with "class file has wrong version 69.0,
should be 65.0" — javac 21 cannot read JDK 25 bytecode, regardless of
Gradle-level overrides. Will be resolved when analytics-engine graduates
from sandbox (publishing JDK 21 bytecode again) or when SQL plugin's
MIN compat moves to JDK 25 ecosystem-wide. Iteration over perfection.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the chore/libs-to-remote-deps branch from f0d1d16 to 17d5014 Compare May 8, 2026 22:52
@ahkcs ahkcs changed the title Migrate analytics-engine deps to remote + bump SQL plugin to JDK 25 Resolve analytics-engine/framework from remote, drop vendored libs/ May 8, 2026