[GLUTEN-12225][CORE] Fix arrow.c shading: exclude memory/vector packages so public API stays unshaded #12226

Open: sezruby wants to merge 2 commits into apache:main from sezruby:fix/arrow-c-shading-mismatch

sezruby commented Jun 2, 2026

What changes were proposed in this pull request?

Extend package/pom.xml's org.apache.arrow relocation excludes to also keep org.apache.arrow.memory.** and org.apache.arrow.vector.** unshaded.

The bundled Arrow C-Data classes (org.apache.arrow.c.*) are correctly excluded from relocation because their native JNI code binds to the original class names. However, their public API signatures take and return org.apache.arrow.memory.* and org.apache.arrow.vector.* types, which were being relocated. The result: the bundled ArrowArrayStream / ArrowSchema / ArrowArray / Data classes end up referencing the shaded BufferAllocator / VectorSchemaRoot in their signatures, so any caller passing a vanilla Apache Arrow allocator hits NoSuchMethodError.

This affects any Spark workload that combines Gluten with another library using Arrow C-Data (Iceberg's Arrow vector layer, Lance Java's writer, Snowflake JDBC's Arrow result decoder, etc.) when Gluten's bundle wins classloader resolution against vanilla Arrow.

How was this patch tested?

Adds dev/check-arrow-c-shading.sh, which runs javap on the produced bundle jar and asserts that public method signatures reference unshaded Arrow types. It is wired into package/pom.xml's verify phase via exec-maven-plugin so regressions are caught in CI.
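
For readers who want a feel for how such a check can work, here is a minimal sketch of the idea. It is not the contents of dev/check-arrow-c-shading.sh; the function names, the hard-coded class list, and the message format are assumptions made for illustration. The shaded prefix is the one that appears in the FAIL output quoted in this description.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the PR's dev/check-arrow-c-shading.sh.

# Shaded prefix as it appears in the FAIL output of this PR description.
SHADED_PREFIX='org.apache.gluten.shaded.org.apache.arrow.'

# Reads `javap -public` output on stdin and prints every line that mentions
# a gluten-shaded Arrow type. Returns 1 if any such line exists, else 0.
find_shaded_signatures() {
  local hits
  hits=$(grep -F -- "$SHADED_PREFIX" || true)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}

# Hypothetical driver: runs javap against the four C-Data classes named in
# this PR inside the given bundle jar. Requires a JDK (javap) on PATH.
# Returns non-zero if any class exposes a shaded type in its public API.
check_bundle() {
  local jar="$1" failed=0 cls
  for cls in ArrowArrayStream ArrowSchema ArrowArray Data; do
    if ! javap -public -classpath "$jar" "org.apache.arrow.c.$cls" \
        | find_shaded_signatures; then
      echo "FAIL org/apache/arrow/c/$cls"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

With the functions loaded, `check_bundle /path/to/bundle.jar` would be the entry point. Keeping the text filter separate from the javap call means the matching logic can be exercised on canned javap output without needing a bundle jar.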

Tested against the upstream gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.6.0.jar:

$ dev/check-arrow-c-shading.sh /path/to/gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.6.0.jar
  FAIL org/apache/arrow/c/ArrowArrayStream — public API references gluten-shaded Arrow types:
      public static org.apache.arrow.c.ArrowArrayStream allocateNew(
        org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
  FAIL org/apache/arrow/c/ArrowSchema — public API references gluten-shaded Arrow types:
      public static org.apache.arrow.c.ArrowSchema allocateNew(
        org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
  FAIL org/apache/arrow/c/ArrowArray — public API references gluten-shaded Arrow types:
      public static org.apache.arrow.c.ArrowArray allocateNew(
        org.apache.gluten.shaded.org.apache.arrow.memory.BufferAllocator);
  FAIL org/apache/arrow/c/Data — public API references gluten-shaded Arrow types:
      [16 methods touching shaded org.apache.arrow.memory/vector types]

Bundle has 4 Arrow C-Data class(es) with shaded API types.
exit code: 1

After applying the relocation exclude change, a freshly built bundle should pass the same check (script exits 0). The repro from #12225 (3 lines calling ArrowArrayStream.allocateNew(new RootAllocator(...))) goes from NoSuchMethodError to OK.
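
As a complementary sanity check on a rebuilt bundle, one could also look at where the Arrow memory and vector classes physically live inside the jar. The sketch below is an assumption-laden illustration rather than part of this PR: the function name is hypothetical, and the shaded path is derived from the org.apache.gluten.shaded prefix shown in the output above.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not part of this PR.

# Reads jar entry names on stdin (one per line, e.g. from `jar tf`) and
# prints the Arrow memory/vector class entries that still sit under the
# shaded prefix. Returns 1 if any are found, else 0.
list_shaded_arrow_entries() {
  local hits
  hits=$(grep -E '^org/apache/gluten/shaded/org/apache/arrow/(memory|vector)/.*\.class$' || true)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}
```

A typical invocation would be `jar tf /path/to/bundle.jar | list_shaded_arrow_entries`. An empty result with exit status 0 suggests the memory and vector packages were left unshaded. It does not replace the javap check, which inspects the method signatures themselves.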

Closes #12225

…API stays unshaded

The bundled Arrow C-Data classes (org.apache.arrow.c.*) are correctly
excluded from relocation because their native JNI binds to the original
class names. However, their public API signatures take and return
org.apache.arrow.memory.* and org.apache.arrow.vector.* types, which were
being relocated to org.apache.gluten.shaded.*. The result: bundled
ArrowArrayStream/ArrowSchema/ArrowArray/Data classes are compiled against
the shaded BufferAllocator/VectorSchemaRoot, so any caller passing a
vanilla Apache Arrow allocator gets NoSuchMethodError.

Triggered for any Spark workload that combines gluten with another library
using Arrow C-Data (Iceberg's Arrow vector layer, Lance Java's writer,
Snowflake JDBC's Arrow result decoder, etc.) when gluten's bundle wins
classloader resolution against vanilla Arrow.

Fix: extend the relocation excludes to also keep org.apache.arrow.memory.**
and org.apache.arrow.vector.** unshaded. The bundled C-Data API now matches
the public Apache Arrow API.

Adds dev/check-arrow-c-shading.sh which runs javap on the produced bundle
jar and asserts that public method signatures reference unshaded Arrow
types. Wired into package/pom.xml's verify phase via exec-maven-plugin so
regressions are caught in CI. Tested against the upstream
gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.6.0.jar — script exits 1
with a clear diagnosis on the broken bundle.

Closes apache#12225
github-actions bot added the CORE and BUILD labels Jun 2, 2026
sezruby (Author) commented Jun 2, 2026

@philo-he, @zhouyuan could you have a look at the PR?

philo-he (Member) commented Jun 3, 2026

@sezruby, thanks for the PR. This fix makes sense. I recall there's a related issue that occurs at compile time when an external project introduces the Gluten JAR as a dependency: a Scala type mismatch caused by the Maven Shade Plugin not rewriting ScalaSignature annotations. My understanding is this PR also fixes that case (see https://chungmin.hashnode.dev/unraveling-a-scala-type-mismatch-mystery).

One small concern is potential Arrow version conflicts, since these packages are no longer shaded. That said, the memory and vector APIs should be stable across minor versions, so I assume the risk should be low in practice.

cc @zhztheplayer

philo-he changed the title from "[CORE] Fix arrow.c shading: exclude memory/vector packages so public API stays unshaded" to "[GLUTEN-12225][CORE] Fix arrow.c shading: exclude memory/vector packages so public API stays unshaded" Jun 3, 2026