feat: zero-copy columnar conversion for ArrowColumnVector-backed batches #3520

Open
tokoko wants to merge 5 commits into apache:main from tokoko:spark-zero-copy

tokoko commented Feb 14, 2026

Closes #3518

What changes are included in this PR?

  • Introduces a new tryZeroCopyConvert method in CometArrowConverters. It accepts a ColumnarBatch of any type and returns a ColumnarBatch of CometVector objects if every input column is an ArrowColumnVector; otherwise it returns None.
  • The columnar conversion path in CometSparkToColumnarExec always tries tryZeroCopyConvert first and falls back to the current flow if zero-copy conversion is not possible.
  • The implementation ignores the batchSize configuration, since honoring it under zero-copy would be considerably more involved. I think zero-copy matters more here, especially assuming that whatever operator produces the input has a similar configuration of its own. Happy to change the implementation if you disagree, though.
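
The try-then-fall-back flow described above can be sketched roughly as follows. This is a minimal, stdlib-only Java illustration, not the PR's code: ColumnVector, ArrowBackedVector, OnHeapVector, WrappedVector, and ZeroCopySketch are hypothetical stand-ins for Spark's column vectors, ArrowColumnVector, and CometVector, and Optional.empty() plays the role of None.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a Spark column vector; not the real Spark class.
interface ColumnVector {
    long[] data();
}

// Stand-in for an ArrowColumnVector: its buffer can be shared as-is.
final class ArrowBackedVector implements ColumnVector {
    private final long[] buffer;
    ArrowBackedVector(long[] buffer) { this.buffer = buffer; }
    public long[] data() { return buffer; }
}

// Stand-in for any other vector type, which has to be copied.
final class OnHeapVector implements ColumnVector {
    private final long[] values;
    OnHeapVector(long[] values) { this.values = values; }
    public long[] data() { return values; }
}

// Stand-in for a CometVector wrapping some buffer.
final class WrappedVector {
    final long[] buffer;
    WrappedVector(long[] buffer) { this.buffer = buffer; }
}

final class ZeroCopySketch {
    // Succeeds only if every column is Arrow-backed; shares buffers, copies nothing.
    static Optional<List<WrappedVector>> tryZeroCopyConvert(List<ColumnVector> batch) {
        List<WrappedVector> out = new ArrayList<>(batch.size());
        for (ColumnVector col : batch) {
            if (!(col instanceof ArrowBackedVector)) {
                return Optional.empty(); // one non-Arrow column rules out zero-copy
            }
            out.add(new WrappedVector(col.data()));
        }
        return Optional.of(out);
    }

    // Fallback path: copies every column's values into a fresh buffer.
    static List<WrappedVector> copyConvert(List<ColumnVector> batch) {
        List<WrappedVector> out = new ArrayList<>(batch.size());
        for (ColumnVector col : batch) {
            out.add(new WrappedVector(col.data().clone()));
        }
        return out;
    }

    // Always try zero-copy first, then fall back to the copying flow.
    static List<WrappedVector> convert(List<ColumnVector> batch) {
        return tryZeroCopyConvert(batch).orElseGet(() -> copyConvert(batch));
    }
}
```

Note that the zero-copy branch returns the batch at whatever size it arrived, which is why a batchSize setting has no natural place in it; only the copying branch could re-chunk rows without extra work.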

How are these changes tested?

  • Added tests that convert hand-crafted ColumnarBatch objects, as no out-of-the-box data source in Spark produces a ColumnarBatch of ArrowColumnVector objects.

tokoko marked this pull request as draft February 15, 2026 19:50
tokoko marked this pull request as ready for review February 21, 2026 11:26

tokoko commented Feb 21, 2026

This turned out to be a bit complicated. The Arrow classes in the common module are shaded, while objects coming from Spark aren't, so we can't just pass the underlying ValueVectors through. The PR uses the Arrow C Data Interface, split across the common and spark modules: Spark batches are exported to plain pointers on the spark module side (which doesn't shade Arrow) and imported into CometVector batches in common.
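
To illustrate why plain pointers get around the shading problem, here is a toy, stdlib-only Java model of the handoff. Everything in it is a hypothetical stand-in: NativeMemory imitates off-heap memory with a handle table, UnshadedVector and ShadedVector imitate the two incompatible copies of the Arrow classes, and SparkSideExporter and CommonSideImporter imitate the two halves of the split. The real PR goes through the Arrow C Data Interface rather than anything like this table; the point is only that a long is the one type both sides agree on.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy stand-in for native memory: buffers are reachable through a plain long "address".
final class NativeMemory {
    private static final Map<Long, long[]> TABLE = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT = new AtomicLong(1);

    // Registers a buffer and returns the address under which it can be picked up.
    static long publish(long[] buffer) {
        long address = NEXT.getAndIncrement();
        TABLE.put(address, buffer);
        return address;
    }

    // Hands ownership to the caller; an address can be consumed only once.
    static long[] take(long address) {
        long[] buffer = TABLE.remove(address);
        if (buffer == null) {
            throw new IllegalStateException("nothing exported at address " + address);
        }
        return buffer;
    }
}

// Imitates a vector built from the unshaded Arrow classes that Spark uses.
final class UnshadedVector {
    final long[] buffer;
    UnshadedVector(long[] buffer) { this.buffer = buffer; }
}

// Imitates a vector built from the shaded Arrow classes in the common module.
// It is deliberately unrelated to UnshadedVector: neither can be cast to the other.
final class ShadedVector {
    final long[] buffer;
    ShadedVector(long[] buffer) { this.buffer = buffer; }
}

// Spark-module side: knows only the unshaded type, emits a plain long.
final class SparkSideExporter {
    static long export(UnshadedVector vector) {
        return NativeMemory.publish(vector.buffer);
    }
}

// Common-module side: knows only the shaded type, consumes a plain long.
final class CommonSideImporter {
    static ShadedVector importFrom(long address) {
        return new ShadedVector(NativeMemory.take(address));
    }
}
```

A round trip would be `CommonSideImporter.importFrom(SparkSideExporter.export(vector))`: the buffer itself is shared rather than copied, and no class from one side ever appears in the other side's signatures.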

Development

Successfully merging this pull request may close these issues.

ZeroCopy Conversion from Spark ColumnarBatch