[SPARK-57036][SQL] Use intrinsic bulk-fill APIs for constant-value WritableColumnVector methods by viirya · Pull Request #56082 · apache/spark

viirya · 2026-05-24T00:19:19Z

What changes were proposed in this pull request?

Six bulk-fill methods on the column vectors implement constant-value
fills with degenerate per-element loops. This PR replaces them with
intrinsic substitutions:

| Method | Substitution |
|---|---|
| `OnHeapColumnVector.putBooleans(rowId, count, value)` | `Arrays.fill(byte[], ..., (byte) v)` |
| `OnHeapColumnVector.putBytes(rowId, count, value)` | `Arrays.fill(byte[], ...)` |
| `OnHeapColumnVector.putShorts(rowId, count, value)` | `Arrays.fill(short[], ...)` |
| `OnHeapColumnVector.putLongs(rowId, count, value)` | `Arrays.fill(long[], ...)` |
| `OffHeapColumnVector.putBooleans(rowId, count, value)` | `Platform.setMemory` with small-count fallback |
| `OffHeapColumnVector.putBytes(rowId, count, value)` | `Platform.setMemory` with small-count fallback |
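
As a rough illustration of the OnHeap rows, the sketch below contrasts the per-element loop with the `Arrays.fill` form. `OnHeapFillSketch` and its fields are hypothetical stand-ins, not Spark's actual `OnHeapColumnVector`; storing booleans as one byte each (1 for true, 0 for false) is an assumption inferred from the `(byte) v` cast in the table.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an on-heap column vector; not Spark's class. */
final class OnHeapFillSketch {
    private final byte[] byteData;
    private final long[] longData;

    OnHeapFillSketch(int capacity) {
        byteData = new byte[capacity];
        longData = new long[capacity];
    }

    // Original shape: one array store per element.
    void putBytesLoop(int rowId, int count, byte value) {
        for (int i = 0; i < count; i++) {
            byteData[rowId + i] = value;
        }
    }

    // Replacement shape: Arrays.fill over the half-open range [rowId, rowId + count).
    void putBytes(int rowId, int count, byte value) {
        Arrays.fill(byteData, rowId, rowId + count, value);
    }

    // Assumption: booleans are stored one per byte, 1 for true and 0 for false.
    void putBooleans(int rowId, int count, boolean value) {
        Arrays.fill(byteData, rowId, rowId + count, (byte) (value ? 1 : 0));
    }

    void putLongs(int rowId, int count, long value) {
        Arrays.fill(longData, rowId, rowId + count, value);
    }

    byte getByte(int rowId) { return byteData[rowId]; }

    long getLong(int rowId) { return longData[rowId]; }
}
```

Both forms write the same elements; the difference is only in how the JIT can compile the fill.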

The two OffHeap methods share a SET_MEMORY_THRESHOLD = 128 constant.
Below the threshold, an inline byte loop avoids the JNI fixed cost of
Unsafe.setMemory; at or above it, setMemory wins, and the speedup grows
with count (roughly 7x at count=512 up to 16x at count=65,536 in the
benchmark numbers below).
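
A minimal sketch of the threshold dispatch described above, under stated assumptions: it calls `sun.misc.Unsafe` directly (obtained reflectively) in place of Spark's `Platform` wrapper, and the class `OffHeapFillSketch`, its bounds check, and its accessors are hypothetical. Only the threshold value of 128 and the loop-versus-`setMemory` split come from the PR text.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical off-heap byte column; illustrates the dispatch only. */
final class OffHeapFillSketch implements AutoCloseable {
    static final int SET_MEMORY_THRESHOLD = 128; // value taken from the PR description
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long base;
    private final int capacity;

    OffHeapFillSketch(int capacity) {
        this.capacity = capacity;
        this.base = UNSAFE.allocateMemory(capacity);
        UNSAFE.setMemory(base, capacity, (byte) 0); // allocateMemory does not zero
    }

    void putBytes(int rowId, int count, byte value) {
        if (rowId < 0 || count < 0 || rowId > capacity - count) {
            throw new IndexOutOfBoundsException();
        }
        long addr = base + rowId;
        if (count < SET_MEMORY_THRESHOLD) {
            // Small fill: per-byte stores avoid the fixed cost of the bulk call.
            for (int i = 0; i < count; i++) {
                UNSAFE.putByte(addr + i, value);
            }
        } else {
            // Large fill: one bulk call covers the whole range.
            UNSAFE.setMemory(addr, count, value);
        }
    }

    byte getByte(int rowId) { return UNSAFE.getByte(base + rowId); }

    @Override
    public void close() { UNSAFE.freeMemory(base); }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}
```

The two branches must be observably identical; the threshold only selects which one is cheaper for a given count.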

This PR also adds WritableColumnVectorBulkFillBenchmark to measure
these constant-value bulk-fill APIs across a count sweep covering both
the small-count (call-overhead dominated) and large-count (memory
bandwidth dominated) regimes.

Why are the changes needed?

The bulk-fill APIs on WritableColumnVector are the natural call to
make from any column writer, but their implementations were per-element
loops. Switching to intrinsics:

- Arrays.fill is backed by HotSpot's _jbyte_fill / _jshort_fill / _jlong_fill intrinsic stubs.
- Unsafe.setMemory lowers to a native memset. For OffHeap byte fills the original per-byte Platform.putByte loop cannot be vectorized through the JNI call, so the gain is dramatic at large counts.

Benchmark numbers (GitHub Actions, JDK 17, Scala 2.13)

Measured by running WritableColumnVectorBulkFillBenchmark via the
Run benchmarks workflow on both the baseline (#56084) and this PR's
branch, so the two runs use identical hardware and JDK. Rate (M
elements/s):

OffHeap byte fills (putBytes / putBooleans) — the headline win:

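| count | baseline (M elements/s) | patched (M elements/s) | delta |
|---|---|---|---|
| 1 | ~290 | ~240 | within run-to-run noise (~30%) |
| 8 | ~1,390 | ~1,280 | within run-to-run noise (~10%) |
| 64 | ~2,550 | ~2,450 | parity |
| 512 | ~2,700 | ~19,500 | +7.2x |
| 4,096 | ~2,770 | ~39,200 | +14.1x |
| 65,536 | ~2,780 | ~44,500 | +16.0x |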

(Numbers averaged across putBytes and putBooleans since they share
the same code path.)

At and above the 128-element threshold, setMemory produces a 7-16x
improvement that grows with run length, consistent with memset being
amortized cleanly over long fills. Below the threshold, both runs use
the same inline byte loop, so the small differences at count=1 and
count=8 are GHA run-to-run variance rather than a structural change.
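
For readers who want to check the delta column, the sketch below recomputes patched/baseline from the rounded rates in the table. The class name `BulkFillSpeedup` is hypothetical. Because the table's rates are approximate, the ratios only agree with the listed deltas to within about 0.1x (the 4,096 row comes out near 14.15), and the three sub-threshold rows come out at roughly 0.83x, 0.92x, and 0.96x.

```java
/** Recomputes speedups from the rounded rates in the OffHeap table. */
final class BulkFillSpeedup {
    // {count, baseline M elements/s, patched M elements/s}, copied from the table.
    static final double[][] ROWS = {
        {1, 290, 240},
        {8, 1390, 1280},
        {64, 2550, 2450},
        {512, 2700, 19500},
        {4096, 2770, 39200},
        {65536, 2780, 44500},
    };

    // Speedup is the ratio of patched throughput to baseline throughput.
    static double speedup(double baseline, double patched) {
        return patched / baseline;
    }

    public static void main(String[] args) {
        for (double[] r : ROWS) {
            System.out.printf("count=%d speedup=%.2fx%n", (long) r[0], speedup(r[1], r[2]));
        }
    }
}
```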

OnHeap fills: on the GHA runner (Linux + Zulu JDK 17) the C2
compiler already auto-vectorizes the original byte loop near the byte
memory-bandwidth ceiling, so Arrays.fill is at parity (~2,790 M/s,
unchanged across putBooleans / putBytes / putShorts / putLongs,
all counts, both baseline and patched). On Apple M4 Max + OpenJDK 21
the same change yields +5-33% in the small/medium count range. The
OnHeap changes are kept for consistency with the OffHeap fixes and to
avoid future divergence between platforms.

OffHeap multi-byte fills (putShorts / putInts / putLongs /
putFloats / putDoubles) are out of scope: Platform.setMemory is
byte-only and a value=0 short-circuit alternative was tried and showed
no measurable gain.
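
To illustrate why a byte-only fill does not generalize to wider types, the sketch below tests whether a multi-byte value consists of one repeated byte, which is the only case where a memset-style fill would write the right bit pattern. This is an illustration of the constraint, not code from the PR; `UniformByteCheck` and its methods are hypothetical names. The value=0 short-circuit mentioned above is the simplest such case.

```java
/** Hypothetical helper: can a multi-byte value be written as a repeated byte? */
final class UniformByteCheck {
    // True when both bytes of the short are identical (e.g. 0, -1, 0x0101).
    static boolean shortIsByteFillable(short value) {
        int b = value & 0xFF;
        return (value & 0xFFFF) == (b | (b << 8));
    }

    // True when all eight bytes of the long are identical.
    static boolean longIsByteFillable(long value) {
        long b = value & 0xFFL;
        // Multiplying by 0x0101...01 replicates the low byte into every byte lane.
        return value == b * 0x0101010101010101L;
    }

    public static void main(String[] args) {
        System.out.println(longIsByteFillable(0L));
        System.out.println(longIsByteFillable(1L));
        System.out.println(shortIsByteFillable((short) 0x0101));
    }
}
```

Most constant values (1, 42, and so on) fail this test, so a byte fill cannot replace the typed per-element store in general.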

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests; no behavior change. Ran locally:

- VectorizedRleValuesReaderSuite
- ColumnVectorSuite
- ColumnarBatchSuite
- ParquetIOSuite

237 tests, all pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

…itableColumnVector methods

Six bulk-fill methods on the column vectors implement constant-value fills with degenerate per-element loops:

OnHeapColumnVector:
- putBooleans(int rowId, int count, boolean value)
- putBytes(int rowId, int count, byte value)
- putShorts(int rowId, int count, short value)
- putLongs(int rowId, int count, long value)

OffHeapColumnVector:
- putBooleans(int rowId, int count, boolean value)
- putBytes(int rowId, int count, byte value)

Replace them with intrinsic substitutions:

- OnHeap variants -> Arrays.fill on the typed array.
- OffHeap variants -> Platform.setMemory with a small-count fallback to an inline byte loop, gated by a SET_MEMORY_THRESHOLD of 128. Below the threshold, the JNI fixed cost of Unsafe.setMemory loses to the inline loop; at or above, setMemory dominates and gains accelerate to ~10x at count=4096+.

Also adds WritableColumnVectorBulkFillBenchmark for measuring the constant-value bulk-fill APIs across a count sweep (1, 8, 64, 512, 4096, 65536), covering both OnHeap and OffHeap paths. This is the benchmark used to produce the numbers in the PR description.

OffHeap multi-byte fills (putShorts / putInts / putLongs / putFloats / putDoubles) are out of scope: Platform.setMemory is byte-only and a value=0 short-circuit alternative was tried and showed no measurable gain on Apple M4 Max + OpenJDK 21.

Co-authored-by: Claude Code

viirya force-pushed the SPARK-57036 branch from 8cfa271 to 70fb184 Compare May 24, 2026 00:24

Benchmark results for *WritableColumnVectorBulkFill* (JDK 17, Scala 2.13, split 1 of 1)

73440c0

viirya requested review from cloud-fan and gengliangwang May 24, 2026 03:00

viirya wants to merge 2 commits into apache:master from viirya:SPARK-57036
