feat[array]: Patched array remove GPU transpose #8127
Remove the eager GPU lane transpose from the Patched array. Patches are now kept in their natural sorted layout with one chunk offset per 1024-element chunk, mirroring the Patches helper, and execute/slice/take/compare/filter reuse the existing untransposed patch machinery. The transpose count (n_lanes) is retained as metadata so the data-parallel GPU transpose can be re-added in the future without a format change.

Signed-off-by: Claude <noreply@anthropic.com>
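To make the layout described above concrete, here is a minimal sketch of sorted patches with one offset per 1024-element chunk. It is an illustration only: `SortedPatches`, its fields, and `chunk_range` are hypothetical names, not the Vortex `Patched` or `Patches` API, and the `u32` value type is an assumption.

```rust
/// Chunk size named in the commit message.
const CHUNK: usize = 1024;

/// Hypothetical stand-in for patches kept in natural sorted order.
/// Not the Vortex API; names and types are assumptions for illustration.
struct SortedPatches {
    /// Absolute positions of patched elements, strictly ascending.
    indices: Vec<usize>,
    /// Replacement values, parallel to `indices`.
    values: Vec<u32>,
    /// `chunk_offsets[c]` is the position in `indices` of the first patch
    /// whose index is at least `c * CHUNK`.
    chunk_offsets: Vec<usize>,
}

impl SortedPatches {
    fn new(indices: Vec<usize>, values: Vec<u32>, len: usize) -> Self {
        assert_eq!(indices.len(), values.len());
        assert!(indices.windows(2).all(|w| w[0] < w[1]), "indices must be sorted");
        assert!(indices.last().map_or(true, |&i| i < len), "index out of bounds");

        let n_chunks = (len + CHUNK - 1) / CHUNK;
        let mut chunk_offsets = Vec::with_capacity(n_chunks);
        let mut next = 0;
        for c in 0..n_chunks {
            // Skip patches that belong to earlier chunks.
            while next < indices.len() && indices[next] < c * CHUNK {
                next += 1;
            }
            chunk_offsets.push(next);
        }
        Self { indices, values, chunk_offsets }
    }

    /// Range into `indices`/`values` covering the patches of chunk `c`.
    fn chunk_range(&self, c: usize) -> std::ops::Range<usize> {
        let start = self.chunk_offsets[c];
        let end = self
            .chunk_offsets
            .get(c + 1)
            .copied()
            .unwrap_or(self.indices.len());
        start..end
    }

    /// Overwrite the patched positions of an already decoded buffer.
    fn apply(&self, out: &mut [u32]) {
        for (&i, &v) in self.indices.iter().zip(&self.values) {
            out[i] = v;
        }
    }
}
```

With per-chunk offsets, an operation such as a slice can find the patches for any chunk without scanning the whole index list, which is the property the sorted layout relies on in this sketch.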
Add SparsePatchedPlugin which, when the experimental Patched encoding is enabled, deserializes a primitive Sparse array as a Patched array over a constant fill. A Sparse array is logically patches on top of a constant, so this reuses the same externalization shim already used for BitPacked and ALP. Non-primitive (bool/varbin/struct/fixed-size-list) and nullable-patch sparse arrays remain Sparse, since Patched only represents primitive inners with non-null patch values.

Signed-off-by: Claude <noreply@anthropic.com>
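The equivalence this commit relies on (a sparse array is a constant fill plus patches) can be sketched as follows. The types `SparseSketch` and `PatchedSketch` and the function `to_patched` are hypothetical and do not reflect the actual `SparsePatchedPlugin` implementation; the `i32` element type is an assumption.

```rust
/// Hypothetical sparse array: a fill value plus (index, value) overrides.
/// Patch values may be null, modelled here as `Option<i32>`.
struct SparseSketch {
    len: usize,
    fill: i32,
    indices: Vec<usize>,
    values: Vec<Option<i32>>,
}

/// Hypothetical patched array over a constant inner array.
/// Patch values are always non-null.
struct PatchedSketch {
    len: usize,
    fill: i32,
    indices: Vec<usize>,
    values: Vec<i32>,
}

/// Convert only when every patch value is non-null; otherwise return `None`
/// so the caller can keep the array in its sparse form.
fn to_patched(s: &SparseSketch) -> Option<PatchedSketch> {
    // Collecting `Option<i32>` items into `Option<Vec<i32>>` yields `None`
    // as soon as a null patch value is encountered.
    let values: Option<Vec<i32>> = s.values.iter().copied().collect();
    values.map(|values| PatchedSketch {
        len: s.len,
        fill: s.fill,
        indices: s.indices.clone(),
        values,
    })
}

/// Decode by filling with the constant and then applying the patches.
fn materialize(p: &PatchedSketch) -> Vec<i32> {
    let mut out = vec![p.fill; p.len];
    for (&i, &v) in p.indices.iter().zip(&p.values) {
        out[i] = v;
    }
    out
}
```

Returning `None` for nullable patches mirrors the stated restriction that such arrays remain Sparse; how the real plugin signals that case is not shown in the source.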
Microbenchmark isolating loop ordering for patched bitpacked decode: unpack every 1024-block then scatter all patches, versus unpacking one block and patching it while still hot in cache. Both do identical work, so any delta is cache locality. Fused wins 20-40% once the output exceeds L2, peaking around moderate patch density.

Signed-off-by: Claude <noreply@anthropic.com>
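The two loop orderings being compared can be sketched as below. This is not the benchmark code from the PR: `unpack_block` is a stand-in that widens `u8` to `u32` instead of performing real bit unpacking, and all function names are hypothetical. The sketch shows only that both orderings produce the same output while touching memory in a different order; it makes no timing claim.

```rust
const BLOCK: usize = 1024;

/// Stand-in for unpacking one block (assumption: real code would bit-unpack).
fn unpack_block(packed: &[u8], out: &mut [u32]) {
    for (o, &p) in out.iter_mut().zip(packed) {
        *o = u32::from(p);
    }
}

/// Ordering 1: unpack every block, then scatter all patches in a second pass.
fn decode_two_pass(packed: &[u8], idx: &[usize], vals: &[u32]) -> Vec<u32> {
    let mut out = vec![0u32; packed.len()];
    for (src, dst) in packed.chunks(BLOCK).zip(out.chunks_mut(BLOCK)) {
        unpack_block(src, dst);
    }
    for (&i, &v) in idx.iter().zip(vals) {
        out[i] = v;
    }
    out
}

/// Ordering 2: unpack one block and patch it immediately.
/// Requires `idx` to be sorted ascending.
fn decode_fused(packed: &[u8], idx: &[usize], vals: &[u32]) -> Vec<u32> {
    let mut out = vec![0u32; packed.len()];
    let mut next = 0; // cursor into the sorted patch list
    for (b, (src, dst)) in packed.chunks(BLOCK).zip(out.chunks_mut(BLOCK)).enumerate() {
        unpack_block(src, dst);
        let base = b * BLOCK;
        let end = base + dst.len();
        // Apply the patches that fall inside the block just written.
        while next < idx.len() && idx[next] < end {
            dst[idx[next] - base] = vals[next];
            next += 1;
        }
    }
    out
}
```

The fused variant depends on the patches being sorted by index, so that a single forward cursor visits each patch exactly once.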
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.811x ✅, 21↑ 0↓)
datafusion / vortex-compact (0.819x ✅, 22↑ 0↓)
datafusion / parquet (0.885x ✅, 15↑ 1↓)
datafusion / arrow (0.787x ✅, 21↑ 0↓)
duckdb / vortex-file-compressed (0.784x ✅, 22↑ 0↓)
duckdb / vortex-compact (0.812x ✅, 21↑ 0↓)
duckdb / parquet (0.890x ✅, 13↑ 1↓)
duckdb / duckdb (0.803x ✅, 22↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
File Size Changes (3 files changed, -0.0% overall, 0↑ 3↓)
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.056x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.074x ➖, 0↑ 4↓)
datafusion / parquet (1.095x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (1.092x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.045x ➖, 1↑ 1↓)
duckdb / parquet (1.089x ➖, 0↑ 2↓)
File Sizes: FineWeb NVMe
File Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.152x ❌
datafusion / vortex-file-compressed (1.152x ❌, 0↑ 7↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.134x ❌, 0↑ 62↓)
datafusion / vortex-compact (1.131x ❌, 1↑ 56↓)
datafusion / parquet (1.125x ❌, 0↑ 63↓)
duckdb / vortex-file-compressed (1.105x ❌, 0↑ 51↓)
duckdb / vortex-compact (1.118x ❌, 0↑ 49↓)
duckdb / parquet (1.080x ➖, 0↑ 24↓)
duckdb / duckdb (1.064x ➖, 0↑ 19↓)
File Sizes: TPC-DS SF=1 on NVME
File Size Changes (11 files changed, -0.5% overall, 0↑ 11↓)
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.019x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.031x ➖, 0↑ 1↓)
datafusion / parquet (1.088x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (0.947x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.006x ➖, 0↑ 0↓)
duckdb / parquet (0.996x ➖, 0↑ 0↓)
Benchmarks: Random Access
Vortex (geomean): 0.949x ➖
unknown / unknown (0.963x ➖, 1↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.166x ❌, 1↑ 6↓)
duckdb / vortex-compact (1.144x ❌, 0↑ 10↓)
duckdb / parquet (1.123x ❌, 0↑ 8↓)
File Sizes: Statistical and Population Genetics
File Size Changes (2 files changed, -6.0% overall, 1↑ 1↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.133x ❌, 0↑ 30↓)
datafusion / parquet (1.101x ❌, 1↑ 26↓)
duckdb / vortex-file-compressed (1.099x ➖, 1↑ 25↓)
duckdb / parquet (1.054x ➖, 0↑ 4↓)
duckdb / duckdb (1.063x ➖, 1↑ 7↓)
File Sizes: Clickbench on NVME
File Size Changes (184 files changed, -0.5% overall, 40↑ 144↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.828x ➖, 9↑ 4↓)
datafusion / vortex-compact (1.016x ➖, 1↑ 3↓)
datafusion / parquet (1.068x ➖, 0↑ 5↓)
duckdb / vortex-file-compressed (0.907x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.966x ➖, 0↑ 1↓)
duckdb / parquet (0.991x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.335x ❌, 0↑ 22↓)
datafusion / vortex-compact (1.356x ❌, 0↑ 22↓)
datafusion / parquet (1.265x ❌, 0↑ 22↓)
datafusion / arrow (1.312x ❌, 0↑ 22↓)
duckdb / vortex-file-compressed (1.282x ❌, 0↑ 22↓)
duckdb / vortex-compact (1.259x ❌, 0↑ 22↓)
duckdb / parquet (1.153x ❌, 0↑ 18↓)
duckdb / duckdb (1.272x ❌, 0↑ 22↓)
File Sizes: TPC-H SF=10 on NVME
File Size Changes (19 files changed, -0.0% overall, 1↑ 18↓)
Summary
Closes: #000
Testing