Add MaskNullAsFalse for nullable boolean predicates #8121
Mask execution now requires a non-nullable boolean array and errors on nullable input. The new MaskNullAsFalse executable preserves the previous null-as-false coercion for filter and pruning predicates over nullable data, where SQL semantics treat NULL as not matching. Predicate-evaluation call sites (filter, prune, dict filter, is_constant) use MaskNullAsFalse; validity-array call sites keep the stricter Mask.
Signed-off-by: Claude <noreply@anthropic.com>
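A minimal sketch of the two semantics described above, using a hypothetical simplified column type rather than the real Vortex arrays. BoolColumn, greater_than, mask_strict, mask_null_as_false and filter are illustrative names, not Vortex APIs. The sketch evaluates a SQL-style predicate over nullable data, shows that the strict conversion rejects the nullable result, and shows that the null-as-false conversion drops NULL rows the way a SQL WHERE clause would.

```rust
/// Hypothetical simplified boolean column (not the real Vortex type).
/// `validity == None` models a non-nullable dtype; otherwise
/// `validity[i] == false` marks element `i` as NULL.
#[derive(Debug, Clone)]
pub struct BoolColumn {
    pub values: Vec<bool>,
    pub validity: Option<Vec<bool>>,
}

/// Evaluate the SQL-style predicate `x > threshold` over a nullable column.
/// A NULL input yields a NULL result (three-valued logic); the value bit
/// stored under a NULL slot is irrelevant and is set to false here.
pub fn greater_than(xs: &[Option<i32>], threshold: i32) -> BoolColumn {
    let values = xs.iter().map(|x| matches!(x, Some(v) if *v > threshold)).collect();
    let validity = xs.iter().map(|x| x.is_some()).collect();
    BoolColumn { values, validity: Some(validity) }
}

/// Strict conversion, like the Mask target described above: a nullable
/// input is an error instead of being silently coerced.
/// (The error text is illustrative.)
pub fn mask_strict(col: &BoolColumn) -> Result<Vec<bool>, String> {
    match &col.validity {
        None => Ok(col.values.clone()),
        Some(_) => Err("Mask requires a non-nullable boolean array".to_string()),
    }
}

/// Null-as-false conversion, like the MaskNullAsFalse target:
/// an element is selected only if it is valid AND true.
pub fn mask_null_as_false(col: &BoolColumn) -> Vec<bool> {
    match &col.validity {
        None => col.values.clone(),
        Some(valid) => col.values.iter().zip(valid).map(|(&v, &ok)| v && ok).collect(),
    }
}

/// Keep the rows whose mask bit is set.
pub fn filter<T: Clone>(rows: &[T], mask: &[bool]) -> Vec<T> {
    rows.iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(row, _)| row.clone())
        .collect()
}
```

Only elements that are both valid and true survive, so whatever bit happens to sit under a NULL slot never leaks into the selection.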
Remove the shared NullHandling enum and execute_mask helper in favor of a self-contained Executable impl for each target. Mask still requires a non-nullable boolean array; MaskNullAsFalse coerces nulls to false.
Signed-off-by: Claude <noreply@anthropic.com>
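One way to picture the per-target structure is a trait with one independent impl per output type, so the caller chooses the null semantics by naming the target instead of passing a NullHandling flag. This is a sketch under assumptions: the Executable trait, BoolArray, ExecError, the execute helper and the newtype wrapper MaskNullAsFalse(Mask) are simplified stand-ins, and the real Vortex trait signatures may differ.

```rust
/// Hypothetical simplified boolean array (not the real Vortex type).
/// `validity == None` models a non-nullable dtype; otherwise
/// `validity[i] == false` marks element `i` as NULL.
pub struct BoolArray {
    pub values: Vec<bool>,
    pub validity: Option<Vec<bool>>,
}

/// Hypothetical error type for a failed execution.
#[derive(Debug, PartialEq)]
pub struct ExecError(pub String);

/// Hypothetical stand-in for "execute this array into target Self".
pub trait Executable: Sized {
    fn execute(array: &BoolArray) -> Result<Self, ExecError>;
}

/// Selection mask: `true` means the row is selected.
#[derive(Debug, PartialEq)]
pub struct Mask(pub Vec<bool>);

/// Target that executes into a Mask, coercing NULL to false.
#[derive(Debug, PartialEq)]
pub struct MaskNullAsFalse(pub Mask);

// Strict target: nullable input is an error, never silently coerced.
impl Executable for Mask {
    fn execute(array: &BoolArray) -> Result<Self, ExecError> {
        if array.validity.is_some() {
            return Err(ExecError(
                "Mask requires a non-nullable boolean array".to_string(),
            ));
        }
        Ok(Mask(array.values.clone()))
    }
}

// Lenient target: a row is selected only if it is valid AND true.
impl Executable for MaskNullAsFalse {
    fn execute(array: &BoolArray) -> Result<Self, ExecError> {
        let bits = match &array.validity {
            None => array.values.clone(),
            Some(valid) => array
                .values
                .iter()
                .zip(valid)
                .map(|(&v, &ok)| v && ok)
                .collect(),
        };
        Ok(MaskNullAsFalse(Mask(bits)))
    }
}

/// Generic entry point: the caller picks the semantics via the type.
pub fn execute<T: Executable>(array: &BoolArray) -> Result<T, ExecError> {
    T::execute(array)
}
```

With this shape, each impl owns its own null policy, so there is no shared enum for a call site to pass the wrong variant of.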
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.007x ➖
datafusion / vortex-file-compressed (1.007x ➖, 1↑ 0↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.001x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.999x ➖, 0↑ 0↓)
datafusion / parquet (0.992x ➖, 1↑ 0↓)
datafusion / arrow (1.005x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (0.988x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.993x ➖, 0↑ 0↓)
duckdb / parquet (1.014x ➖, 1↑ 3↓)
duckdb / duckdb (0.994x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.981x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.013x ➖, 0↑ 0↓)
datafusion / parquet (0.987x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.035x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.015x ➖, 0↑ 1↓)
duckdb / parquet (1.005x ➖, 0↑ 0↓)
File Sizes: FineWeb NVMe
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.987x ➖, 3↑ 0↓)
datafusion / vortex-compact (1.002x ➖, 1↑ 2↓)
datafusion / parquet (0.982x ➖, 4↑ 1↓)
duckdb / vortex-file-compressed (0.985x ➖, 5↑ 1↓)
duckdb / vortex-compact (0.991x ➖, 3↑ 1↓)
duckdb / parquet (0.992x ➖, 2↑ 0↓)
duckdb / duckdb (1.012x ➖, 1↑ 1↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.034x ➖, 1↑ 2↓)
datafusion / vortex-compact (1.081x ➖, 0↑ 0↓)
datafusion / parquet (1.144x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.145x ➖, 0↑ 2↓)
duckdb / vortex-compact (0.983x ➖, 1↑ 0↓)
duckdb / parquet (1.083x ➖, 0↑ 1↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.008x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.005x ➖, 0↑ 0↓)
duckdb / parquet (0.996x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
No file size changes detected.
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.107x ❌, 0↑ 15↓)
datafusion / vortex-compact (1.003x ➖, 5↑ 2↓)
datafusion / parquet (1.072x ➖, 0↑ 6↓)
datafusion / arrow (1.056x ➖, 0↑ 6↓)
duckdb / vortex-file-compressed (1.080x ➖, 0↑ 4↓)
duckdb / vortex-compact (1.081x ➖, 0↑ 3↓)
duckdb / parquet (1.039x ➖, 0↑ 0↓)
duckdb / duckdb (1.049x ➖, 0↑ 1↓)
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.002x ➖, 1↑ 1↓)
datafusion / parquet (0.999x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.967x ➖, 6↑ 1↓)
duckdb / parquet (0.999x ➖, 0↑ 0↓)
duckdb / duckdb (0.970x ➖, 1↑ 0↓)
File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.418x ❌, 0↑ 13↓)
datafusion / vortex-compact (1.458x ❌, 0↑ 14↓)
datafusion / parquet (1.158x ➖, 0↑ 6↓)
duckdb / vortex-file-compressed (1.101x ➖, 0↑ 3↓)
duckdb / vortex-compact (1.150x ➖, 0↑ 4↓)
duckdb / parquet (1.123x ➖, 0↑ 1↓)
Benchmarks: Compression
Vortex (geomean): 0.993x ➖
unknown / unknown (0.991x ➖, 4↑ 1↓)
Benchmarks: Random Access
Vortex (geomean): 0.802x ✅
unknown / unknown (0.824x ✅, 34↑ 0↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.976x ➖, 2↑ 1↓)
datafusion / vortex-compact (0.988x ➖, 3↑ 7↓)
datafusion / parquet (1.082x ➖, 2↑ 7↓)
duckdb / vortex-file-compressed (0.963x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.991x ➖, 0↑ 0↓)
duckdb / parquet (0.901x ➖, 1↑ 0↓)
…ing-1eKcy
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
# Conflicts:
# vortex-array/public-api.lock
# vortex/public-api.lock
Why is this better than before?
First, the semantics of the
I'm not sure the semantics are that surprising...? We can remove the alloc/& by checking the nullability of the dtype.
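One possible reading of the suggestion above, sketched on bit-packed buffers (64 rows per u64 word, which is an assumption about the layout): when the dtype is non-nullable there is no validity to apply, so the value bits can be borrowed as-is, and only a nullable input pays for the allocation and the word-wise AND. Nullability and null_as_false are hypothetical names, not Vortex APIs.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the nullability carried by a boolean dtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nullability {
    NonNullable,
    Nullable,
}

/// Coerce NULL to false on bit-packed buffers (64 rows per `u64` word).
///
/// Non-nullable dtype (or no validity buffer): nothing can be NULL, so the
/// value bits are borrowed as-is, with no allocation and no AND.
/// Nullable dtype with a validity buffer: allocate `values & validity`.
pub fn null_as_false<'a>(
    nullability: Nullability,
    values: &'a [u64],
    validity: Option<&[u64]>,
) -> Cow<'a, [u64]> {
    match (nullability, validity) {
        (Nullability::Nullable, Some(valid)) => {
            debug_assert_eq!(values.len(), valid.len());
            Cow::Owned(values.iter().zip(valid).map(|(v, m)| v & m).collect())
        }
        _ => Cow::Borrowed(values),
    }
}
```

Returning Cow keeps a single signature for both paths while confining the allocation to the nullable branch.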
Mapping null to false is a hidden default (useful only in layouts), as seen in this PR.
Summary
This PR introduces MaskNullAsFalse, a new Executable target that executes boolean arrays into Mask objects while coercing null elements to false. This addresses the previous TODO comment about handling nullable boolean arrays in mask execution.