[PoC]: Yet another implementation of PARQUET-2249: Introduce IEEE 754 total order#9619

Draft
etseidl wants to merge 17 commits into apache:main from etseidl:total_order_514

Conversation

Contributor

@etseidl etseidl commented Mar 26, 2026

Which issue does this PR close?

Rationale for this change

This takes the implementation done by @Xuanwo (#8158) and updates it to the new thrift format and recent changes to the original proposal (apache/parquet-format#514).

What changes are included in this PR?

Adds needed thrift structures as well as NaN counts for pages and column chunks.

Are these changes tested?

Yes, new tests added (more may be needed).

Are there any user-facing changes?

Yes, this is a breaking change.

@github-actions (bot) added the parquet label (Changes to the parquet crate) Mar 26, 2026
@etseidl added the api-change label (Changes to the arrow API) Mar 26, 2026
// For floating point we need to compare NaN values until we encounter a non-NaN
// value which then becomes the new min/max. After this, only non-NaN values are
// evaluated. If all values are NaN, then the min/max NaNs as determined by
// IEEE 754 total order are returned.
Contributor Author

This has me a bit worried. I need to do some benchmarking to make sure all the complicated NaN logic isn't killing performance.

Contributor

@jhorstmann jhorstmann Mar 30, 2026

Agreed, this method is on the hot path. I had a look at optimizing it, but could not yet get the compiler to generate nice auto-vectorized code for the NaN handling. I think we can try optimizations in a follow-up; it is more important to get the semantics correct first and make sure there are tests for edge cases.

In that regard, about this requirement:

"If all values are NaN, then the min/max NaNs as determined by IEEE 754 total order are returned."

Does the current code correctly distinguish different NaN payloads according to their sign and bit patterns?

(Solved: GitHub was hiding the changes to compare_greater in mod.rs.)
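
As a rough illustration of what distinguishing NaNs "according to their sign and bit patterns" means, here is a minimal sketch using the standard library's `f32::total_cmp`. The bit patterns and the function names `sorted_nan_bits` and `partial_order_of_nans` are assumptions made for this example and are not taken from the PR; only quiet NaNs are used.

```rust
use std::cmp::Ordering;

// Hypothetical quiet-NaN bit patterns, chosen for illustration only.
const POS_NAN_A: u32 = 0x7FC0_0000; // positive NaN, payload 0
const POS_NAN_B: u32 = 0x7FC0_0001; // positive NaN, payload 1
const NEG_NAN_A: u32 = 0xFFC0_0000; // negative NaN, payload 0
const NEG_NAN_B: u32 = 0xFFC0_0001; // negative NaN, payload 1

/// Sorts a mix of NaNs and one ordinary value by IEEE 754 total order
/// and returns the raw bit patterns in sorted order.
pub fn sorted_nan_bits() -> Vec<u32> {
    let mut vals: Vec<f32> = [POS_NAN_B, NEG_NAN_A, 1.0f32.to_bits(), POS_NAN_A, NEG_NAN_B]
        .iter()
        .map(|&bits| f32::from_bits(bits))
        .collect();
    // total_cmp orders every bit pattern, including NaNs.
    vals.sort_by(|a, b| a.total_cmp(b));
    vals.iter().map(|v| v.to_bits()).collect()
}

/// The ordinary partial order cannot rank two NaNs at all.
pub fn partial_order_of_nans() -> Option<Ordering> {
    f32::from_bits(POS_NAN_A).partial_cmp(&f32::from_bits(POS_NAN_B))
}
```

With these inputs, negative NaNs sort below the ordinary value and positive NaNs above it, with the payload breaking ties inside each group, while `partial_cmp` yields `None`.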


fn update_min<T: ParquetValueType>(descr: &ColumnDescriptor, val: &T, min: &mut Option<T>) {
update_stat::<T, _>(descr, val, min, |cur| compare_greater(descr, cur, val))
if min.is_none() {
Contributor Author

Changes here and to update_max also worry me. It's just complicated because we can't simply exclude NaN. In the case of all-NaN we have to properly order them so we get the min and max NaN, but if a non-NaN shows up we have to start over. Same thing in get_min_max().

This is why I prefer the other solution of simply using total order and dealing with the possibility of NaN in the statistics.
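
The rule described in this comment (order NaNs among themselves, but start over as soon as a non-NaN shows up) can be sketched roughly as follows. This is not the PR's code: `MinMax`, `update`, and `pick` are hypothetical names, the sketch handles only `f32`, and it ignores the `ColumnDescriptor` plumbing that the real `update_min`/`update_max` take.

```rust
use std::cmp::Ordering;

/// Running statistics for a stream of f32 values (illustrative only).
#[derive(Debug, Default, Clone, Copy)]
pub struct MinMax {
    pub min: Option<f32>,
    pub max: Option<f32>,
    pub nan_count: u64,
}

impl MinMax {
    /// Folds one value into the running min, max, and NaN count.
    pub fn update(&mut self, val: f32) {
        if val.is_nan() {
            self.nan_count += 1;
        }
        self.min = Some(match self.min {
            None => val,
            Some(cur) => pick(cur, val, Ordering::Less),
        });
        self.max = Some(match self.max {
            None => val,
            Some(cur) => pick(cur, val, Ordering::Greater),
        });
    }
}

/// Chooses between the current statistic and a candidate value.
/// `want` is `Less` when tracking a minimum and `Greater` for a maximum.
fn pick(cur: f32, val: f32, want: Ordering) -> f32 {
    match (cur.is_nan(), val.is_nan()) {
        // First non-NaN seen: discard the NaN statistic and start over.
        (true, false) => val,
        // A NaN never displaces a non-NaN statistic.
        (false, true) => cur,
        // Both NaN or both non-NaN: IEEE 754 total order decides.
        _ => {
            if val.total_cmp(&cur) == want {
                val
            } else {
                cur
            }
        }
    }
}
```

The extra branch on every value is the kind of per-element work that the performance concern above is about; the alternative mentioned in the comment would drop the NaN checks and always compare with total order.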

Contributor Author

etseidl commented Apr 3, 2026

Tests will fail until apache/parquet-testing#104 is merged.

Contributor Author

etseidl commented Apr 6, 2026

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4195303356-913-bmtrq 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing total_order_514 (5de1817) to 65ad652 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                     main                                   total_order_514
-----                                     ----                                   ---------------
bool/bloom_filter                         1.00     13.8±0.09ms    18.1 MB/sec    1.00     13.8±0.04ms    18.2 MB/sec
bool/default                              1.00     11.8±0.08ms    21.2 MB/sec    1.00     11.8±0.04ms    21.3 MB/sec
bool/parquet_2                            1.03     15.2±0.09ms    16.4 MB/sec    1.00     14.8±0.05ms    16.9 MB/sec
bool/zstd                                 1.00     12.3±0.08ms    20.3 MB/sec    1.00     12.3±0.05ms    20.4 MB/sec
bool/zstd_parquet_2                       1.03     15.6±0.09ms    16.1 MB/sec    1.00     15.2±0.03ms    16.5 MB/sec
bool_non_null/bloom_filter                1.00      6.8±0.03ms    18.3 MB/sec    1.02      7.0±0.02ms    17.9 MB/sec
bool_non_null/default                     1.00      4.1±0.02ms    30.2 MB/sec    1.04      4.3±0.02ms    29.1 MB/sec
bool_non_null/parquet_2                   1.05      8.7±0.03ms    14.4 MB/sec    1.00      8.2±0.02ms    15.2 MB/sec
bool_non_null/zstd                        1.00      4.5±0.01ms    27.8 MB/sec    1.04      4.7±0.03ms    26.9 MB/sec
bool_non_null/zstd_parquet_2              1.05      9.0±0.03ms    13.8 MB/sec    1.00      8.6±0.02ms    14.5 MB/sec
float_with_nans/bloom_filter              1.00     95.2±1.12ms   147.0 MB/sec    1.04     98.8±0.24ms   141.7 MB/sec
float_with_nans/default                   1.00     75.6±0.33ms   185.3 MB/sec    1.07     80.7±0.33ms   173.5 MB/sec
float_with_nans/parquet_2                 1.00     98.1±0.22ms   142.6 MB/sec    1.05    102.8±0.74ms   136.2 MB/sec
float_with_nans/zstd                      1.00    113.6±0.55ms   123.2 MB/sec    1.04    117.7±0.23ms   118.9 MB/sec
float_with_nans/zstd_parquet_2            1.00    135.5±0.26ms   103.3 MB/sec    1.03    139.9±0.57ms   100.1 MB/sec
list_primitive/bloom_filter               1.00    317.8±3.96ms  1715.8 MB/sec    1.19    377.5±6.61ms  1444.5 MB/sec
list_primitive/default                    1.00    238.3±2.83ms     2.2 GB/sec    1.23    292.4±3.28ms  1865.0 MB/sec
list_primitive/parquet_2                  1.00    276.5±1.00ms  1972.1 MB/sec    1.03    284.6±7.84ms  1916.0 MB/sec
list_primitive/zstd                       1.00    492.5±4.68ms  1107.3 MB/sec    1.09    534.5±6.08ms  1020.4 MB/sec
list_primitive/zstd_parquet_2             1.00    483.0±1.76ms  1129.1 MB/sec    1.03    497.0±7.34ms  1097.3 MB/sec
list_primitive_non_null/bloom_filter      1.00   449.8±13.14ms  1209.9 MB/sec    1.05   473.7±21.89ms  1148.9 MB/sec
list_primitive_non_null/default           1.00   319.1±12.39ms  1705.5 MB/sec    1.06   338.2±19.76ms  1609.1 MB/sec
list_primitive_non_null/parquet_2         1.00   317.9±16.78ms  1712.3 MB/sec    1.23    389.8±1.74ms  1396.1 MB/sec
list_primitive_non_null/zstd              1.00   732.7±18.81ms   742.7 MB/sec    1.02   749.6±26.38ms   726.0 MB/sec
list_primitive_non_null/zstd_parquet_2    1.00    720.4±1.60ms   755.4 MB/sec    1.04    752.5±1.39ms   723.3 MB/sec
primitive/bloom_filter                    1.00    158.8±1.41ms   282.7 MB/sec    1.00    158.3±1.00ms   283.5 MB/sec
primitive/default                         1.00    126.4±0.58ms   355.0 MB/sec    1.00    127.0±0.57ms   353.4 MB/sec
primitive/parquet_2                       1.00    141.4±0.94ms   317.4 MB/sec    1.00    141.5±0.79ms   317.2 MB/sec
primitive/zstd                            1.00    155.8±0.67ms   288.0 MB/sec    1.00    156.5±0.43ms   286.7 MB/sec
primitive/zstd_parquet_2                  1.00    175.5±1.41ms   255.7 MB/sec    1.00    175.4±0.37ms   255.8 MB/sec
primitive_non_null/bloom_filter           1.00    110.5±0.67ms   398.4 MB/sec    1.05    116.4±1.12ms   377.9 MB/sec
primitive_non_null/default                1.00     70.8±0.32ms   621.8 MB/sec    1.00     70.5±0.75ms   624.3 MB/sec
primitive_non_null/parquet_2              1.00     91.7±0.31ms   480.0 MB/sec    1.00     91.9±0.66ms   478.5 MB/sec
primitive_non_null/zstd                   1.00    107.0±0.91ms   411.1 MB/sec    1.00    107.4±0.62ms   409.6 MB/sec
primitive_non_null/zstd_parquet_2         1.02    132.7±2.20ms   331.6 MB/sec    1.00    129.9±3.31ms   338.8 MB/sec
string/bloom_filter                       1.03   231.1±26.62ms     2.2 GB/sec    1.00   225.0±20.21ms     2.3 GB/sec
string/default                            1.06   140.9±25.10ms     3.6 GB/sec    1.00   132.9±19.73ms     3.9 GB/sec
string/parquet_2                          1.63    178.2±1.42ms     2.9 GB/sec    1.00    109.1±2.46ms     4.7 GB/sec
string/zstd                               1.04   460.7±20.53ms  1137.9 MB/sec    1.00    442.9±6.28ms  1183.7 MB/sec
string/zstd_parquet_2                     1.01    408.1±9.64ms  1284.6 MB/sec    1.00    406.0±4.25ms  1291.1 MB/sec
string_and_binary_view/bloom_filter       1.00     66.5±0.93ms   485.1 MB/sec    1.00     66.7±0.16ms   483.2 MB/sec
string_and_binary_view/default            1.00     50.1±0.29ms   643.6 MB/sec    1.00     50.0±0.10ms   644.7 MB/sec
string_and_binary_view/parquet_2          1.00     60.9±0.40ms   529.3 MB/sec    1.00     60.7±0.37ms   531.1 MB/sec
string_and_binary_view/zstd               1.00     86.8±0.66ms   371.5 MB/sec    1.00     86.6±0.11ms   372.5 MB/sec
string_and_binary_view/zstd_parquet_2     1.01     74.9±0.61ms   430.4 MB/sec    1.00     74.5±0.68ms   432.6 MB/sec
string_dictionary/bloom_filter            1.33    125.0±2.94ms     2.1 GB/sec    1.00     93.9±3.26ms     2.7 GB/sec
string_dictionary/default                 1.60     78.1±1.15ms     3.3 GB/sec    1.00     48.7±1.48ms     5.3 GB/sec
string_dictionary/parquet_2               1.46     80.4±0.30ms     3.2 GB/sec    1.00     55.2±0.63ms     4.7 GB/sec
string_dictionary/zstd                    1.11    233.0±2.08ms  1133.6 MB/sec    1.00    210.1±1.78ms  1256.9 MB/sec
string_dictionary/zstd_parquet_2          1.06    212.4±0.97ms  1243.6 MB/sec    1.00    199.5±0.64ms  1323.7 MB/sec
string_non_null/bloom_filter              1.00   256.3±12.92ms  2044.3 MB/sec    1.00   256.8±14.44ms  2040.5 MB/sec
string_non_null/default                   1.00   137.7±13.19ms     3.7 GB/sec    1.04   143.5±12.59ms     3.6 GB/sec
string_non_null/parquet_2                 1.00    145.3±7.14ms     3.5 GB/sec    1.00   145.6±10.05ms     3.5 GB/sec
string_non_null/zstd                      1.00    537.1±4.05ms   975.6 MB/sec    1.05   565.8±18.78ms   926.1 MB/sec
string_non_null/zstd_parquet_2            1.00    505.1±2.39ms  1037.4 MB/sec    1.02    515.7±3.83ms  1016.0 MB/sec

Resource Usage

base (merge-base)

Metric         Value
------         -----
Wall time      1337.4s
Peak memory    4.9 GiB
Avg memory     4.6 GiB
CPU user       1256.3s
CPU sys        80.8s
Peak spill     0 B

branch

Metric         Value
------         -----
Wall time      1357.8s
Peak memory    4.9 GiB
Avg memory     4.6 GiB
CPU user       1263.2s
CPU sys        94.5s
Peak spill     0 B

Development

Successfully merging this pull request may close these issues.

[Parquet] Prototype: PARQUET-2249: Introduce IEEE 754 total order & NaN-counts #514

3 participants