Remove parquet arrow_cast dependency #9077
Conversation
| let a = arrow_cast::cast(&array, &ArrowType::Date32)?; | ||
| let a = array | ||
| .as_primitive::<Int32Type>() | ||
| .reinterpret_cast::<Date32Type>(); |
When not performing a widening / truncating conversion, reinterpret_cast will be faster
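To illustrate, a minimal self-contained sketch (the input values are made up) of reinterpreting an Int32 array as Date32 without the cast kernel:
use std::sync::Arc;
use arrow_array::{Array, ArrayRef, Int32Array};
use arrow_array::cast::AsArray;
use arrow_array::types::{Date32Type, Int32Type};

fn main() {
    // Days since the UNIX epoch stored as plain Int32
    let array: ArrayRef = Arc::new(Int32Array::from(vec![Some(19_000), None, Some(0)]));

    // Reinterprets the existing value buffer rather than dispatching through the
    // cast kernel; only valid because Int32 and Date32 share the same native type
    let dates = array.as_primitive::<Int32Type>().reinterpret_cast::<Date32Type>();
    assert_eq!(dates.value(0), 19_000);
    assert!(dates.is_null(1));
}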
| .unary::<_, Int32Type>(|v| v.as_i128() as i32); | ||
| write_primitive(typed, array.values(), levels) | ||
| } | ||
| ArrowDataType::Dictionary(_, value_type) => match value_type.as_ref() { |
We instead lift this into the ArrowColumnWriter::write method which not only cuts down on duplication, but ensures we treat dictionary and non-dictionary values equivalently
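For reference, a rough sketch of that unification (the helper name here is hypothetical, not the actual method), mirroring the hunk further down that materializes dictionary leaves before writing:
use arrow_array::ArrayRef;
use arrow_array::cast::AsArray;
use arrow_schema::ArrowError;
use arrow_select::take::take;

// Hypothetical helper: flatten a dictionary-encoded leaf so the downstream
// column writer only ever sees plain values
fn materialize_if_dictionary(leaf: &ArrayRef) -> Result<ArrayRef, ArrowError> {
    match leaf.as_any_dictionary_opt() {
        Some(dictionary) => take(dictionary.values(), dictionary.keys(), None),
        None => Ok(leaf.clone()),
    }
}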
| } | ||
| #[test] | ||
| fn test_arrow_writer_explicit_schema() { |
I wasn't really sure how to adapt this test; I don't really understand why we are relying on the parquet writer to do type coercion beyond what is necessary for mapping the Arrow data model to Parquet.
@paleolimbot perhaps you can weigh in here?
The motivation for explicitly setting the Parquet schema was so that I could write this test:
arrow-rs/parquet/tests/geospatial.rs
Lines 281 to 287 in 843bee2
let schema = parquet_schema_geometry();
let props = WriterProperties::builder()
    .set_statistics_enabled(EnabledStatistics::Chunk)
    .build();
let options = ArrowWriterOptions::new()
    .with_parquet_schema(schema)
    .with_properties(props);
At the time there was no way to write a Parquet Geometry logical type using the Arrow writer, and the code path for statistics based on Arrow arrays couldn't be tested. Several months later this was added (if the geospatial feature is enabled, the appropriate extension type is translated to Geometry or Geography on write), and we no longer need this feature for the test.
It's potentially useful as an escape hatch if the build-time features for canonical extensions/geospatial/variant aren't fine-grained enough to satisfy an application (e.g., perhaps allowing a runtime config option), or perhaps to more easily create files with very specific Parquet schemas, but I don't have strong feelings.
Do you see any issues with the changes proposed in this PR for this use-case? I think it should still be fine, but I'm a little unfamiliar with how extension types have been wired in.
I think it should be fine as well, given that the geospatial.rs test still passes. The intent was to test that the code path invoked by with_parquet_schema() actually resulted in the Parquet schema being used by the writer. If that code path now results in something other than what I wrote here, it's probably best to keep the test but adjust the expected result (i.e., this test is just checking that the wires are plugged in).
The impact on compile time for a clean release build is rather nice. Current main vs. this PR: just under half the CPU time 🎉 Edit: I'm also pretty chuffed that this is somehow a net reduction in LOC 😅
amazing
run benchmark arrow_reader arrow_reader_clickbench
🤖
🤖: Benchmark completed
🤖
🤖: Benchmark completed
My analysis of the benchmarks is that there is no difference in performance.
Do you think this needs to wait for a breaking release? It is technically a breaking change, but only for a very new API that I suspect few people are using.
I am not sure -- I need to review it in more detail first. I will do so later today |
alamb left a comment
I think this PR makes sense to me -- I had some comments, etc., but I also think it could be merged as is. Thank you @tustvold
I personally think we should wait to merge this for a major release to give it some bake / downstream testing time -- given I hope to release 57.2.0 later this week, that would mean waiting until next week, which might be too long.
Thank you for the help reviewing it @paleolimbot
| impl IntoBuffer for Vec<Int96> { | ||
| fn into_buffer(self, target_type: &ArrowType) -> Buffer { | ||
| let mut builder = Vec::with_capacity(self.len()); |
It probably doesn't matter, but it might be faster to just collect the Vec directly:
let builder: Vec<i64> = match target_type {
    ArrowType::Timestamp(TimeUnit::Second, _) => {
        self.iter().map(|x| x.to_seconds()).collect()
    }
    ArrowType::Timestamp(TimeUnit::Millisecond, _) => {
        self.iter().map(|x| x.to_millis()).collect()
    }
    ArrowType::Timestamp(TimeUnit::Microsecond, _) => {
        self.iter().map(|x| x.to_micros()).collect()
    }
    ArrowType::Timestamp(TimeUnit::Nanosecond, _) => {
        self.iter().map(|x| x.to_nanos()).collect()
    }
    _ => unreachable!("Invalid target_type for Int96."),
};
Buffer::from_vec(builder)
}
| } ||
| _ => unreachable!("INT96 must be a timestamp."), | ||
| }, | ||
| let array: ArrayRef = make_array(array_data); |
this is probably one way your PR ends up removing net lines of code
| /// Coerce the parquet physical type array to the target type | ||
| /// | ||
| /// This should match the logic in schema::primitive::apply_hint |
Can you also please add the rationale for why this doesn't use the cast kernel (and instead reimplements parts of it) as a comment? As I understand it, the issue is that:
- We are trying to keep dependencies down
- The semantics of casting certain types are different
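To make the second point concrete, a small sketch of how the semantics differ for Int64 -> UInt64 (arrow-cast appears here purely for the comparison; the reader itself now avoids it):
use arrow_array::{Array, Int64Array, UInt64Array};
use arrow_schema::DataType;

fn main() {
    let ints = Int64Array::from(vec![-1i64]);

    // Cast kernel semantics: -1 does not fit in u64, so with the default (safe)
    // options the result is null
    let casted = arrow_cast::cast(&ints, &DataType::UInt64).unwrap();
    assert!(casted.is_null(0));

    // Reinterpret semantics, as used by the reader: the i64 bits are kept as-is,
    // so -1 becomes u64::MAX
    let reinterpreted = UInt64Array::new(ints.values().inner().clone().into(), None);
    assert_eq!(reinterpreted.value(0), u64::MAX);
}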
| /// This should match the logic in schema::primitive::apply_hint | ||
| fn coerce_array(array: ArrayRef, target_type: &ArrowType) -> Result<ArrayRef> { | ||
| if let ArrowType::Dictionary(key_type, value_type) = target_type { | ||
| let dictionary = pack_dictionary(key_type, array.as_ref())?; |
Does this imply that we lose the ability to read DictionaryArrays directly (without unpacking them first)? Or is this the fallback in case the data wasn't actually dictionary encoded (e.g. it was encoded using plain encoding but the user asked for Dictionary)?
It is used as a fallback when the user requests dictionary encoding but the data isn't dictionary encoded. This can occur in a number of ways, but the two most common are that the dictionary spilled (so you have plain-encoded pages), or that you are decoding a RecordBatch across a RowGroup boundary.
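As a usage-level sketch of that scenario (the file path and column name are made up), this is roughly what requesting a dictionary type through a schema hint looks like; whether the underlying pages are dictionary or plain encoded is determined by the file:
use std::{fs::File, sync::Arc};
use arrow_schema::{DataType, Field, Schema};
use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};

fn read_as_dictionary(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Ask for Dictionary<Int32, Utf8> regardless of how the pages were encoded
    let schema = Arc::new(Schema::new(vec![Field::new(
        "col",
        DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8)),
        true,
    )]));
    let options = ArrowReaderOptions::new().with_schema(schema);
    let reader = ParquetRecordBatchReaderBuilder::try_new_with_options(File::open(path)?, options)?
        .build()?;
    for batch in reader {
        println!("{} rows", batch?.num_rows());
    }
    Ok(())
}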
| // follow C++ implementation and use overflow/reinterpret cast from i64 to u64 which will map | ||
| // `i64::MIN..0` to `(i64::MAX as u64)..u64::MAX` | ||
| ArrowType::UInt64 => Arc::new(UInt64Array::new( | ||
| array.values().inner().clone().into(), |
I double checked that this is a clone of the Buffer (so relatively simple).
I wonder if we can avoid all these Arc::new / clones / etc. by passing in the ArrayDataBuilder directly rather than a &Int64Array
Something like:
let mut builder = array.into_data().into_builder();
...
builder = builder.data_type(ArrowType::Duration(TimeUnit::Second));
...
make_array(builder.build()?)
Calling array.into_data() will actually perform significantly more allocations, not to mention having significant additional dispatch overheads.
I am talking about into_data, which takes an owned object, not the confusingly similarly named to_data, which requires a copy / cloning.
I actually saw the allocations (especially for StringViewArray and StructArray) appear in some profiling, and I have a WIP to remove them (I want to break it into smaller PRs for easier review).
IIRC into_data still allocates, as the buffers are stored in a Vec. In general ArrayData is best avoided; it is not free.
Ah yes, you are right -- in my other PR the ArrayData was already created, and then make_array is copying it again (along with its allocations) in several cases.
What I found is that some arrays also have Vecs in them (like StructArray and BinaryViewArray).
| match leaf.as_any_dictionary_opt() { | ||
| Some(dictionary) => { | ||
| let materialized = | ||
| arrow_select::take::take(dictionary.values(), dictionary.keys(), None)?; |
Not necessarily related to this PR, but this seems to imply we are expanding out previously encoded dictionary data, rather than reusing the dictionary. Is that correct? I realize that we would have to handle the case when there are different dictionaries across batches, but it seems like a potential optimization worth considering.
That is correct; there is likely a fair amount of low-hanging fruit in optimising the parquet writer.
Edit: although it does appear there is something to handle dictionary arrays of ByteArray, let me check I didn't break that.
There is a separate specialized encoder for ByteArray and DictionaryArray of ByteArray - #2221
So this will only materialize dictionaries for primitives, which tbh is probably the most efficient thing to do.
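For context, a minimal sketch of the byte-array dictionary path (hypothetical column name, writing to an in-memory buffer), which goes through that specialized encoder rather than being materialized:
use std::sync::Arc;
use arrow_array::{ArrayRef, DictionaryArray, RecordBatch};
use arrow_array::types::Int32Type;
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A Dictionary<Int32, Utf8> column; the writer keeps it dictionary encoded
    let dict: DictionaryArray<Int32Type> = vec!["a", "a", "b"].into_iter().collect();
    let batch = RecordBatch::try_from_iter([("col", Arc::new(dict) as ArrayRef)])?;

    let mut out = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut out, batch.schema(), None)?;
    writer.write(&batch)?;
    writer.close()?;
    println!("wrote {} bytes", out.len());
    Ok(())
}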
| }; | ||
| let values = if let ArrowType::FixedSizeBinary(size) = **value_type { | ||
| arrow_cast::cast(&values, &ArrowType::FixedSizeBinary(size)).unwrap() | ||
| let binary = values.as_binary::<i32>(); |
We could avoid these clones here using the into_builder and make_array approach too.
| [dependencies] | ||
| arrow-array = { workspace = true, optional = true } | ||
| arrow-buffer = { workspace = true, optional = true } | ||
| arrow-cast = { workspace = true, optional = true } |
🎉
That's perfectly fine; this isn't blocking anything on my end - I can just use a git pin. I just wanted to avoid this potentially bit-rotting; a week is perfectly fine 😄
I think main is open for breaking changes now, so we can merge this one in.
Which issue does this PR close?
Rationale for this change
arrow-cast is a fairly heavy dependency, especially now that it bundles in arrow-ord for RunEndEncodedArrays (#8708). Removing this dependency has been discussed as far back as 2024 (#4764); let's finally actually do it.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
Yes. Unfortunately, #8524 added an API that allows overriding the inferred schema, which in turn allows the coercion machinery to traverse somewhat unintended paths. I personally think this API shouldn't exist, but...