[Variant] Align cast logic for from/to_decimal for variant by klion26 · Pull Request #9689 · apache/arrow-rs

klion26 · 2026-04-10T08:26:48Z

Which issue does this PR close?

Closes #9688 (Align cast logic for from/to_decimal for variant to cast kernel).

What changes are included in this PR?

Extract some logic in arrow-cast
Reuse the extracted logic in arrow-cast and parquet-variant

Are these changes tested?

Reuse the existing tests in arrow-test

Are there any user-facing changes?

Yes, changed the docs

klion26

@scovich @sdf-jkl Please help to review this when you're free, thanks

klion26 · 2026-04-10T08:33:33Z

parquet-variant/src/variant.rs

+                .ok()
+                .and_then(|x: i32| x.try_into().ok()),
+            Variant::ShortString(v) => {
+                parse_string_to_decimal_native::<Decimal32Type>(v.as_str(), 0usize)


Use v.as_str() instead of v because we match on *self. If we changed to match self, we would need to dereference self in the other match arms. There seems to be little benefit in that, so I stuck with the current match *self and used v.as_str() here.
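For readers unfamiliar with the pattern, here is a minimal sketch of matching on *self with Copy payloads. The types Short and Value are hypothetical stand-ins, not the actual parquet-variant definitions; the sketch assumes the string wrapper is Copy and exposes as_str().

```rust
/// Hypothetical stand-in for a borrowed short-string wrapper (assumed Copy).
#[derive(Clone, Copy)]
struct Short<'a>(&'a str);

impl<'a> Short<'a> {
    fn as_str(&self) -> &'a str {
        self.0
    }
}

/// Hypothetical stand-in for a variant enum whose payloads are all Copy.
#[derive(Clone, Copy)]
enum Value<'a> {
    Int(i32),
    Text(Short<'a>),
}

impl<'a> Value<'a> {
    fn as_i32(&self) -> Option<i32> {
        // Matching on *self binds each payload by value, so numeric arms
        // need no dereference; the string arm calls as_str() explicitly.
        match *self {
            Value::Int(i) => Some(i),
            Value::Text(s) => s.as_str().parse().ok(),
        }
    }
}
```

With match self instead, i would be a &i32 and the numeric arm would need *i, which is the trade-off described above.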

parquet-variant/src/variant.rs

scovich

General approach looks reasonable, but needs some tweaks to avoid regressing performance. Do we have benchmarks we can throw at this to verify?

arrow-cast/src/cast/decimal.rs

scovich · 2026-04-10T16:17:30Z

arrow-cast/src/cast/decimal.rs

+                    let v = cast_single_decimal_to_integer::<D, T::Native>(
+                        array.value(i),
+                        div,
+                        scale as _,


Why are we casting? Isn't it a trivial i16 -> i16 cast?
(again below)

scovich · 2026-04-10T16:26:16Z

arrow-cast/src/cast/decimal.rs

+        } else {
+            match cast_options.safe {
+                true => {
+                    let v = cast_single_decimal_to_integer::<D, T::Native>(


The original code hoisted checks for scale < 0 (mul_checked vs. div_checked) and cast_options.safe (NULL vs. error) outside the loop, producing four simple loop bodies. This was presumably done for performance reasons (minimizing branching inside the loop).

The new code pushes the cast_options.safe check inside a single loop and pushes scale < 0 check all the way down inside cast_single_decimal_to_integer. That triples the number of branches inside the loop body (the null check is per-row and so is always stuck inside the loop). Performance will almost certainly be impacted, possibly significantly.

It would be safer to just preserve the replication (even tho it duplicates logic with the new helper), and rely on the compiler's inlining and "jump threading" optimizations to eliminate that redundancy:

if scale < 0 {
    if cast_options.safe {
        for i in 0..array.len() {
            if array.is_null(i) {
                value_builder.append_null();
            } else {
                let v = cast_single_decimal_to_integer::<D, T::Native>(...);
                value_builder.append_option(v.ok());
            }
        }
    } else {
        for i in 0..array.len() {
            if array.is_null(i) {
                value_builder.append_null();
            } else {
                let v = cast_single_decimal_to_integer::<D, T::Native>(...);
                value_builder.append_value(v?);
            }
        }
    }
} else {
    if cast_options.safe {
        for i in 0..array.len() {
            if array.is_null(i) {
                value_builder.append_null();
            } else {
                let v = cast_single_decimal_to_integer::<D, T::Native>(...);
                value_builder.append_option(v.ok());
            }
        }
    } else {
        for i in 0..array.len() {
            if array.is_null(i) {
                value_builder.append_null();
            } else {
                let v = cast_single_decimal_to_integer::<D, T::Native>(...);
                value_builder.append_value(v?);
            }
        }
    }
}

If you wanted to simplify a bit, you could define and use a local macro inside this function:

// Helper macro for emitting nearly the same loop every time, so we can hoist branches out.
// The compiler will specialize the resulting code (inlining and jump threading)
macro_rules! cast_loop {
    (|$v:ident| $body:expr) => {{
        for i in 0..array.len() {
            if array.is_null(i) {
                value_builder.append_null();
            } else {
                let $v = cast_single_decimal_to_integer::<D, T::Native>(...);
                $body
            }
        }
    }};
}

if scale < 0 {
    if cast_options.safe {
        cast_loop!(|v| value_builder.append_option(v.ok()));
    } else {
        cast_loop!(|v| value_builder.append_value(v?));
    }
} else {
    if cast_options.safe {
        cast_loop!(|v| value_builder.append_option(v.ok()));
    } else {
        cast_loop!(|v| value_builder.append_value(v?));
    }
}

Note that the four loop bodies are almost syntactically identical -- differing only in whether they append_option(v.ok()) or append_value(v?) -- but the inlined body of cast_single_decimal_to_integer inside each loop will be specialized based on the scale < 0 check we already performed. Result: stand-alone calls to the helper function are always safe, but we still get maximum performance here.
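As a self-contained illustration of the hoisting idea, the sketch below uses only the standard library. The names scale_one and cast_all are hypothetical and stand in for the per-value helper and the array-level cast; plain slices and Vec replace the Arrow array and builder. Only the safe check is hoisted here; the scale < 0 check could be hoisted the same way.

```rust
use std::convert::TryFrom;

/// Hypothetical per-value helper: scales an i64 by `factor` (multiply when
/// `scale < 0`, divide otherwise) and narrows the result to i32.
#[inline]
fn scale_one(value: i64, factor: i64, scale: i8) -> Result<i32, String> {
    let scaled = if scale < 0 {
        value.checked_mul(factor)
    } else {
        value.checked_div(factor)
    };
    scaled
        .and_then(|v| i32::try_from(v).ok())
        .ok_or_else(|| format!("value {} out of range", value))
}

/// Hypothetical array-level cast. With `safe == true` a failed value becomes
/// `None`; with `safe == false` the first failure is returned as an error.
fn cast_all(input: &[Option<i64>], scale: i8, safe: bool) -> Result<Vec<Option<i32>>, String> {
    let factor = 10_i64
        .checked_pow(u32::from(scale.unsigned_abs()))
        .ok_or_else(|| "scale too large".to_string())?;
    let mut out = Vec::with_capacity(input.len());

    // Local macro: emits the same loop each time, differing only in how the
    // per-value result is turned into an output slot.
    macro_rules! cast_loop {
        (|$v:ident| $body:expr) => {{
            for slot in input {
                match slot {
                    None => out.push(None),
                    Some(x) => {
                        let $v = scale_one(*x, factor, scale);
                        out.push($body);
                    }
                }
            }
        }};
    }

    // The `safe` branch is taken once, outside the loop.
    if safe {
        cast_loop!(|v| v.ok());
    } else {
        cast_loop!(|v| Some(v?));
    }
    Ok(out)
}
```

Whether the compiler actually specializes each expanded loop depends on inlining decisions, so a benchmark is still the way to confirm the effect.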

scovich · 2026-04-10T16:29:27Z

arrow-cast/src/cast/mod.rs

+    D: DecimalType,
+    F: Fn(D::Native) -> f64,
+{
+    f(x) / 10_f64.powi(scale)


fmul is drastically cheaper than fdiv on every architecture I know of. As in, 5-10x higher ops/second.
Any reason we shouldn't switch?

Suggested change

- f(x) / 10_f64.powi(scale)
+ f(x) * 10_f64.powi(-scale)
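A minimal sketch of the two forms side by side, using hypothetical free functions rather than the generic helper from the diff. One caveat worth checking before switching: 10^(-scale) is generally not exactly representable in f64, so the multiplied result can differ from the divided one by a small rounding error.

```rust
/// Divides the unscaled value by 10^scale (the form currently in the diff).
fn to_f64_div(unscaled: i128, scale: i32) -> f64 {
    unscaled as f64 / 10_f64.powi(scale)
}

/// Multiplies by 10^(-scale) instead (the suggested form).
fn to_f64_mul(unscaled: i128, scale: i32) -> f64 {
    unscaled as f64 * 10_f64.powi(-scale)
}

/// Returns true when both forms agree within `tolerance` for the input.
fn forms_agree(unscaled: i128, scale: i32, tolerance: f64) -> bool {
    (to_f64_div(unscaled, scale) - to_f64_mul(unscaled, scale)).abs() <= tolerance
}
```

If existing tests compare floats exactly, they may need a tolerance after such a change.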

scovich · 2026-04-10T16:34:47Z

arrow-cast/src/cast/mod.rs

        }),
        Float64 => cast_decimal_to_float::<D, Float64Type, _>(array, |x| {
-            as_float(x) / 10_f64.powi(*scale as i32)
+            single_decimal_to_float_lossy::<D, F>(&as_float, x, *scale as _)


Since we're changing the code anyway, i32::from(*scale) makes it clear that this is a lossless conversion.

(There are several more similarly lossless as _ casts below.)
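To make the distinction concrete, here is a small sketch with hypothetical helper names. From is only implemented for widening integer conversions, so it documents losslessness in the type system, whereas as compiles for narrowing conversions too and silently wraps.

```rust
use std::convert::TryFrom;

/// Widening i8 -> i32: `From` exists only because this can never lose data.
fn widen_scale(scale: i8) -> i32 {
    i32::from(scale)
}

/// Narrowing i32 -> i8 has no `From` impl; `TryFrom` reports failure instead.
fn narrow_scale(scale: i32) -> Option<i8> {
    i8::try_from(scale).ok()
}

/// The same narrowing with `as` compiles and wraps: 300 becomes 44.
fn narrow_scale_wrapping(scale: i32) -> i8 {
    scale as i8
}
```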

scovich · 2026-04-10T16:44:42Z

parquet-variant/src/variant.rs

+            Variant::Decimal16(d) => Self::cast_decimal_to_num::<Decimal128Type, T, _>(
+                d.integer(),
+                d.scale(),
+                |x: i128| x as f64,


I'm a bit surprised those type annotations are necessary when the first arg takes D::Native and should thus constrain the third arg's F: Fn(D::Native) -> f64?

scovich · 2026-04-10T16:46:05Z

parquet-variant/src/variant.rs

+            .map(|(_, frac)| frac.len())
+            .unwrap_or(0);


scovich · 2026-04-10T16:48:10Z

parquet-variant/src/variant.rs

+        parse_string_to_decimal_native::<D>(input, scale_usize)
+            .ok()
+            .and_then(|raw| VD::try_new(raw, scale).ok())


nit

Suggested change

- parse_string_to_decimal_native::<D>(input, scale_usize)
-     .ok()
-     .and_then(|raw| VD::try_new(raw, scale).ok())
+ let raw = parse_string_to_decimal_native::<D>(input, scale_usize).ok()?;
+ VD::try_new(raw, scale).ok()
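The two shapes are equivalent; a standalone sketch with hypothetical functions (parsing a string and narrowing to u8 in place of the decimal helpers) shows the chained form next to the early-return form.

```rust
use std::convert::TryFrom;

/// Chained form: every fallible step is threaded through a combinator.
fn parse_u8_chained(input: &str) -> Option<u8> {
    input
        .parse::<i32>()
        .ok()
        .and_then(|raw| u8::try_from(raw).ok())
}

/// Early-return form: `?` exits with `None` as soon as a step fails,
/// leaving the final expression flat.
fn parse_u8_early_return(input: &str) -> Option<u8> {
    let raw = input.parse::<i32>().ok()?;
    u8::try_from(raw).ok()
}
```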

scovich · 2026-04-10T16:51:32Z

parquet-variant/src/variant.rs

        self.as_num()
    }

+    fn convert_string_to_decimal<VD, D>(input: &str) -> Option<VD>


nit: If you swap the template order, callers would be a tad more readable, e.g.:

convert_string_to_decimal::<Decimal32Type, _>
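A sketch of why parameter order matters at the call site, using a hypothetical generic function rather than convert_string_to_decimal itself. Placing the parameter that callers must name first lets the remaining one be written as _ and inferred from the expected return type.

```rust
use std::convert::TryFrom;
use std::str::FromStr;

/// Hypothetical helper: `Raw` (always spelled out by callers) comes first,
/// `Out` (inferable from the return type) comes second.
fn parse_then_convert<Raw, Out>(input: &str) -> Option<Out>
where
    Raw: FromStr,
    Out: TryFrom<Raw>,
{
    let raw = input.parse::<Raw>().ok()?;
    Out::try_from(raw).ok()
}

/// Call site: only the first type is named; `_` is inferred as u8.
fn parse_small(input: &str) -> Option<u8> {
    parse_then_convert::<i64, _>(input)
}
```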

scovich · 2026-04-10T16:53:25Z

parquet-variant/src/variant.rs

+                .as_num::<i64>()
+                .map(|x| (x as i128).try_into().ok())


Out of curiosity, why not just as_num::<i128> directly?
But if you must keep the double cast, at least do i128::from(x) to make clear it's lossless.

[Variant] Align cast logic for from/to_decimal for variant

c9a092e

github-actions bot added the arrow (Changes to the arrow crate) and parquet-variant (parquet-variant* crates) labels on Apr 10, 2026

add some example for decimal from string

38dfe69

scovich reviewed Apr 10, 2026

Suggested change

-     .map(|(_, frac)| frac.len())
-     .unwrap_or(0);
+     .map_or_else(0, |(_, frac)| frac.len());

Suggested change

-     .map(|(_, frac)| frac.len())
-     .unwrap_or(0);
+     .map_or_default(|(_, frac)| frac.len());
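For comparison, a minimal sketch using the stable Option::map_or, which takes the default value first and the mapping closure second (map_or_else expects a closure for the default instead). The function name fractional_digits is hypothetical, and the receiver is assumed to be the result of split_once('.'), as the (_, frac) pattern suggests.

```rust
/// Counts the digits after the first '.' in `input`, or 0 if there is none.
fn fractional_digits(input: &str) -> usize {
    input
        .split_once('.')
        .map_or(0, |(_, frac)| frac.len())
}
```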
