-
Notifications
You must be signed in to change notification settings - Fork 1.9k
fix: increase ROUND decimal precision to prevent overflow truncation #19926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Other options considered:
I chose option 1 as the most user friendly, I would like to hear if there are other suggestion for this fix. |
Option 1 sounds good. Option 2 would be too strict, not sure I fully understand what option 3 means here. One thing to keep in mind is we still need to consider edge case where precision is already maxed out. Would this bug occur again? |
You were right to point that, i did some more test and found out it was overflowing after 38 digits, I handled that as an error. |
|
|
||
| // Validate the result fits within the precision | ||
| // The max value for a given precision is 10^precision - 1 | ||
| let max_value = ten.pow_checked(precision as u32).map_err(|_| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use these constants instead
- https://docs.rs/arrow/latest/arrow/datatypes/constant.MAX_DECIMAL128_FOR_EACH_PRECISION.html
- https://docs.rs/arrow/latest/arrow/datatypes/constant.MIN_DECIMAL128_FOR_EACH_PRECISION.html
(Same constants exist for each decimal type)
| round('1234.5678'::decimal(50,4), 2); | ||
| ---- | ||
| Decimal256(50, 4) 1234.57 | ||
| Decimal256(51, 4) 1234.57 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to ask whether the increase of the precision could be done only when really needed.
And I went checking what the other DBs do:
- Postgres v18
postgres=# SELECT pg_typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1));
pg_typeof | round
-----------+-------
numeric | 1000
(1 row)
(Not helpful)
- Apache Spark (v4.0.1)
spark-sql (default)> SELECT typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1));
decimal(4,0) 1000
Time taken: 0.032 seconds, Fetched 1 row(s)
Reduced the scale!
- DuckDB
SELECT typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1));
┌────────────────────────────────────────────┬────────────────────────────────────┐
│ typeof(round(CAST(999.9 AS DECIMAL(4,1)))) │ round(CAST(999.9 AS DECIMAL(4,1))) │
│ varchar │ decimal(4,0) │
├────────────────────────────────────────────┼────────────────────────────────────┤
│ DECIMAL(4,0) │ 1000 │
└────────────────────────────────────────────┴────────────────────────────────────┘
Reduced the scale!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice, Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Case | Input | Result | Type |
|---|---|---|---|
| dp column | 123.4567, dp=1 | 123.5000 | DECIMAL(10,4) |
| carry-over | 999.9999, dp=0 | 1000.0000 | DECIMAL(10,4) |
| constant dp | 999.9 | 1000 | DECIMAL(4,0) |
| scale reduction | 1234.5678, dp=2 | 1234.57 | DECIMAL(50,2) |
| negative dp | 12345.67, dp=-2 | 12300 | DECIMAL(7,0) |
| dp > scale | 123.4, dp=5 | 123.4 | DECIMAL(4,1) |
| negative values | -999.9, dp=0 | -1000 | DECIMAL(4,0) |
| // Get decimal_places from scalar_arguments | ||
| // If dp is not a constant scalar, we must keep the original scale because | ||
| // we can't determine a single output scale for varying per-row dp values. | ||
| let (decimal_places, dp_is_scalar): (i32, bool) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here decimal_places is initialized as i32 but below it is casted to i8 with as i8 and this may lead to problems. A validation is needed that it is not bigger than i8::MAX
| Ok(Arc::new(Field::new( | ||
| self.name(), | ||
| return_type, | ||
| input_field.is_nullable(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also take into account the decimal_places arg.
Postgres:
postgres=# SELECT pg_typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1), NULL);
pg_typeof | round
-----------+-------
numeric |
(1 row)
Apache Spark:
spark-sql (default)> SELECT typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1), NULL);
decimal(4,0) NULL
Time taken: 0.055 seconds, Fetched 1 row(s)
DuckDB:
D SELECT typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1), NULL);
┌────────────────────────────────────────────┬──────────────────────────────────────────┐
│ typeof(round(CAST(999.9 AS DECIMAL(4,1)))) │ round(CAST(999.9 AS DECIMAL(4,1)), NULL) │
│ varchar │ int32 │
├────────────────────────────────────────────┼──────────────────────────────────────────┤
│ DECIMAL(4,0) │ NULL │
└────────────────────────────────────────────┴──────────────────────────────────────────┘
| if args.scalar_arguments.len() > 1 { | ||
| match args.scalar_arguments[1] { | ||
| Some(ScalarValue::Int32(Some(v))) => (*v, true), | ||
| Some(ScalarValue::Int64(Some(v))) => (*v as i32, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use a safe cast and return an Err if the number is bigger than i32::MAX (or even i8::MAX)
| decimal_places: Option<ArrayRef>, | ||
| ) -> Result<ArrayRef, DataFusionError> { | ||
| let number_rows = value.len(); | ||
| let return_type = value.data_type().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be wrong for decimals which scale is reduced.
No need to fix/improve it. Just a comment acknowledging that is enough.
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to take a look at this again soon; in the meantime could we update the PR body since it seems outdated compared to the recent changes?
| None => (0, true), // No dp argument means default to 0 | ||
| Some(None) => (0, false), // dp is a column | ||
| Some(Some(ScalarValue::Int32(Some(v)))) => (*v, true), | ||
| Some(Some(ScalarValue::Int64(Some(v)))) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should have i64 handling code since the signature expects only i32 for the 2nd argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Int64 handling isn’t “accepting Int64 dp” at runtime; it’s handling how literals are represented during planning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow; does that mean scalar_arguments types don't have type coercion applied to them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ReturnFieldArgs.scalar_arguments are taken directly from the original Expr::Literal values, so they can be ScalarValue::Int64 even though the function signature/coercion will ensure the argument is evaluated as Int32 at runtime.
Type coercion is applied when building the physical expression / evaluating the arguments, but scalar_arguments is just “what literal was written in the query”, not the coerced value. That’s why we handle Int64 there: to (a) treat it as a constant dp for return-type computation and (b) produce a clear out-of-range error for literals that can’t fit in i32.
I will update it. Thanks! |
Which issue does this PR close?
rounddecimal incorrect result #19921.Rationale for this change
SELECT ROUND('999.9'::DECIMAL(4,1))incorrectly returned100.0instead of1000.When rounding a decimal value causes a carry-over that increases the number of digits (e.g., 999.9 → 1000.0), the result overflows the original precision constraint. The overflow was silently truncated during display, producing incorrect results.
What changes are included in this PR?
precision for the integer part.
value can’t fit, it returns an overflow error instead of a wrong value.
Are these changes tested?
Are there any user-facing changes?