Skip to content

Fix column-to-column comparison in WHERE clause#18587

Open
spapin wants to merge 2 commits into
apache:masterfrom
spapin:fix/column-to-column-comparison-rewrite
Open

Fix column-to-column comparison in WHERE clause#18587
spapin wants to merge 2 commits into
apache:masterfrom
spapin:fix/column-to-column-comparison-rewrite

Conversation

@spapin
Copy link
Copy Markdown

@spapin spapin commented May 26, 2026

Summary

Fixes #10600

The PredicateComparisonRewriter previously rewrote column-to-column comparisons like WHERE a = b into minus(a, b) = 0. This arithmetic rewrite forced DOUBLE coercion on both operands, which:

  • Broke string comparisons entirelyDouble.parseDouble("some string") throws NumberFormatException
  • Lost precision for integer types — large LONG values cast to DOUBLE can alias, making distinct values appear equal
  • Affected both v1 and v2 engines — the v2 leaf stage applies the same rewriter, so even multi-stage queries hit this bug

The fix replaces the minus() rewrite with the type-safe comparison transform functions (equals, not_equals, greater_than, etc.) that already exist in BinaryOperatorTransformFunction. These dispatch on the actual stored types of both operands — Integer.compare for INT, Long.compare for LONG, String.compareTo for STRING — with no forced coercion.

Why the rewrite is still needed

The v1 filter engine requires a literal on the RHS of predicates. The rewrite moves the column-to-column comparison into a transform function, leaving a literal true on the RHS:

Before (broken):  EQUALS(json_extract_scalar(...), other_col)          ← column on RHS, v1 can't evaluate
Old rewrite:      EQUALS(minus(json_extract_scalar(...), other_col), 0) ← forced DOUBLE coercion, breaks strings
New rewrite:      EQUALS(equals(json_extract_scalar(...), other_col), true) ← type-safe per-row comparison

Related issues

Changes

  • PredicateComparisonRewriter.javaa op bEQUALS(comparison_func(a, b), true) instead of EQUALS(minus(a, b), 0)
  • PredicateComparisonRewriterTest.java — covers all 6 comparison operators, function-on-LHS, operand preservation, and BETWEEN rejection
  • JsonExtractScalarTest.java — end-to-end integration tests for string and numeric column-to-column comparisons via json_extract_scalar

Test plan

  • PredicateComparisonRewriterTest — verifies rewrite structure for =, !=, <, >, >= and function-on-LHS
  • JsonExtractScalarTest#testJsonExtractScalarComparedToColumn — end-to-end across all 7 JSON column variants:
    • String = (no match → empty result, no error)
    • String = same expression both sides (all rows match)
    • String != (all rows satisfy)
    • Numeric = (no match → empty result, no error)
    • Numeric > (all rows satisfy)
  • Local quickstart: verified json_extract_scalar(properties, '$.hs_name', 'STRING', '') = expectedName returns correct matching rows

The PredicateComparisonRewriter previously rewrote column-to-column
comparisons like `WHERE a = b` into `minus(a, b) = 0`. This arithmetic
rewrite forced DOUBLE coercion on both operands, which threw
NumberFormatException for any non-numeric string values (e.g.,
json_extract_scalar with a STRING result type).

The fix replaces the minus() rewrite with the type-safe comparison
transform functions (equals, not_equals, greater_than, etc.) that
already exist in BinaryOperatorTransformFunction. These properly
dispatch on the stored types of both operands — STRING comparisons
use String.compareTo(), numeric comparisons use the appropriate
Number.compare(), with no forced coercion.

This fixes both v1 (single-stage) and v2 (multi-stage) engines,
since the v2 leaf stage applies the same rewriter.
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 26, 2026

Codecov Report

❌ Patch coverage is 46.15385% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.83%. Comparing base (a6463d4) to head (3d1c402).
⚠️ Report is 18 commits behind head on master.

Files with missing lines Patch % Lines
.../parsers/rewriter/PredicateComparisonRewriter.java 46.15% 6 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (a6463d4) and HEAD (3d1c402). Click for more details.

HEAD has 4 uploads less than BASE
Flag BASE (a6463d4) HEAD (3d1c402)
java-21 5 4
unittests1 1 0
unittests 2 1
temurin 5 4
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18587       +/-   ##
=============================================
- Coverage     64.27%   36.83%   -27.45%     
+ Complexity     1137     1136        -1     
=============================================
  Files          3314     3335       +21     
  Lines        204064   205908     +1844     
  Branches      31760    32129      +369     
=============================================
- Hits         131165    75841    -55324     
- Misses        62373   123209    +60836     
+ Partials      10526     6858     -3668     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 36.83% <46.15%> (-27.45%) ⬇️
temurin 36.83% <46.15%> (-27.45%) ⬇️
unittests 36.82% <46.15%> (-27.45%) ⬇️
unittests1 ?
unittests2 36.82% <46.15%> (+1.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Existing tests in CalciteSqlCompilerTest still expected the old
minus(a, b) > 0 pattern; update to match the new
greater_than(a, b) = true rewrite. Also add missing <= coverage
in PredicateComparisonRewriterTest so all six comparison operators
are exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xiangfu0
Copy link
Copy Markdown
Contributor

hmm I thought #10600 is already resolved... no?

@spapin
Copy link
Copy Markdown
Author

spapin commented May 27, 2026

hmm I thought #10600 is already resolved... no?

My understanding is that there was a fix in #10444 for comparing the same column.

WHERE columnA = columnA

This change also supports different columns

WHERE columnA = columnB

The ticket mentions the NumberFormatException which is rooted in the Minus operation that was previously used. Let me know if this was the wrong issue to tag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Server query with <column> = <column> for string column throws number format exception

3 participants