fix: validate clause expressions during logical planning #20965
myandpr wants to merge 1 commit into apache:main
Hey @alamb, could you please help trigger CI for this PR? This is my first contribution here. I'm hoping to contribute more actively to the project going forward, so I'd really appreciate your help. Thanks!
Signed-off-by: yaommen <myanstu@163.com>
```rust
let predicate_type = expr.get_type(schema)?;
if !Self::is_allowed_predicate_type(&predicate_type) {
    return plan_err!(
        "Cannot create filter with non-boolean predicate '{expr}' returning {predicate_type}"
    );
}
```
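The quoted snippet rejects a filter whose predicate type is not allowed. The following is a minimal, self-contained sketch of that rule. It uses a simplified, hypothetical `DataType` enum instead of Arrow's. It assumes that `Boolean` and `Null` are accepted and that dictionary-encoded predicates are judged by their value type, which is how I understand DataFusion's existing filter check to behave. This is an illustration, not the PR's actual `is_allowed_predicate_type`.

```rust
/// Simplified stand-in for Arrow's `DataType` (hypothetical; only the
/// variants needed for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Boolean,
    Null,
    Int64,
    Utf8,
    /// Dictionary-encoded values: (key type, value type).
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Assumed rule: a predicate may be Boolean, or Null (read as a missing
/// boolean). A dictionary-encoded predicate is judged by its value type.
pub fn is_allowed_predicate_type(t: &DataType) -> bool {
    match t {
        DataType::Boolean | DataType::Null => true,
        DataType::Dictionary(_, value) => is_allowed_predicate_type(value),
        _ => false,
    }
}

/// Produces an error shaped like the one in the quoted snippet.
pub fn check_predicate(expr: &str, t: &DataType) -> Result<(), String> {
    if is_allowed_predicate_type(t) {
        Ok(())
    } else {
        Err(format!(
            "Cannot create filter with non-boolean predicate '{expr}' returning {t:?}"
        ))
    }
}
```

Accepting `Null` means that a predicate such as `WHERE NULL` is treated as "no rows match" rather than as a type error.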
Most DataFusion type validation happens during the Analyzer phase, not in the SQL planner (binding phase).
I agree with @Acfboy's original statement that failing at physical planning time is probably too late, but trying to do this in SQL is too early.
Among other things, it means that this type checking will not apply to queries created via the DataFrame API.
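The reviewer's point can be sketched with a toy model. Both frontends produce the same plan type, so a single validation pass over the finished plan covers SQL and DataFrame queries alike. A check inside SQL parsing, by contrast, never sees a plan built directly through the DataFrame API. Every type and function below (`Expr`, `Plan`, `Ty`, `validate`, `dataframe_filter`) is hypothetical and heavily simplified. None of them is DataFusion's real `Expr`, `LogicalPlan`, or analyzer API.

```rust
/// Hypothetical expression type (not DataFusion's `Expr`).
#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Str(String),
    Bool(bool),
    Plus(Box<Expr>, Box<Expr>),
}

/// Hypothetical plan type shared by every frontend.
#[derive(Debug)]
pub enum Plan {
    Scan(String),
    Filter { predicate: Expr, input: Box<Plan> },
}

#[derive(Debug, PartialEq)]
pub enum Ty {
    Int,
    Str,
    Bool,
}

/// Infers an expression's type, failing on invalid arithmetic like 1 + 'a'.
fn type_of(e: &Expr) -> Result<Ty, String> {
    match e {
        Expr::Int(_) => Ok(Ty::Int),
        Expr::Str(_) => Ok(Ty::Str),
        Expr::Bool(_) => Ok(Ty::Bool),
        Expr::Plus(l, r) => match (type_of(l)?, type_of(r)?) {
            (Ty::Int, Ty::Int) => Ok(Ty::Int),
            (lt, rt) => Err(format!("cannot add {lt:?} and {rt:?}")),
        },
    }
}

/// One pass over the finished plan, run no matter which frontend built it
/// (the role the review assigns to the analyzer).
pub fn validate(plan: &Plan) -> Result<(), String> {
    match plan {
        Plan::Scan(_) => Ok(()),
        Plan::Filter { predicate, input } => {
            validate(input)?;
            match type_of(predicate)? {
                Ty::Bool => Ok(()),
                other => Err(format!("non-boolean filter predicate: {other:?}")),
            }
        }
    }
}

/// Stand-in for a DataFrame-style builder: constructs the plan directly,
/// with no SQL text involved, so a SQL-only check would never run.
pub fn dataframe_filter(table: &str, predicate: Expr) -> Plan {
    Plan::Filter {
        predicate,
        input: Box::new(Plan::Scan(table.to_string())),
    }
}
```

A plan built by `dataframe_filter("t", Expr::Plus(...))` with an integer and a string operand is still rejected by `validate`, because the check runs on the plan and not on the SQL text.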
Thanks @alamb , that makes sense.
I agree that doing this in the SQL planner makes the behavior SQL-specific and misses queries built through the DataFrame API.
I’ll rework this toward the analyzer/type coercion path instead, and I’ll take a closer look at which cases are already covered there versus which ones still need to be handled. I’ll also update the regression tests to match that direction.
Which issue does this PR close?
Rationale for this change
`SELECT 1 + 'a'` fails during logical planning, but equivalent invalid expressions in `WHERE`, `HAVING`, `QUALIFY`, `ORDER BY`, and `JOIN ... ON` were still deferred until later phases. This PR fixes that inconsistency in the SQL planner. It only adds earlier validation for these SQL clauses and does not change the analyzer/type coercion design for other planning paths.
What changes are included in this PR?
- Validate expressions in `WHERE`, `HAVING`, `QUALIFY`, `ORDER BY`, and `JOIN ... ON` during logical planning
- Expressions such as `STARTS_WITH() IS NULL` are also rejected during logical planning
Are these changes tested?
Yes.
Added regression tests for:
- `(1 + 'a')`
- `(STARTS_WITH() IS NULL)`
These tests cover `WHERE`, `HAVING`, `QUALIFY`, `ORDER BY`, and `JOIN ... ON`.
Are there any user-facing changes?
Yes. Invalid expressions in these clauses now fail earlier during logical planning.
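The `STARTS_WITH() IS NULL` case mentioned in this PR shows why checking only a clause's top-level type may not be enough. Under a shallow typing rule where `IS NULL` always reports `Boolean`, the invalid zero-argument call underneath is never inspected. The sketch below contrasts a shallow typer with one that checks every sub-expression. I have not confirmed how DataFusion itself types `IS NULL`. The shallow rule, the `Expr`/`Ty` types, and the one-entry signature table (`call_type`) are all hypothetical and exist only to illustrate the idea.

```rust
/// Hypothetical miniature expression tree (not DataFusion's `Expr`).
#[derive(Debug)]
pub enum Expr {
    Utf8(String),
    /// Scalar function call: name and arguments.
    Call(String, Vec<Expr>),
    IsNull(Box<Expr>),
}

#[derive(Debug, PartialEq)]
pub enum Ty {
    Utf8,
    Boolean,
}

/// Hypothetical signature table: only `starts_with(str, prefix)` is known.
fn call_type(name: &str, arity: usize) -> Result<Ty, String> {
    match (name, arity) {
        ("starts_with", 2) => Ok(Ty::Boolean),
        ("starts_with", n) => Err(format!("starts_with expects 2 arguments, got {n}")),
        (other, _) => Err(format!("unknown function {other}")),
    }
}

/// Shallow typing (assumed rule for this sketch): `IS NULL` yields Boolean
/// without looking at its operand, and call arguments are not checked.
pub fn shallow_type(e: &Expr) -> Result<Ty, String> {
    match e {
        Expr::Utf8(_) => Ok(Ty::Utf8),
        Expr::IsNull(_) => Ok(Ty::Boolean),
        Expr::Call(name, args) => call_type(name, args.len()),
    }
}

/// Deep typing: validate every sub-expression before reporting a type.
pub fn checked_type(e: &Expr) -> Result<Ty, String> {
    match e {
        Expr::Utf8(_) => Ok(Ty::Utf8),
        Expr::IsNull(inner) => {
            checked_type(inner)?; // surfaces errors hidden under IS NULL
            Ok(Ty::Boolean)
        }
        Expr::Call(name, args) => {
            for arg in args {
                checked_type(arg)?;
            }
            call_type(name, args.len())
        }
    }
}
```

With `e = IsNull(Call("starts_with", []))`, `shallow_type(&e)` returns `Ok(Boolean)`, so a boolean-predicate check alone would pass. `checked_type(&e)` reports the arity error instead.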