-
Notifications
You must be signed in to change notification settings - Fork 181
Description
Problem Statement
Implicit type coercion is missing from the current PPL validation. It is useful in at least two scenarios:
- when the actual argument types of a function don't match with its expected types (should coerce to expected param type)
- when the two operand types of a comparator are different (should coerce to a common type)
Relevant issue: #3761
Status update:
- Approach 1: Implement implict cast and type validation with Calcite yuancu/sql-plugin#6 Discarded due to the re-introduction of
SqlNode(Update: picked up with Use Calcite's validation system for type checking & coercion #4892) - Approach 2 (alternative proposal): Support function argument coercion with Calcite #3914 Merged
Current State
As a result, the current function implementation must have an override for each parameter type it supports, even when they could be implicitly cast. For example, a function foo that accepts a date as an argument should also accept "2020-12-10" as an argument. However, without implicit type coercion, we have to define both foo(date) and foo(string).
How Calcite Implements Implicit Type Coercion
Calcite checks operand types and attempts to coerce them if necessary during the validation phase.
For example, in FamilyOperandTypeChecker, when an argument type doesn't match expectations, Calcite tries to determine if the argument can be coerced to the expected type. The simplified code snippet below illustrates this process:
public boolean checkOperandTypes(
SqlCallBinding callBinding,
boolean throwOnFailure){
...
if (!checkSingleOperandType(callBinding, op.e, op.i, families.get(op.i), false)) {
// try to coerce type if it is allowed.
boolean coerced = false;
if (callBinding.isTypeCoercionEnabled()) {
TypeCoercion typeCoercion = callBinding.getValidator().getTypeCoercion();
ImmutableList.Builder<RelDataType> builder = ImmutableList.builder();
for (int i = 0; i < callBinding.getOperandCount(); i++) {
builder.add(callBinding.getOperandType(i));
}
ImmutableList<RelDataType> dataTypes = builder.build();
coerced = typeCoercion.builtinFunctionCoercion(callBinding, dataTypes, families);
}
}In this process, builtinFunctionCoercion calls coerceOperandType to rewrite the operands when needed. The coerceOperandType method rewrites the query by wrapping the argument with a CAST call:
protected boolean coerceOperandType(SqlValidatorScope scope, SqlCall call, int index, RelDataType targetType) {
...
RelDataType targetType1 = syncAttributes(operandType, targetType);
SqlNode desired = castTo(operand, targetType1);
call.setOperand(index, desired);
updateInferredType(desired, targetType1);
return true;
}To determine whether an argument can be cast to the target type, Calcite defines rules in
SqlTypeCoercionRule and SqlTypeAssignmentRule. Additionally, the TypeCoercion interface defines how a type should be cast to another.
Challenges in Applying Calcite's Type Coercion
The validation phase of Calcite SQL validates a parsed SqlNode AST. It resolves identifiers, derives return types, validates argument types, etc. However, since we created a new language interface outside of SQL, this phase isn't directly applicable and is skipped in our current implementation.
As a result, we can't directly utilize Calcite's type coercion mechanism.
Proposal
We faced a similar issue with function argument type checking. We implemented our own argument validation initially. However, later we discovered that other language interfaces—like calcite-piglet—successfully leverage Calcite's type validation. Update: with a deeper inspection, I found that piglet actually uses its own type check, without re-using Calcite's type checking mechanism at all.
Calcite-piglet translates its logical plans into a SQL AST (represented with SqlNode) to reuse Calcite's type checking. It first constructs its own AST via a custom parser, then converts each node into a corresponding SqlNode. This approach allows it to access Calcite's standard validation flow and leverage its built-in type checking. The key implementation is in PigRelToSqlConverter.
I propose we adopt a similar approach by converting PPL to However, we could still try doing so. This should eliminate many future issues by reusing more of Calcite's functionality.SqlNode.
My concern over this proposal
With this approach, we re-introduce the SqlNode layer. This contradicts our previous design, which skips the layer.
- Calcite's workflow
Loading
flowchart LR SQL -->|parse| SqlNode1[SqlNode] -->|validation| SqlNode2[SqlNode] -->|convert| RelNode1[RelNode] -->|optimize| RelNode2[RelNode]
- Our existing design
Loading
flowchart LR PPL -->|parse| AST -->|convert| RelNode1[RelNode] -->|optimize| RelNode2[RelNode]
- With this proposal (the blocks in blue are added phases)
Loading
flowchart LR PPL -->|parse| AST -->|convert| RelNode1[RelNode] -->|convert| SqlNode1[SqlNode]:::blue -->|validate| SqlNode2[SqlNode]:::blue -->|convert| RelNode2[RelNode]:::blue -->|optimize| RelNode3[RelNode] classDef blue fill:#2196F3,stroke:#0D47A1,color:white
It's like converting PPL to SQL for validation purpose, making people doubting why is the SqlNode phase skipped at the first place.
Approach
This solution requires the following steps:
- Convert PPL's logical plan to SqlNode by creating a similar
PplRelToSqlConverter- The handling of UDT needs more investigation
- Refactor the current argument type validation with the converted SQL AST
- Implement function type coercion based on this
Long-Term Goals
- Automatic coercion makes defining functions simpler and easier to maintain.
- Make use of more Calcite's validation mechanism to reduce maintenance effort and make the implementation more robust.
Alternative
flowchart LR
PPL -->|parse| AST -->|convert & cast| RelNode1[RelNode]:::blue -->|optimize| RelNode3[RelNode]
classDef blue fill:#2196F3,stroke:#0D47A1,color:white
- Make coercion directly at the
RelNodelevel- Merits: Aligns with our existing design
- Potential drawbacks: We make less use of what's already there in Calcite. It may leads to more investment in validation in the future. For example, we could have eliminated the effort to implement type checkers if we introduced the
SqlNodelayer.
Implementation Discussion
Metadata
Metadata
Assignees
Labels
Type
Projects
Status