feat: [ANSI] Ansi sql error messages #3580

Draft
parthchandra wants to merge 11 commits into apache:main from parthchandra:sql-query-errors

Conversation

@parthchandra
Contributor

Which issue does this PR close?

Closes parts of #551
Closes #2215
Closes #3375

Rationale for this change

With Spark 4.0 (and Spark 3.5 with ANSI mode enabled), Spark produces ANSI-compliant error messages that carry an error code and, in many cases, include the original SQL query. When Comet encounters errors in native code, it throws a SparkException or CometNativeException that does not conform to this error-reporting standard.

What changes are included in this PR?

This PR introduces a framework for reporting ANSI-compliant error messages from native code.

Summary of error propagation:

  1. Spark-side query context serialization: For every serialized expression and aggregate expression, a unique expr_id is generated. If the expression's origin carries a QueryContext (SQL text, line, column, object name), it is extracted and attached to the protobuf. This is done for both Expr and AggExpr.
  2. Native planner (planner.rs): The PhysicalPlanner now holds a QueryContextMap. When planning Expr and AggExpr nodes, if expr_id and query_context are present, the context is registered in the map. When creating physical expressions for Cast, CheckOverflow, ListExtract, SumDecimal, AvgDecimal, and arithmetic binary expressions, the relevant QueryContext is looked up and passed to the constructor.
  3. Native errors: The SparkError enum is extended with new variants for all the Spark ANSI errors (from https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala). A new SparkErrorWithContext type wraps a SparkError with a QueryContext. All affected expression implementations look up the context and produce a SparkErrorWithContext when available.
    The SparkError implementation also gains new to_json() and exception_class() methods for JNI serialization.
  4. JNI boundary (errors.rs -> CometQueryExecutionException): The throw_exception function now checks for SparkErrorWithContext or SparkError and throws CometQueryExecutionException, which carries the entire SparkErrorWithContext as a JSON message. On the Scala side, CometExecIterator catches this exception and calls SparkErrorConverter.convertToSparkException() to convert it to the appropriate Spark exception. If the JSON message contains the QueryContext, the exception will contain the query; otherwise it will not.
  5. Version-specific implementations: There are two, one for Spark 3.x (falls back to a generic SparkException) and one for Spark 4.0 (calls the exact QueryExecutionErrors.* methods).
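The native-side types in steps 3 and 4 can be sketched roughly as follows. This is a hypothetical, dependency-free sketch, not the PR's actual code: the variant names, field names, and JSON layout are assumptions (the real SparkError covers many more variants and could use serde for serialization).

```rust
// Sketch of a SparkError enum with a QueryContext wrapper and JSON
// serialization for the JNI boundary. Names modeled on the PR description;
// field layout and message format are illustrative assumptions.

#[derive(Debug, Clone)]
pub struct QueryContext {
    pub sql_text: String,
    pub line: u32,
    pub column: u32,
    pub object_name: String,
}

#[derive(Debug)]
pub enum SparkError {
    ArithmeticOverflow { message: String },
    CastInvalidValue { value: String, from_type: String, to_type: String },
}

impl SparkError {
    /// Fully qualified Spark exception class thrown on the JVM side.
    /// (Simplified: the real mapping is per-variant.)
    pub fn exception_class(&self) -> &'static str {
        "org.apache.spark.SparkArithmeticException"
    }

    /// Serialize the error, plus the query context when one was registered,
    /// into the JSON message carried by CometQueryExecutionException.
    pub fn to_json(&self, ctx: Option<&QueryContext>) -> String {
        let kind = match self {
            SparkError::ArithmeticOverflow { .. } => "ArithmeticOverflow",
            SparkError::CastInvalidValue { .. } => "CastInvalidValue",
        };
        match ctx {
            Some(c) => format!(
                "{{\"error\":\"{}\",\"sql\":\"{}\",\"line\":{},\"column\":{}}}",
                kind, c.sql_text, c.line, c.column
            ),
            None => format!("{{\"error\":\"{}\"}}", kind),
        }
    }
}

/// A SparkError paired with the query context it occurred in, if any.
pub struct SparkErrorWithContext {
    pub error: SparkError,
    pub context: Option<QueryContext>,
}

fn main() {
    let err = SparkErrorWithContext {
        error: SparkError::ArithmeticOverflow { message: "integer overflow".to_string() },
        context: Some(QueryContext {
            sql_text: "SELECT a + b FROM t".to_string(),
            line: 1,
            column: 8,
            object_name: "t".to_string(),
        }),
    };
    // This JSON string is what crosses the JNI boundary; the Scala side
    // parses it and rebuilds the matching Spark exception.
    println!("{}", err.error.to_json(err.context.as_ref()));
}
```

On the Scala side, a converter would parse this JSON and dispatch to the matching QueryExecutionErrors factory method (Spark 4.0) or a generic SparkException (Spark 3.x).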

Notes: Not all expressions have been updated. Only the expressions that failed unit tests as a result of incorrect error messages have been updated (Cast, CheckOverflow, ListExtract, SumDecimal, AvgDecimal, and binary arithmetic expressions). Binary arithmetic expressions are now represented as CheckedBinaryExpr, which also includes the query context.
Most errors in QueryExecutionErrors are reproduced as-is on the native side. However, some errors, such as INTERVAL_ARITHMETIC_OVERFLOW, have one version with a user suggestion and one without; in such cases there are two variants on the native side.
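The with/without-suggestion split can be illustrated as a pair of enum variants. This is a hypothetical sketch; the actual variant names and message text in the PR (and in Spark's error classes) may differ.

```rust
// Illustrative sketch: one Spark error condition that exists in two forms,
// INTERVAL_ARITHMETIC_OVERFLOW with and without a user suggestion, mirrored
// as two native variants. Names and wording are assumptions.

enum IntervalOverflowError {
    /// Plain overflow, no remediation hint.
    WithoutSuggestion,
    /// Overflow with a hint, e.g. suggesting a try_* function.
    WithSuggestion { suggestion: String },
}

fn message(err: &IntervalOverflowError) -> String {
    match err {
        IntervalOverflowError::WithoutSuggestion => {
            "[INTERVAL_ARITHMETIC_OVERFLOW] Interval arithmetic overflow.".to_string()
        }
        IntervalOverflowError::WithSuggestion { suggestion } => {
            format!("[INTERVAL_ARITHMETIC_OVERFLOW] Interval arithmetic overflow. {suggestion}")
        }
    }
}

fn main() {
    let e = IntervalOverflowError::WithSuggestion {
        suggestion: "Use 'try_add' to tolerate overflow and return NULL instead.".to_string(),
    };
    println!("{}", message(&e));
}
```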

How are these changes tested?

New unit tests, plus the previously failing tests listed in #551, #2215, and #3375.

This PR was produced with the generous assistance of Claude Code

@parthchandra
Contributor Author

@coderfender, fyi

@coderfender
Contributor

Thank you @parthchandra . This is awesome

// context eagerly so it displays the call site at the
// line of code where the cast method was called, whereas spark grabs the context
// lazily and displays the call site at the line of code where the error is checked.
assert(sparkMessage.startsWith(cometMessage.substring(0, 40)))
Member


is 40 just an arbitrary number to get a decent sized chunk of the error message or does it have a specific meaning in the context of this test?

Contributor Author


I used 40 as an arbitrary number to get enough of the error message. There is no special significance.

case "BinaryArithmeticOverflow" =>
Some(
QueryExecutionErrors.binaryArithmeticCauseOverflowError(
params("value1").toString.toShort,
Member


Could this overflow if the value is larger than a short?

Contributor Author


I believe that should not happen. The original Spark exception has these values declared as short.


// Extract the problematic fragment
let fragment = if start_idx < self.sql_text.len() && stop_idx <= self.sql_text.len() {
&self.sql_text[start_idx..stop_idx]
Member


The earlier docs say that start_idx is a character index, but it is being used as a byte index here, I think. Perhaps you could add tests for non-ASCII cases, if that makes sense?

Contributor Author


Good catch. Fixed to use a char index, and added a unit test.
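For reference, the char-index version of the fragment extraction can be sketched as follows. This is a minimal standalone sketch, not the PR's actual code; the function name and bounds handling are assumptions.

```rust
// Extract a SQL fragment by *character* index rather than byte index, so
// multi-byte (non-ASCII) text neither panics nor splits a character.
// start_idx/stop_idx are character positions, per the earlier docs.

fn fragment_by_chars(sql_text: &str, start_idx: usize, stop_idx: usize) -> Option<String> {
    let n_chars = sql_text.chars().count();
    if start_idx < n_chars && stop_idx <= n_chars && start_idx < stop_idx {
        Some(sql_text.chars().skip(start_idx).take(stop_idx - start_idx).collect())
    } else {
        None
    }
}

fn main() {
    // Non-ASCII check: 'é' is 2 bytes in UTF-8, so byte slicing at these
    // offsets would point at the wrong text (or panic mid-character).
    let sql = "SELECT café + 1";
    assert_eq!(fragment_by_chars(sql, 7, 11), Some("café".to_string()));
    println!("{:?}", fragment_by_chars(sql, 7, 11));
}
```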

@parthchandra parthchandra marked this pull request as draft February 24, 2026 01:02
@parthchandra
Contributor Author

Changed to draft to figure out backward compatibility



Development

Successfully merging this pull request may close these issues.

ANSI mode array access error messages don't match Spark format
[ANSI] Include original SQL in error messages
