
Spark quarter function implementation#20808

Open
kazantsev-maksim wants to merge 15 commits into apache:main from kazantsev-maksim:spark_quarter

Conversation

@kazantsev-maksim
Contributor

Which issue does this PR close?

N/A

Rationale for this change

Add new spark function: https://spark.apache.org/docs/latest/api/sql/index.html#quarter

What changes are included in this PR?

  • Implementation
  • SLT tests

Are these changes tested?

Yes, tests added as part of this PR.

Are there any user-facing changes?

No, this is a new function.

@kazantsev-maksim kazantsev-maksim marked this pull request as draft March 8, 2026 16:25
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Mar 8, 2026
@kazantsev-maksim kazantsev-maksim changed the title from "park quarter function implementation" to "Spark quarter function implementation" Mar 8, 2026
@kazantsev-maksim kazantsev-maksim marked this pull request as ready for review March 8, 2026 18:05
Contributor

@kosiew kosiew left a comment


👋 @kazantsev-maksim,

Thanks for putting this together — the new quarter UDF is headed in a good direction, but I think there are a couple of compatibility and robustness issues we should address before merging.

The main blocker is that the current registration uses an exact Date32 signature, which looks narrower than Spark’s documented behavior for quarter. I also left a couple of smaller follow-ups around reusing the existing date_part("quarter", ...) path and avoiding the unwrap() panic path.

I’d also love to see coverage added for Spark’s documented string-literal example so we lock that compatibility in.

impl SparkQuarter {
    pub fn new() -> Self {
        Self {
            signature: Signature::exact(vec![DataType::Date32], Volatility::Immutable),
Contributor


I think this is the main thing we should fix before merging. Right now the UDF is registered with an exact Date32 signature, which means we no longer preserve Spark’s documented call shape for quarter.

Spark’s SQL docs show SELECT quarter('2016-08-31'); returning 3, and this SLT file used to carry that example before it was replaced with explicit ::DATE casts. With the current signature, we only validate the casted form and could end up rejecting the plain string-literal case that Spark accepts.

Could we switch this to a coercible signature, or possibly just route through the existing date_part('quarter', ...) behavior, and add coverage for the uncasted query?

}
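The uncasted string-literal behavior that comment asks to cover can be pinned down with a tiny check. This is an illustrative pure-Rust sketch (not the UDF itself, and `quarter_of_date_literal` is a hypothetical helper); a coercible signature would let the UDF accept such a literal by casting it to Date32 first:

```rust
/// Hypothetical helper, not part of the PR: computes the quarter for a
/// `yyyy-MM-dd` string literal, mirroring Spark's documented example
/// SELECT quarter('2016-08-31'), which returns 3.
fn quarter_of_date_literal(s: &str) -> Option<u32> {
    // Pull the month field out of the literal and validate it.
    let month: u32 = s.split('-').nth(1)?.parse().ok()?;
    if (1..=12).contains(&month) {
        Some((month - 1) / 3 + 1)
    } else {
        None
    }
}

fn main() {
    assert_eq!(quarter_of_date_literal("2016-08-31"), Some(3));
    println!("ok");
}
```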

    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        let [arg] = take_function_args("quarter", args.args)?;
Contributor


Small suggestion here: this seems to repeat the same scalar/array Date32 dispatching pattern we already have in other datetime helpers, while datafusion_functions::datetime::date_part() already supports "quarter".

Would it make sense to delegate to that existing implementation instead? It feels like that would help keep coercion rules, null handling, and any future date-part behavior aligned in one place.

}
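For context, the repeated scalar/array dispatch pattern that comment refers to looks roughly like the sketch below; `Value` is a simplified stand-in for DataFusion's `ColumnarValue`, used only for illustration. Delegating to the existing `date_part` path would keep this boilerplate in one place:

```rust
/// Simplified stand-in for DataFusion's ColumnarValue: a UDF input is either
/// a single scalar or a whole column. Illustrative only.
enum Value {
    Scalar(Option<i32>),
    Array(Vec<Option<i32>>),
}

/// The dispatch each datetime helper tends to repeat: apply `f` to the
/// scalar, or map it over every non-null element of the array.
fn apply_date32(v: Value, f: impl Fn(i32) -> i32) -> Value {
    match v {
        Value::Scalar(d) => Value::Scalar(d.map(&f)),
        Value::Array(ds) => Value::Array(ds.into_iter().map(|d| d.map(&f)).collect()),
    }
}

fn main() {
    let out = apply_date32(Value::Array(vec![Some(0), None]), |d| d + 1);
    match out {
        Value::Array(ds) => assert_eq!(ds, vec![Some(1), None]),
        _ => unreachable!(),
    }
    println!("ok");
}
```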

fn spark_quarter(days: i32) -> Result<i32> {
    let quarter = Date32Type::to_naive_date_opt(days).unwrap().quarter();
Contributor


One thing that made me a little nervous here is the unwrap(). That introduces a panic path if a malformed Date32 value ever makes it down to this helper.

last_day handles the same kind of conversion by returning an explicit error instead, which feels a bit safer for a public UDF. Could we do the same here so bad inputs show up as query errors rather than aborting execution?
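A minimal sketch of that suggestion, assuming a plain-Rust stand-in for `Date32Type::to_naive_date_opt` (the real one returns `Option<NaiveDate>`) and an inlined quarter computation; names here are illustrative, not the PR's code:

```rust
/// Local stand-in for `Date32Type::to_naive_date_opt`: (year, month) for a
/// days-since-epoch value, via the standard civil-from-days calendar
/// algorithm. The real conversion returns None outside chrono's supported
/// year range; this stand-in succeeds for any i32, so the error path below
/// is illustrative.
fn to_naive_date_opt(days: i32) -> Option<(i64, u32)> {
    let z = days as i64 + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097;                           // day of era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);     // March-based day of year
    let mp = (5 * doy + 2) / 153;                          // March-based month
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let y = yoe + era * 400 + (if m <= 2 { 1 } else { 0 });
    Some((y, m))
}

fn spark_quarter(days: i32) -> Result<i32, String> {
    // Surface a failed conversion as a query error instead of panicking.
    let (_, month) = to_naive_date_opt(days)
        .ok_or_else(|| format!("Date32 value {days} is out of the supported range"))?;
    Ok(((month - 1) / 3 + 1) as i32)
}

fn main() {
    assert_eq!(spark_quarter(0), Ok(1));      // 1970-01-01 -> Q1
    assert_eq!(spark_quarter(17_044), Ok(3)); // 2016-08-31 -> Q3
    println!("ok");
}
```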

Contributor Author


I've reworked it.

@kazantsev-maksim
Contributor Author

Thanks for the review @kosiew. Could you please take another look when you have time?
