Skip to content

[GLUTEN-12157][VL] Fix silently-skipped math/scalar test suites; add Velox native tests for sin, tan, tanh, radians, ln#12158

Draft
brijrajk wants to merge 1 commit into
apache:mainfrom
brijrajk:feat/register-sparksql-math-functions
Draft

[GLUTEN-12157][VL] Fix silently-skipped math/scalar test suites; add Velox native tests for sin, tan, tanh, radians, ln#12158
brijrajk wants to merge 1 commit into
apache:mainfrom
brijrajk:feat/register-sparksql-math-functions

Conversation

@brijrajk
Copy link
Copy Markdown

@brijrajk brijrajk commented May 27, 2026

What changes are proposed in this pull request?

MathFunctionsValidateSuite and ScalarFunctionsValidateSuite were left as
abstract class after PR #11756 (RAS removal) deleted their only concrete
subclasses, causing all tests in those suites to silently not run. Both are
promoted to concrete classes here, consistent with the fix already applied to
DateFunctionsValidateSuite in that same PR.

ANSI mode (enabled by default in Spark 4) wraps arithmetic expressions in ANSI
check nodes, shifting the top-level plan node away from ProjectExecTransformer.
MathFunctionsValidateSuite now explicitly disables ANSI mode; ANSI-specific
behaviour is covered by the existing MathFunctionsValidateSuiteAnsiOn.

Adds Scala integration tests for sin, tan, tanh, radians, and ln. The
ln test includes an edge-case check that ln(0) and ln(-1) return null
rather than -Infinity/NaN, matching Spark's Hive-inherited semantics.

The native Velox registrations for these five functions are provided by
facebookincubator/velox#17648.
UPSTREAM_VELOX_PR_ID=17648 is set in get-velox.sh to validate the full pipeline
on CI while that Velox PR is in flight. It will be cleared before merge once the
Velox change lands in Gluten.

Fixes #12157

How was this patch tested?

Tests in MathFunctionsValidateSuite use runQueryAndCompare with
checkGlutenPlan[ProjectExecTransformer] to verify native execution without fallback.
The ln edge-case test uses compareResultsAgainstVanillaSpark to explicitly validate
null semantics against vanilla Spark.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

@github-actions github-actions Bot added the VELOX label May 27, 2026
@n0r0shi
Copy link
Copy Markdown

n0r0shi commented May 28, 2026

Hi @brijrajk, apologies for the parallel work, I didn't see your PR was already up when I started. I had an upstream Velox fix in flight at the same time: facebookincubator/velox#17648.

Heads up on ln, independent of which path lands: prestosql::LnFunction calls std::log and returns -Infinity for ln(0) / NaN for ln(-1). Spark's UnaryLogExpression returns null in both cases (Hive heritage, check yAsymptote: in mathExpressions.scala).

So a change on Velox is indeed inevitable, the Velox PR adds a sparksql::LnFunction mirroring the existing sparksql::Log2Function for that reason. The other four(sin/tan/tanh/radians) are fine with the prestosql impls — no domain restriction, same as sinh/cos/cosh/ degrees already.

Your test-framework fixes (abstract classclass, ANSI disable) are independently valuable and should land regardless — those suites really are silently not running today.

Happy to coordinate whichever direction maintainers prefer.

@brijrajk brijrajk changed the title [GLUTEN-12157][VL] Register sin, tan, tanh, radians, ln in Velox sparksql function registry [GLUTEN-12157][VL] Register sin, tan, tanh, radians in Velox sparksql function registry May 28, 2026
@brijrajk
Copy link
Copy Markdown
Author

brijrajk commented May 28, 2026

Hi @brijrajk, apologies for the parallel work, I didn't see your PR was already up when I started. I had an upstream Velox fix in flight at the same time: facebookincubator/velox#17648.

Heads up on ln, independent of which path lands: prestosql::LnFunction calls std::log and returns -Infinity for ln(0) / NaN for ln(-1). Spark's UnaryLogExpression returns null in both cases (Hive heritage, check yAsymptote: in mathExpressions.scala).

So a change on Velox is indeed inevitable, the Velox PR adds a sparksql::LnFunction mirroring the existing sparksql::Log2Function for that reason. The other four(sin/tan/tanh/radians) are fine with the prestosql impls — no domain restriction, same as sinh/cos/cosh/ degrees already.

Your test-framework fixes (abstract classclass, ANSI disable) are independently valuable and should land regardless — those suites really are silently not running today.

Happy to coordinate whichever direction maintainers prefer.

Thanks for the detailed explanation and for flagging the parallel work — no worries at all. You're right on ln; I've dropped it from this PR so only the four semantically safe functions (sin, tan, tanh, radians) land here, along with the test-framework fixes.

Once your Velox PR (facebookincubator/velox#17648) lands upstream and Gluten picks up the new sparksql::LnFunction, ln can follow in a separate PR.

@brijrajk brijrajk force-pushed the feat/register-sparksql-math-functions branch from 6270b01 to 9aa5e1b Compare May 28, 2026 10:56

// Register math functions that are present in the prestosql implementation
// but not yet in the sparksql registry. These are semantically identical
// to Spark's behavior for the same names.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Would it make sense to refactor the Velox code, moving these functions to a common location and registering them in Spark's registerMathFunctions, provided we verify their implementations are identical?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would make sense. As @n0r0shi noted above, they were already working on registering these in Velox's sparksql::registerMathFunctions (facebookincubator/velox#17648) — including a proper sparksql::LnFunction for Spark's null semantics — when this PR went up.

Given that upstream work is in flight, I'd propose narrowing this PR to just the test-framework fixes (abstract class → class, ANSI disable) and the four Scala integration tests. Those are independently valuable since those suites are silently not running today, and the tests will validate native execution once the Velox changes land in Gluten.

Happy to drop the C++ registrations if that approach works for you.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brijrajk, thank you for the clarification. I was just aware of the above discussion. That makes sense.

Perhaps, you may set the Velox PR ID in the following position for verifying that PR on Gluten side.

UPSTREAM_VELOX_PR_ID=""

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — set UPSTREAM_VELOX_PR_ID=17648 in get-velox.sh so CI builds Gluten with facebookincubator/velox#17648 applied and validates the five Scala integration tests natively. Also added ln back with an edge-case check verifying ln(0) and ln(-1) return null. Will clear UPSTREAM_VELOX_PR_ID and mark the PR ready once that Velox change lands in Gluten.

@philo-he philo-he marked this pull request as draft May 29, 2026 21:42
@github-actions github-actions Bot added the BUILD label May 30, 2026
@brijrajk brijrajk force-pushed the feat/register-sparksql-math-functions branch from 2568c79 to 831b2f5 Compare May 30, 2026 01:57
@brijrajk
Copy link
Copy Markdown
Author

Thanks for marking it draft, @philo-he. The PR is now narrowed to the test-framework fixes and Scala integration tests, with UPSTREAM_VELOX_PR_ID=17648 wired for CI validation. Will mark ready once facebookincubator/velox#17648 lands.

@philo-he
Copy link
Copy Markdown
Member

philo-he commented May 30, 2026

Thanks for marking it draft, @philo-he. The PR is now narrowed to the test-framework fixes and Scala integration tests, with UPSTREAM_VELOX_PR_ID=17648 wired for CI validation. Will mark ready once facebookincubator/velox#17648 lands.

@brijrajk, this PR totally makes sense. Generally, the community recommends marking a PR as draft if it's not ready for review and merge. Thank you so much for the update!

I just triggered the CI (manual trigger is required for a new contributor's first PR). It looks like there are some Scala code formatting issues. You can run ./dev/format-scala-code.sh locally to apply the fix.

…Velox native tests for sin, tan, tanh, radians, ln

MathFunctionsValidateSuite and ScalarFunctionsValidateSuite were left as
abstract classes after PR apache#11756 (RAS removal) deleted their only concrete
subclasses, causing all tests in those suites to silently not run. Promote
both to concrete classes and disable ANSI mode (enabled by default in
Spark 4) so checkGlutenPlan[ProjectExecTransformer] works correctly.

Add Scala integration tests for sin, tan, tanh, radians, and ln. The ln
test includes an edge-case check that ln(0) and ln(-1) return null, not
-Infinity/NaN, matching Spark's Hive-inherited semantics.

The native registrations for these five functions are provided by
facebookincubator/velox#17648 (sparksql::registerMathFunctions). Set
UPSTREAM_VELOX_PR_ID=17648 to validate the full pipeline on CI while that
Velox PR is in flight. This will be cleared before merge once the Velox
change lands in Gluten.
@brijrajk brijrajk force-pushed the feat/register-sparksql-math-functions branch from 831b2f5 to a55f75f Compare May 30, 2026 05:09
@brijrajk
Copy link
Copy Markdown
Author

brijrajk commented May 30, 2026

@brijrajk, this PR totally makes sense. Generally, the community recommends marking a PR as draft if it's not ready for review and merge. Thank you so much for the update!

I just triggered the CI (manual trigger is required for a new contributor's first PR). It looks like there are some Scala code formatting issues. You can run ./dev/format-scala-code.sh locally to apply the fix.

Thank you for the kind words @philo-he and for triggering CI! Fixed the Scala formatting by running ./dev/format-scala-code.sh and pushed.

@brijrajk brijrajk changed the title [GLUTEN-12157][VL] Register sin, tan, tanh, radians in Velox sparksql function registry [GLUTEN-12157][VL] Fix silently-skipped math/scalar test suites; add Velox native tests for sin, tan, tanh, radians, ln May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[VL] Register missing math scalar functions in Velox sparksql registry (sin, tan, tanh, radians, ln)

3 participants