feat: add support parse_url #3563

Open
rafafrdz wants to merge 5 commits into apache:main from rafafrdz:feat/add-support-parse_url

Conversation

@rafafrdz (Contributor) commented Feb 21, 2026

Summary

  • Added support for Spark parse_url by mapping ParseUrl to the native scalar function parse_url in expression serde.
  • Updated spark_expressions_support.md to mark parse_url as supported.
  • Added tests

Why
This implements one of the functions still missing from the DataFusion 50 migration tracked in issue #2443.

Part of #2443

@rafafrdz rafafrdz marked this pull request as draft February 21, 2026 10:22
@rafafrdz rafafrdz marked this pull request as ready for review February 21, 2026 11:05
@mbutrovich (Contributor)

Thanks @rafafrdz! Kicking off CI.

@andygrove (Member)

This looks good so far @rafafrdz.

CI shows all three parse_url tests failing on Spark 4.0 with "COMET: invoke is not supported". This happens because ParseUrl was changed to RuntimeReplaceable in Spark 4.0: it gets replaced by an Invoke expression during analysis, so Comet's serde never sees the ParseUrl node.

On the compatibility side, I noticed that the PR doesn't distinguish between ANSI and legacy mode for invalid URL handling. DataFusion's parse_url throws errors on malformed URLs, which matches Spark's behavior when failOnError=true (ANSI mode), but in legacy mode (failOnError=false), Spark returns null for invalid URLs instead of throwing.

@rafafrdz (Contributor, Author)

Thanks for the suggestion, @andygrove.

What I did:

  • Spark 4.0 RuntimeReplaceable / Invoke path: ParseUrl is now handled when Spark rewrites it to Invoke(ParseUrlEvaluator.evaluate, ...), so Comet's serde no longer misses it and we avoid the "invoke is not supported" fallback (please check whether this handling is correct)
  • ANSI vs legacy behavior: serde now respects failOnError:
    • failOnError = true (ANSI) -> parse_url (error on malformed URLs)
    • failOnError = false (legacy) -> try_parse_url (returns NULL on malformed URLs)

I also registered try_parse_url on the native side and added a test for invalid URL handling in legacy mode (Spark 4.0 path).
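
As a rough model of the dispatch described above (hypothetical Python; the real Comet serde is Scala and emits a protobuf scalar-function node, not a string):

```python
# Stand-in for the serde decision: pick the native function name from
# Spark's failOnError flag, per the mapping in the comment above.
def native_parse_url_func(fail_on_error: bool) -> str:
    # ANSI mode (failOnError = true) maps to parse_url, which errors on
    # malformed URLs; legacy mode maps to try_parse_url, which returns NULL.
    return "parse_url" if fail_on_error else "try_parse_url"
```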

