Adding Spark 3.5 with Scala 2.13 module #47492
Conversation
…to users/fabianm/spark-scala2.13
Pull request overview
This PR adds Scala 2.13 support for the Azure Cosmos DB Spark 3.5 connector by creating a new module azure-cosmos-spark_3-5_2-13. The changes include refactoring parent POMs to support parameterized Scala versions, adding CI/CD pipeline configurations for the new module, and creating documentation and configuration files for the Scala 2.13 variant.
Key Changes:
- New module azure-cosmos-spark_3-5_2-13 with Scala 2.13 support for Spark 3.5
- Parameterization of Scala version properties in parent POMs to support multiple Scala versions (see the sketch after this list)
- CI/CD pipeline updates to include the new module in build and test workflows
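As a rough illustration of that parameterization (property names and versions here are assumptions, not the exact contents of the changed POMs), the parent POM can define the Scala version as properties and let each module bind them to its Scala variant:

```xml
<!-- Hypothetical sketch of parameterized Scala version properties in a parent POM;
     the azure-cosmos-spark_3-5_2-13 module would override these with the 2.13 values. -->
<properties>
  <!-- Scala binary version used to build artifactIds of Scala dependencies -->
  <scala.binary.version>2.12</scala.binary.version>
  <!-- Full Scala library version matching the binary version -->
  <scala.version>2.12.19</scala.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <!-- Scala-built artifacts encode the binary version in the artifactId -->
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>3.5.1</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```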
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sdk/cosmos/spark.yml | Added new Databricks test configuration for Scala 2.13 variant |
| sdk/cosmos/spark.databricks.yml | Parameterized JAR_NAME to support multiple Spark/Scala versions |
| sdk/cosmos/pom.xml | Added azure-cosmos-spark_3-5_2-13 module to build |
| sdk/cosmos/ci.yml | Added new module to CI triggers, parameters, and artifacts list |
| sdk/cosmos/azure-cosmos-spark_3/pom.xml | Refactored to use parameterized Scala version properties |
| sdk/cosmos/azure-cosmos-spark_3-5_2-13/* | New module with POM, README, CONTRIBUTING, CHANGELOG, and config files |
| sdk/cosmos/azure-cosmos-spark_3-5_2-12/pom.xml | Added Scala version properties for consistency |
| sdk/cosmos/azure-cosmos-spark_3-5/pom.xml | Parameterized Scala dependencies |
| eng/versioning/version_client.txt | Added version entry for new module |
| eng/versioning/external_dependencies.txt | Added Scala 2.13 dependency entries |
| eng/.docsettings.yml | Added documentation settings for new module |
| .vscode/cspell.json | Added spelling exceptions for new module |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…to users/fabianm/spark-scala2.13
/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).
Description
This PR adds a module producing the Cosmos DB Spark connector for Spark 3.5 with Scala 2.13.
Spark 4 will only be available with Scala 2.13. Databricks has chosen to provide a Spark 3.5 runtime (16.4) with identical configuration except for the Scala binary version; it will be available in two flavors, Scala 2.12 and Scala 2.13. Scala 2.12 and 2.13 are different binary versions, and unlike Java, Scala binaries are not compatible across binary versions: a Scala 2.12 binary cannot be used with Scala 2.13, so the binaries at runtime have to match exactly.
This means that, to get support for Scala 2.13, customers have to recompile their Spark applications, and we have to provide two different Maven artifacts for Spark 3.5: one built against Scala 2.12 and one against Scala 2.13.
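For example, a consumer picks the artifact whose suffix matches the Scala binary version of their Spark 3.5 runtime (the group ID and version below are illustrative placeholders):

```xml
<!-- Spark 3.5 on a Scala 2.12 runtime -->
<dependency>
  <groupId>com.azure.cosmos.spark</groupId>
  <artifactId>azure-cosmos-spark_3-5_2-12</artifactId>
  <!-- placeholder; use the current connector release -->
  <version>x.y.z</version>
</dependency>

<!-- Spark 3.5 on a Scala 2.13 runtime (the module added by this PR) -->
<dependency>
  <groupId>com.azure.cosmos.spark</groupId>
  <artifactId>azure-cosmos-spark_3-5_2-13</artifactId>
  <version>x.y.z</version>
</dependency>
```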
This PR
Scala 2.13 compatibility changes
A few minor changes were needed to ensure that the source code shared between the Spark 3.5 / Scala 2.12 and Scala 2.13 modules compiles and works against both Scala versions.
Engineering system changes
I had to add a few more external dependencies (for the Scala 2.13 binaries), but more importantly I had to change how versioning is validated. Because the shared source code is used to produce both a Scala 2.12 and a Scala 2.13 binary, and due to the binary incompatibility each of these modules references different external artifacts, I had to use Maven build properties for the versions and artifact names of some dependencies. I am constructing these Maven properties with the correct versioning tags, but the PowerShell script used to control this has parsing logic that could not resolve the indirection. So, I made a change to skip the version checks when the version value is a Maven property.
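A hedged sketch of the kind of property indirection described above (property names and versions are assumptions; the actual POMs also attach the repo's versioning tags, which are omitted here):

```xml
<properties>
  <!-- Each Scala-specific module binds the binary suffix used to construct artifactIds -->
  <cosmos.spark.scala.binary.version>2.13</cosmos.spark.scala.binary.version>
  <!-- Dependency versions live behind properties rather than as inline literals -->
  <cosmos.spark.scala.version>2.13.14</cosmos.spark.scala.version>
  <cosmos.spark.scalatest.version>3.2.19</cosmos.spark.scalatest.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <!-- the version-verification script sees only ${...} here, not a literal version,
         which is why the check is skipped when the value is a Maven property -->
    <version>${cosmos.spark.scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.scalatest</groupId>
    <!-- artifactId is also constructed from the binary-version property -->
    <artifactId>scalatest_${cosmos.spark.scala.binary.version}</artifactId>
    <version>${cosmos.spark.scalatest.version}</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```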
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines