Run Apache Spark workloads seamlessly on Armada, a multi-cluster Kubernetes batch scheduler
armada-spark is an open-source integration designed to streamline deployment and management of Apache Spark workloads on Armada. It provides preconfigured Docker images, tooling for efficient image management, and example workflows to simplify local and production deployments.
- Java 8/11/17
- Scala 2.12/2.13
- Apache Maven 3.9.6+
- (Optional) kind for local clusters
- An accessible Armada Server and Lookout endpoint (see the Armada Operator Quickstart guide)
By default, the project targets Spark 3.5.3 and Scala 2.13.15. To change versions:
./scripts/set-version.sh <spark-version> <scala-version>
Example:
./scripts/set-version.sh 3.5.3 2.13.15
After setting your desired Spark and Scala versions, build the Armada Spark project with Maven by running the following command:
mvn clean package
Once your project is built, create the Docker image using:
./scripts/createImage.sh [-i image-name] [-m armada-master-url] [-q armada-queue] [-l armada-lookout-url]
Options:
| Flag | Description | Example |
|---|---|---|
| -i | Docker image name | spark:armada |
| -m | Armada master URL | armada://localhost:30002 |
| -q | Armada queue | default |
| -l | Armada Lookout URL | http://localhost:30000 |
| -p | Include Python | |
| -h | Display help | |
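As a minimal sketch of how these flags fit together, the snippet below assembles a createImage.sh command line and prints it instead of running it. The helper name `build_create_image_cmd` is hypothetical and not part of the project. The variable names follow scripts/config.sh, the defaults are the example values from the table, and the snippet assumes that -p is a switch without a value, as the empty Example cell suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with armada-spark): prints the
# createImage.sh command that matches the current settings without running it.
build_create_image_cmd() {
  # Fall back to the example values from the options table.
  local image="${IMAGE_NAME:-spark:armada}"
  local master="${ARMADA_MASTER:-armada://localhost:30002}"
  local queue="${ARMADA_QUEUE:-default}"
  local lookout="${ARMADA_LOOKOUT_URL:-http://localhost:30000}"

  local cmd="./scripts/createImage.sh -i $image -m $master -q $queue -l $lookout"
  # Assumption: -p takes no argument, so append it only when Python is requested.
  if [ "${INCLUDE_PYTHON:-false}" = "true" ]; then
    cmd="$cmd -p"
  fi
  printf '%s\n' "$cmd"
}

build_create_image_cmd
```

Printing the command first makes it easy to review the values before building an image with them.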
To simplify, you may store these values in scripts/config.sh:
export IMAGE_NAME="spark:armada"
export ARMADA_MASTER="armada://localhost:30002"
export ARMADA_QUEUE="default"
export ARMADA_LOOKOUT_URL="http://localhost:30000"
export INCLUDE_PYTHON=true
export USE_KIND=true
Note: For client mode, you need to set additional configuration:
export ARMADA_MASTER="local://armada://localhost:30002" # Add "local://" prefix
export SPARK_DRIVER_HOST="172.18.0.1" # Required for client mode
export SPARK_DRIVER_PORT="7078" # Required for client mode
We recommend using kind for local testing.
If you are using the Armada Operator Quickstart, it is already based on kind.
Run the following command to load the Armada Spark image into your local kind cluster:
kind load docker-image $IMAGE_NAME --name armada
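The load step fails if the target cluster does not exist, so it can help to check first. The following is a sketch only: `load_image_into_kind` is a hypothetical wrapper, not a project script, and it assumes the cluster is named `armada` as in the command above. It relies on `kind get clusters`, which prints one cluster name per line.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: load an image into a kind cluster only if that
# cluster exists. The cluster name defaults to "armada".
load_image_into_kind() {
  local image="$1"
  local cluster="${2:-armada}"

  if [ -z "$image" ]; then
    echo "usage: load_image_into_kind <image> [cluster]" >&2
    return 2
  fi
  # Match the cluster name as a whole line and as a fixed string.
  if ! kind get clusters 2>/dev/null | grep -qxF "$cluster"; then
    echo "kind cluster '$cluster' not found" >&2
    return 1
  fi
  kind load docker-image "$image" --name "$cluster"
}

# Example: load_image_into_kind "$IMAGE_NAME" armada
```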
Before submitting a pull request, please ensure that your code adheres to the project's coding standards and passes all tests.
To run the unit tests, use the following command:
mvn test
To run the E2E tests, run Armada using the Operator Quickstart guide, then execute:
scripts/test-e2e.sh
To check the code for linting issues, use the following command:
mvn spotless:check
To automatically apply linting fixes, use:
mvn spotless:apply
Make sure that the SparkPi job runs successfully on your Armada cluster before submitting a pull request.
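The lint, unit test, and E2E commands above can be chained so that the first failure stops the run. This is a minimal sketch under that assumption: `run_checks` is a hypothetical helper rather than a script shipped with the project, and it simply executes each command line it is given in order.

```shell
#!/usr/bin/env bash
# Hypothetical pre-PR gate: run each check in order, stop at the first failure.
run_checks() {
  local check
  for check in "$@"; do
    echo "==> $check"
    # Each argument is one complete command line.
    if ! sh -c "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# Example (needs Maven, and a running Armada for the E2E step):
# run_checks "mvn spotless:check" "mvn test" "scripts/test-e2e.sh"
```

Running the linter first keeps the feedback loop short, since it is usually the quickest of the three steps to fail.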
The project includes a ready-to-use Spark job to test your setup:
# Cluster mode + Dynamic allocation
./scripts/submitArmadaSpark.sh -M cluster -A dynamic 100
# Cluster mode + Static allocation
./scripts/submitArmadaSpark.sh -M cluster -A static 100
# Client mode + Dynamic allocation
./scripts/submitArmadaSpark.sh -M client -A dynamic 100
# Client mode + Static allocation
./scripts/submitArmadaSpark.sh -M client -A static 100
This job uses the same configuration parameters (ARMADA_MASTER, ARMADA_QUEUE, ARMADA_LOOKOUT_URL) as those defined in scripts/config.sh.
Use the -h option to see what other options are available.
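To exercise all four mode and allocation combinations in one pass, a loop such as the one below may be convenient. It is a sketch, not part of the project: `submit_all` and the `DRY_RUN` variable are hypothetical names, and the trailing argument (100 in the examples above) is passed through unchanged. By default the commands are only printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enumerate the four -M / -A combinations shown above.
# Commands are printed by default; set DRY_RUN=false to execute them.
submit_all() {
  local job_arg="${1:-100}"
  local mode alloc
  for mode in cluster client; do
    for alloc in dynamic static; do
      if [ "${DRY_RUN:-true}" = "true" ]; then
        echo "./scripts/submitArmadaSpark.sh -M $mode -A $alloc $job_arg"
      else
        # Stop at the first submission that fails.
        ./scripts/submitArmadaSpark.sh -M "$mode" -A "$alloc" "$job_arg" || return 1
      fi
    done
  done
}

submit_all 100
```

Client-mode runs still need the additional client-mode settings (the "local://" prefix, SPARK_DRIVER_HOST, and SPARK_DRIVER_PORT) in place before the commands are executed for real.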