This repository contains the Dockerfile and associated configurations for deploying Apache Spark in standalone mode using Docker.

- Clone the repository:

  ```bash
  git clone git@github.com:kbase/cdm-spark-standalone.git
  cd cdm-spark-standalone
  ```
- Build the Docker image and start the containers:

  ```bash
  docker compose up -d --build
  ```
- Access the Spark UIs:
  - Spark Master: http://localhost:8090
  - Spark Worker 1: http://localhost:8081
  - Spark Worker 2: http://localhost:8082
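
The master UI also serves its state as JSON, which is handy for scripting a health check. A minimal sketch, assuming the standalone master's `/json/` endpoint is exposed on the same mapped port as the UI above:

```python
# Fetch cluster state from the Spark master's JSON endpoint (a sketch;
# the mapped port 8090 comes from the UI list above).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8090/json/") as resp:
    state = json.load(resp)

print("master status:", state.get("status"))
for worker in state.get("workers", []):
    print("worker:", worker.get("id"), worker.get("state"))
```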
To test that the cluster is working:
- Start a shell in the spark-user container:

  ```bash
  docker compose exec -it spark-user bash
  ```
- Submit a test job from the Spark home directory (`/opt/bitnami/spark`, the shell's working directory):

  ```bash
  bin/spark-submit --master $SPARK_MASTER_URL --deploy-mode client examples/src/main/python/pi.py 10
  ```

  The output should include a line like:

  ```
  Pi is roughly 3.138040
  ```
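
You can also run a quick check from an interactive Python session inside the same container. The sketch below is illustrative rather than part of the repository; it assumes `pyspark` is importable in the spark-user container and that `SPARK_MASTER_URL` is set in its environment, as the test job above suggests:

```python
# Minimal smoke test against the standalone master (a sketch; the app
# name and the reliance on SPARK_MASTER_URL are assumptions).
import os

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master(os.environ["SPARK_MASTER_URL"])
    .appName("cluster-smoke-test")
    .getOrCreate()
)

# Distribute a trivial computation across the workers; expect 5050.
total = spark.sparkContext.parallelize(range(1, 101)).sum()
print(f"sum(1..100) = {total}")

spark.stop()
```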
To run the Redis caching example:

- Start a shell in the spark-user container:

  ```bash
  docker compose exec -it spark-user bash
  ```
- Run the example script:

  ```bash
  spark-submit /app/redis_container_script.py
  ```
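
For orientation, here is a minimal sketch of the kind of thing a script like `/app/redis_container_script.py` might do; the actual script may differ. It assumes the `redis` Python client is installed, that the Redis service is reachable at hostname `redis` on the default port, and that rows are cached as hashes under `people:<hex id>` keys, matching the keys inspected below:

```python
# Hypothetical sketch only, not the repository's script. Writes rows of a
# small Spark DataFrame into Redis as hashes keyed "people:<hex id>".
import uuid

import redis
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redis-cache-sketch").getOrCreate()

people = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)],
    ["name", "age"],
)

# Hostname "redis" is assumed to resolve to the Redis service on the
# compose network; 6379 is the default Redis port.
client = redis.Redis(host="redis", port=6379)

for row in people.collect():
    key = f"people:{uuid.uuid4().hex}"
    client.hset(key, mapping={"name": row["name"], "age": str(row["age"])})

spark.stop()
```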
To inspect the cached data:

- Start a shell in the Redis container:

  ```bash
  docker compose exec -it redis bash
  ```
- Start the Redis CLI:

  ```bash
  redis-cli
  ```
- List all keys for your cached table:

  ```
  keys people:*
  ```
- View the contents of a specific key (replace the key with one from the previous command):

  ```
  hgetall people:d6d606a747ae40368fc7fdae784b835b
  ```
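
The same inspection can be scripted from Python instead of the Redis CLI. A small sketch, assuming the `redis` client is installed and the same hostname and key pattern as above:

```python
# Hypothetical helper mirroring the keys/hgetall steps above. SCAN is used
# instead of KEYS, which is friendlier on larger datasets.
import redis

client = redis.Redis(host="redis", port=6379, decode_responses=True)

for key in client.scan_iter(match="people:*"):
    print(key, client.hgetall(key))
```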