Description
We are interested in running Spark NLP healthcare models on ARM64-based Graviton instances in Amazon EMR. With the dependencies below, we can build our Docker image, start an Amazon EMR cluster, and submit the Spark job, but the job fails with runtime errors. We understand that ARM64 support is experimental, so we are using this ticket to track progress towards better ARM64 support.
Relevant application versions:
- Amazon EMR emr-7.5.0
- Hadoop 3.4.0
- Hive 3.1.3
- JupyterEnterpriseGateway 2.6.0
- Livy 0.8.0
- Spark 3.5.2
- TensorFlow 2.16.1
pyproject.toml:
[project]
name = "jsl"
version = "0.1.0"
readme = "README.md"
dependencies = [
    "boto3",
    "cryptography",
    "johnsnowlabs==5.5.5",  # 5.5.2 release was removed/unpublished
    "spark-nlp-jsl==5.5.3",
    "spark-nlp-display",
    "setuptools>=76.1.0",
]
# Python 3.9 (set in .python-version, pyproject.toml) is the common, minimum version in both EMR on EC2 and JohnSnowLabs
# https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-app-versions-7.x.html
# https://nlp.johnsnowlabs.com/licensed/api/python/getting_started/index.html#requirements
requires-python = ">= 3.9, < 3.10"

[[tool.uv.index]]
name = "johnsnowlabs"
url = "https://pypi.johnsnowlabs.com/5.5.3-SECRET"

[dependency-groups]
dev = [
    "venv-pack>=0.2.0",
]
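For anyone reproducing this, it may help to confirm that the Python process on the cluster really runs on ARM64 and on the interpreter version pinned in the pyproject.toml above. The following is only a minimal sketch using the standard library; the function names `describe_runtime` and `satisfies_requires_python` are hypothetical helpers and not part of Spark NLP or EMR.

```python
import platform
import sys


def describe_runtime():
    """Return the CPU architecture and Python version of the current process."""
    return {
        # Typically "aarch64" on Linux ARM64 (Graviton) and "x86_64" on Intel/AMD.
        "machine": platform.machine(),
        "python": ".".join(str(part) for part in sys.version_info[:3]),
    }


def satisfies_requires_python(version_info=sys.version_info):
    """Check the constraint ">= 3.9, < 3.10" from the pyproject.toml above."""
    return (3, 9) <= tuple(version_info[:2]) < (3, 10)


if __name__ == "__main__":
    info = describe_runtime()
    print(f"machine={info['machine']} python={info['python']}")
    print(f"matches requires-python: {satisfies_requires_python()}")
```

Running this on the driver and inside an executor task would show whether the packed virtual environment and the architecture match expectations before looking at the runtime errors themselves.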
Some logs below:
stderr.txt
controller.txt
stdout.txt