Improved ARM64 support #14569

Open
@joeybenamy

We are interested in running Spark NLP healthcare models on ARM64-based Graviton instances in Amazon EMR. With the following dependencies, we can build our Docker image, start an Amazon EMR cluster, and submit the Spark job, but the job fails with runtime errors. We understand that ARM64 support is experimental, so we are using this issue to track progress toward better ARM64 support.

Relevant application versions:

  • Amazon EMR emr-7.5.0
  • Hadoop 3.4.0
  • Hive 3.1.3
  • JupyterEnterpriseGateway 2.6.0
  • Livy 0.8.0
  • Spark 3.5.2
  • TensorFlow 2.16.1

pyproject.toml:

[project]
name = "jsl"
version = "0.1.0"
readme = "README.md"
dependencies = [
    "boto3",
    "cryptography",
    "johnsnowlabs==5.5.5", # 5.5.2 release was removed/unpublished
    "spark-nlp-jsl==5.5.3",
    "spark-nlp-display",
    "setuptools>=76.1.0",
]
# Python 3.9 (set in .python-version, pyproject.toml) is the common, minimum version in both EMR on EC2 and JohnSnowLabs
# https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-app-versions-7.x.html
# https://nlp.johnsnowlabs.com/licensed/api/python/getting_started/index.html#requirements
requires-python = ">= 3.9, < 3.10"

[[tool.uv.index]]
name = "johnsnowlabs"
url = "https://pypi.johnsnowlabs.com/5.5.3-SECRET"

[dependency-groups]
dev = [
    "venv-pack>=0.2.0",
]

Logs from the failed run are attached:
stderr.txt
controller.txt
stdout.txt
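
For anyone trying to reproduce this, it may help to confirm which architecture and Python version the job actually runs on before reading the logs. The snippet below is only a minimal diagnostic sketch, not part of our job: `describe_runtime` is a hypothetical helper name, and the accepted machine strings (`aarch64`, `arm64`) are an assumption about what `platform.machine()` reports on Graviton. The Python range mirrors the `requires-python` constraint in the pyproject.toml above.

```python
import platform
import sys


def describe_runtime(machine=None, version_info=None):
    """Summarize CPU architecture and Python version (hypothetical helper).

    Both arguments are optional overrides so the check can be exercised
    without running on the target hardware.
    """
    machine = (machine or platform.machine()).lower()
    major, minor = tuple(version_info or sys.version_info)[:2]
    return {
        "machine": machine,
        # Assumption: ARM64 hosts report "aarch64" (Linux) or "arm64" (macOS).
        "is_arm64": machine in ("aarch64", "arm64"),
        "python": f"{major}.{minor}",
        # Mirrors requires-python = ">= 3.9, < 3.10" from pyproject.toml.
        "python_in_range": (3, 9) <= (major, minor) < (3, 10),
    }


if __name__ == "__main__":
    print(describe_runtime())
```

Running this on the driver (and, if possible, inside an executor task) should show whether every node is ARM64 and on Python 3.9, which narrows down whether the runtime errors come from the environment or from native libraries.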
