How to install to EMR from maven repository to /usr/lib/spark/jars #2355

hueiyuan · 2025-03-19T07:48:28Z

SynapseML version

1.0.10

System information

Language version (e.g. python 3.8, scala 2.12): python 3.9
Spark Version (e.g. 3.2.3): 3.5.1
Spark Platform (e.g. Synapse, Databricks): AWS EMR Release 7.3.1

Describe the problem

I would like to install SynapseML on EMR for use with PySpark. Running the following configuration in a Jupyter notebook works:

%%configure -f
{
  "name": "synapseml",
  "conf": {
      "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.9-spark3.5",
      "spark.jars.repositories": "https://mmlspark.azureedge.net/maven"
  }
}
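For jobs submitted without a notebook, the same two settings correspond to the standard spark-submit flags --packages and --repositories. Below is a minimal sketch that assembles such a command. It assumes the cluster can reach the repository at submit time; the helper build_spark_submit_args and the script name job.py are hypothetical, and this has not been verified on EMR.

```python
import shlex


def build_spark_submit_args(packages, repositories, script):
    """Return spark-submit arguments mirroring the notebook %%configure block."""
    return [
        "spark-submit",
        # Maven coordinates, resolved together with their transitive dependencies
        "--packages", ",".join(packages),
        # extra repositories to search for those coordinates
        "--repositories", ",".join(repositories),
        script,
    ]


args = build_spark_submit_args(
    packages=["com.microsoft.azure:synapseml_2.12:1.0.9-spark3.5"],
    repositories=["https://mmlspark.azureedge.net/maven"],
    script="job.py",  # hypothetical entry-point script
)

# Print a shell-ready command line for copy and paste.
print(shlex.join(args))
```

Unlike copying individual jars by hand, --packages lets Spark resolve the full dependency tree at launch.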

In production, however, we don't use Jupyter notebooks. Instead, we downloaded the corresponding jars from the Maven repository and copied them to /usr/lib/spark/jars on EMR. This does not work and fails with: com.microsoft.azure.synapse.ml.isolationforest.IsolationForest does not exist in the JVM

Does anyone know the root cause of this? Thank you.
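One thing that may be worth ruling out is whether any of the copied jars actually contains the missing class. The traceback in this report mentions a synapseml-core jar, so it is possible (an assumption, not confirmed here) that the class lives in a dependency jar rather than in the top-level synapseml_2.12 artifact. A minimal diagnostic sketch using only the Python standard library; the helper name jars_containing_class is hypothetical:

```python
import zipfile
from pathlib import Path


def jars_containing_class(jar_dir, class_name):
    """Return the names of jars in jar_dir that contain the given class."""
    directory = Path(jar_dir)
    if not directory.is_dir():
        return []
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(directory.glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip truncated or corrupt downloads
            continue
    return matches


if __name__ == "__main__":
    found = jars_containing_class(
        "/usr/lib/spark/jars",
        "com.microsoft.azure.synapse.ml.isolationforest.IsolationForest",
    )
    print(found if found else "class not found in any jar in this directory")
```

An empty result would suggest that the jar holding the class was never copied. A non-empty result would point instead toward a classpath or version problem.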

Code to reproduce issue

from synapse.ml.isolationforest import IsolationForest

# print(type(IsolationForest))
hyper_params = {
    'n_estimators': 100,
    'max_samples': 32,
    'max_features': 1,
    'bootstrap': False,
    'contamination': 0.1,    
}

isolation_forest_model = (
    IsolationForest()
    .setNumEstimators(hyper_params["n_estimators"])
    .setBootstrap(hyper_params["bootstrap"])
    .setMaxSamples(hyper_params["max_samples"])
    .setMaxFeatures(hyper_params["max_features"])
    .setFeaturesCol("features")
    .setPredictionCol("predictedLabel")
    .setScoreCol("outlierScore")
    .setContamination(hyper_params["contamination"])
    .setContaminationError(0.01 * hyper_params["contamination"])
)

Other info / logs

An error was encountered:
com.microsoft.azure.synapse.ml.isolationforest.IsolationForest does not exist in the JVM
Traceback (most recent call last):
  File "/mnt1/yarn/usercache/livy/appcache/application_1742368398137_0002/container_1742368398137_0002_01_000001/pyspark.zip/pyspark/__init__.py", line 139, in wrapper
    return func(self, **kwargs)
  File "/mnt1/yarn/usercache/livy/appcache/application_1742368398137_0002/container_1742368398137_0002_01_000001/com.microsoft.azure_synapseml-core_2.12-1.0.9-spark3.5.jar/synapse/ml/isolationforest/IsolationForest.py", line 78, in __init__
    self._java_obj = self._new_java_obj("com.microsoft.azure.synapse.ml.isolationforest.IsolationForest", self.uid)
  File "/mnt1/yarn/usercache/livy/appcache/application_1742368398137_0002/container_1742368398137_0002_01_000001/pyspark.zip/pyspark/ml/wrapper.py", line 84, in _new_java_obj
    java_obj = getattr(java_obj, name)
  File "/mnt1/yarn/usercache/livy/appcache/application_1742368398137_0002/container_1742368398137_0002_01_000001/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1664, in __getattr__
    raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
py4j.protocol.Py4JError: com.microsoft.azure.synapse.ml.isolationforest.IsolationForest does not exist in the JVM

What component(s) does this bug affect?

What language(s) does this bug affect?

language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations

hueiyuan added the bug label Mar 19, 2025

github-actions bot added the triage label Mar 19, 2025
