Description
Is there an existing issue for this?
- I have searched the existing issues and did not find a match.
Who can help?
No response
What are you working on?
Dear sir,
I adapted your example code to build a pipeline, but it fails with an error.
Current Behavior
I ran the following script in an Azure Databricks workspace and got the error below:
"""
import os

# Raw strings keep the Windows backslashes literal instead of treating them as escapes.
os.environ['JAVA_HOME'] = r'C:\Program Files\Eclipse Adoptium\jdk-11.0.27.6-hotspot'
os.environ['HADOOP_HOME'] = r'C:\hadoop'
os.environ['SPARK_LOCAL_DIRS'] = r'C:\spark_temp'

import sparknlp
from pyspark.ml import Pipeline
from sparknlp.annotator import UniversalSentenceEncoder
from sparknlp.base import DocumentAssembler

# Start a Spark session with Spark NLP on the classpath.
spark = sparknlp.start(gpu=False,
                       apple_silicon=False,
                       aarch64=False,
                       memory="16G",
                       cache_folder="",
                       log_folder="",
                       cluster_tmp_dir="",
                       params={"spark.jars.repositories": "http://s3.amazonaws.com/auxdata.johnsnowlabs.com"},
                       real_time_output=False,
                       output_level=1)
print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)

# Parentheses let the chained calls span several lines.
newstestdataset = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("test_data.csv")
)
newstestdataset.show(10)

# Turn the raw "description" column into Spark NLP documents.
document = (
    DocumentAssembler()
    .setInputCol("description")
    .setOutputCol("document")
)

# pretrained() downloads the model from S3; this is where the error is raised.
use = (
    UniversalSentenceEncoder.pretrained()
    .setInputCols(["document"])
    .setOutputCol("sentence_embeddings")
)

pipeline = Pipeline(stages=[document, use])
testdataset = pipeline.fit(newstestdataset).transform(newstestdataset)
testdataset.show()
spark.stop()
"""
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: com.amazonaws.SdkClientException: Unable to execute HTTP request: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
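The PKIX message means the driver JVM rejected the TLS certificate chain presented by the S3 endpoint that serves the pretrained models. One common cause (an assumption, not confirmed for this setup) is a proxy or firewall that re-signs HTTPS traffic with a CA the JDK does not trust. If that applies, the proxy's CA certificate could be imported into a separate truststore with `keytool -importcert`, and the driver JVM pointed at it through `spark.driver.extraJavaOptions`. Below is a minimal sketch of building the `params` dict for `sparknlp.start` that way; the helper name `truststore_params` and the truststore path are hypothetical, and the path avoids spaces so it does not need quoting inside the JVM options string.

```python
from typing import Dict, Optional


def truststore_params(truststore_path: str,
                      truststore_password: Optional[str] = None,
                      base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` with driver JVM options pointing at a custom truststore.

    `truststore_path` is assumed to be a keystore file that already contains
    the CA certificate of a TLS-intercepting proxy (hypothetical setup).
    """
    params = dict(base or {})  # copy so the caller's dict is not mutated
    opts = [f"-Djavax.net.ssl.trustStore={truststore_path}"]
    if truststore_password is not None:
        opts.append(f"-Djavax.net.ssl.trustStorePassword={truststore_password}")
    # Keep any driver JVM options that were already configured.
    existing = params.get("spark.driver.extraJavaOptions", "")
    params["spark.driver.extraJavaOptions"] = " ".join(filter(None, [existing, *opts]))
    return params


if __name__ == "__main__":
    # Hypothetical truststore path; the repository setting is the one from the script.
    print(truststore_params(
        r"C:\certs\proxy-truststore.jks",
        "changeit",
        base={"spark.jars.repositories": "http://s3.amazonaws.com/auxdata.johnsnowlabs.com"},
    ))
```

The result would be passed as `params=` to `sparknlp.start(...)`. It only takes effect when that call launches the JVM; on a cluster where the driver is already running (such as Databricks), the same option would have to go into the cluster's Spark configuration instead.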
Expected Behavior
The script is expected to run successfully.
Steps To Reproduce
Run the script shown under Current Behavior.
Spark NLP version and Apache Spark
Spark NLP version: 5.5.3
Apache Spark version: 3.2.1
Type of Spark Application
No response
Java Version
11.0.27 (2025-04-15)
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response