Error when connecting to Oracle database in PySpark
This is the code I run in a PySpark environment (Spark 3.1.2):
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:@10.0.1.1:1521/sbank") \
    .option("dbtable", "sa.a") \
    .option("user", "g") \
    .option("password", "zxc") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .load()
But it fails with the following error:
Py4JJavaError Traceback (most recent call last)
/tmp/ipykernel_29/4076487584.py in <module>
----> 1 jdbcDF = spark.read \
2 .format("jdbc") \
3 .option("url", "jdbc:oracle:thin:@10.0.1.1:1521/sbank") \
4 .option("dbtable", "sa.a") \
5 .option("user", "g") \
/usr/local/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
208 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
209 else:
--> 210 return self._df(self._jreader.load())
211
212 def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
/usr/local/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o137.load
Can anyone help me solve this? Thank you in advance.
I added ojdbc11.jar to the jars folder of Spark.
Comments

Kontext (3 years ago)
Hi Nguyen,
Welcome to Kontext!
For questions like this, you can post in our Forums in the future.
Have you followed this article? PySpark - Read Data from Oracle Database.
Can you please try ojdbc8 instead of ojdbc11? ojdbc11 requires JDK 11. Technically, Spark 3.1.2 can run on JDK 11. What is your JDK version?
The error message is not detailed. Can you paste the full error logs?

Nguyen luffy (3 years ago)
Thanks Kontext. I followed the article at https://kontext.tech/article/1060/pyspark-read-data-from-oracle-database and it worked.
My JDK version is 1.8.0_352 (OpenJDK 64-Bit Server VM).
The logs shown above are everything I got when I ran that code.

Kontext (3 years ago)
I'm glad it works for you.
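
For anyone hitting the same error, the two checks suggested in this thread (matching the ojdbc jar to the JDK version, and capturing the full Java-side error) can be sketched roughly as below. This is only an illustration: the helper names `java_major_version`, `pick_ojdbc_jar` and `load_or_report` are made up for this example and are not part of Spark or the Oracle driver, and the rule "JDK 11 or newer gets ojdbc11, otherwise ojdbc8" is a simplifying assumption (ojdbc8 also runs on newer JDKs).

```python
import re


def java_major_version(version_string):
    """Return the Java major version, e.g. '1.8.0_352' gives 8, '11.0.17' gives 11."""
    parts = re.split(r"[._+-]", version_string.strip())
    major = int(parts[0])
    # Releases before Java 9 report themselves as 1.x (1.8 means Java 8).
    if major == 1 and len(parts) > 1:
        return int(parts[1])
    return major


def pick_ojdbc_jar(version_string):
    """Hypothetical helper: pick one of the two driver jars discussed in the thread."""
    # Assumption: ojdbc11 only when the JDK is 11 or newer, ojdbc8 otherwise.
    if java_major_version(version_string) >= 11:
        return "ojdbc11.jar"
    return "ojdbc8.jar"


def load_or_report(reader):
    """Call reader.load() and print the Java-side error text if the call fails."""
    # Imported lazily so the helpers above can be used without PySpark installed.
    from py4j.protocol import Py4JJavaError

    try:
        return reader.load()
    except Py4JJavaError as err:
        # The string form of the error carries the Java exception and its stack
        # trace, which is the detail missing from the Python traceback.
        print(str(err))
        raise
```

With the JDK reported above, `pick_ojdbc_jar("1.8.0_352")` returns `"ojdbc8.jar"`. `load_or_report` expects the configured reader, that is, the `spark.read.format("jdbc").option(...)` chain from the question without the final `.load()`.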