
Connect to MySQL in Spark (PySpark)

Spark is an analytics engine for big data processing. There are various ways to connect to a MySQL database in Spark. This page summarizes some common approaches to connecting to MySQL using Python as the programming language. Similar to Connect to SQL Server in Spark (PySpark), there ...
Last modified by Administrator 3 years ago

Comments
#1660 Re: Connect to MySQL in Spark (PySpark) (2 years ago)

You need to run some tests to find out.

One approach uses JDBC and the other uses a native Python driver. I don't think there will be a major performance difference, because a Spark JDBC read like the one in your question, with no partitioning options, will not run in parallel anyway. So the main performance difference will come down to the difference between the two drivers.
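To run such a test, a small timing helper is enough. The following is only a minimal sketch: `time_call`, `read_via_jdbc`, and `read_via_native_driver` are hypothetical names that are not part of the original article. Keep in mind that Spark reads are lazy, so each timed function should end with an action such as `.count()` to force the data to be fetched.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage, assuming you have wrapped each approach in a function
# that takes the SparkSession and returns a DataFrame:
#
#   jdbc_seconds = time_call(lambda: read_via_jdbc(spark).count())
#   native_seconds = time_call(lambda: read_via_native_driver(spark).count())
#   print(jdbc_seconds, native_seconds)
```

Taking the best of several runs reduces noise from connection setup and caching, but results will still depend on the table size, the network, and the cluster.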

If you do want to extract data using JDBC in parallel by utilizing partition columns, you can consider Sqoop.
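As an alternative to Sqoop, Spark's own JDBC data source accepts partitioning options (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`); when these are set, the read is split into several range queries. The sketch below is not from the original article. The helper names and the example values are assumptions, and the partition column is assumed to be a numeric, date, or timestamp column of the table.

```python
def partitioned_jdbc_options(url, table, user, password,
                             partition_column, lower_bound, upper_bound,
                             num_partitions):
    """Build the option dict for a partitioned Spark JDBC read (hypothetical helper)."""
    return {
        "url": url,
        "driver": "com.mysql.cj.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
        # The four options below must be specified together.
        "partitionColumn": partition_column,
        # The bounds only decide how the range is split into strides;
        # they do not filter rows out of the result.
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def read_partitioned(spark, options):
    """Run the JDBC read with the given options on an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical usage; the `id` column and the bounds are assumed values:
#
#   opts = partitioned_jdbc_options(
#       "jdbc:mysql://<host_string>/<database>", "some_table",
#       "theusername", "thepassword", "id", 1, 1000000, 8)
#   spark_df = read_partitioned(spark, opts)
```

Each partition opens its own connection to MySQL, so `numPartitions` should be chosen with the database's connection limit in mind.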

Quoted comment from Samuel, 2 years ago
Re: Connect to MySQL in Spark (PySpark)

Question for you: is it more or less performant to use the Spark-only technique of:

spark_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://<host_string>/<database>")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "(SELECT * FROM some_table) temp_spark_table")
    .option("user", "theusername")
    .option("password", "thepassword")
    .load()
)


#1659 Re: Connect to MySQL in Spark (PySpark) (2 years ago)

Question for you: is it more or less performant to use the Spark-only technique of:

spark_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://<host_string>/<database>")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "(SELECT * FROM some_table) temp_spark_table")
    .option("user", "theusername")
    .option("password", "thepassword")
    .load()
)

