Connect to MySQL in Spark (PySpark)
Last modified by Administrator 3 years ago
Comments
Samuel
#1659 Re: Connect to MySQL in Spark (PySpark)
Question for you: is it more or less performant to use the Spark-only technique of:
spark_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://<host_string>/<database>")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "(SELECT * FROM some_table) temp_spark_table")
    .option("user", "theusername")
    .option("password", "thepassword")
    .load()
)
You need to run some tests to find out.
One approach uses JDBC and the other uses a Python native driver. I don't think there will be a major performance difference, because a Spark JDBC read configured like this, without any partitioning options, goes through a single connection and does not run in parallel anyway. So the main performance difference will come down to the difference between the two drivers.
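For running such a test, a minimal timing sketch could look like the following. The helper name time_read is hypothetical and not part of Spark or any MySQL driver; it only measures wall-clock time for whatever callable it is given.

```python
import time


def time_read(load_fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of load_fn.

    load_fn is any zero-argument callable that performs the full read.
    (time_read is a hypothetical helper, not a Spark or driver API.)
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load_fn()  # run the read being measured
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to one-off noise than the mean.
    return min(timings)
```

Note that load() on its own is lazy in Spark, so the callable passed in for the JDBC path should end in an action that actually pulls the rows (for example toPandas()), otherwise the comparison with the native driver will not be like for like.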
If you do want to extract data over JDBC in parallel by using partition columns, you can consider Sqoop.
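As an alternative to Sqoop, Spark's own JDBC source also accepts partitioning options (partitionColumn, lowerBound, upperBound, numPartitions). The sketch below is only an illustration under assumptions: the helper names, the numeric id column, and the bounds are hypothetical and would have to match the real table.

```python
def partitioned_jdbc_options(url, table, user, password,
                             column, lower, upper, partitions):
    """Build the option map for a partitioned Spark JDBC read.

    Hypothetical helper: `column` is assumed to be a numeric, date, or
    timestamp column of `table`.
    """
    return {
        "url": url,
        "driver": "com.mysql.cj.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
        # These four options must be supplied together.
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(partitions),
    }


def read_partitioned(spark, options):
    """Load a DataFrame from an existing SparkSession using the option map."""
    return spark.read.format("jdbc").options(**options).load()
```

Usage would be along the lines of read_partitioned(spark, partitioned_jdbc_options("jdbc:mysql://<host_string>/<database>", "some_table", "theusername", "thepassword", "id", 1, 1000000, 8)). The bounds only decide how the column range is split across partitions; they do not filter rows, so values outside the range still end up in the first or last partition.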
Samuel, 2 years ago