Spark Scala: Load Data from SQL Server

access_time 29 days ago visibility15 comment 0

In article Connect to SQL Server in Spark (PySpark),  I showed how to connect to SQL Server in PySpark. Data can be loaded via JDBC, ODBC and Python drivers. In this article, I will directly use JDBC driver to load data from SQL Server with Scala.

Download JDBC driver

Download Microsoft JDBC Driver for SQL Server from the following website:

Download JDBC Driver

Copy the driver into the folder where you are going to run the Python scripts. For this demo, the driver path is ‘sqljdbc_7.2/enu/mssql-jdbc-7.2.1.jre8.jar’.

Connect to SQL Server

Follow these steps to setup Spark session and then read the data via JDBC.

1) Added JDBC jars into Java classpath

The sample code runs in Spark Shell. JAR file is specified in spark-shell command. In a Scala script file, JAR files can be added directly when creating Spark session.

spark-shell --jars F:\lib\sqljdbc_7.2\enu -c spark.ui.port=11111

Location 'F:\lib\sqljdbc_7.2\enu' is where SQL Server JDBC driver is stored in my PC.

2) Setup database connection variables

val database = "test"
val table = "dbo.Employees"
val user = "zeppelin"
val password  = "zeppelin"

3) Load data via JDBC

val jdbcDF = (spark.read.format("jdbc")
.option("url",  "jdbc:sqlserver://localhost:1433;databaseName="+database)
.option("dbtable", "Employees")
.option("user", user)
.option("password", password) 
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") 
.load())

The above code snippet reads data from table Employees in database test in a local SQL Server instance. The authentication method is SQL Server authentication.

4) Show the DataFrame

scala> jdbcDF.show()
+----------+------------+--------+--------------------+
|EmployeeID|EmployeeName|Position|              Salary|
+----------+------------+--------+--------------------+
|         1|      Marcus|     CTO|[00 C2 DC 57 1B 7...|
|         2| Christopher|     CIO|[00 C2 DC 57 1B 7...|
|         3|    Isabelle|     CEO|[00 C2 DC 57 1B 7...|
+----------+------------+--------+--------------------+
info Last modified by Raymond 29 days ago copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

Follow Kontext

Get our latest updates on LinkedIn or Twitter.

Want to publish your article on Kontext?

Learn more

More from Kontext

Fix - ERROR SparkUI: Failed to bind SparkUI
visibility 40
thumb_up 0
access_time 2 months ago

When starting Spark shell in Windows 10 machine, I encountered an error - ERROR SparkUI: Failed to bind SparkUI. The detailed error message looks like the following: 20/12/13 20:47:34 ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Failed to bind to /0.0.0.0:4056: Service 'SparkUI' ...

visibility 8651
thumb_up 0
access_time 2 years ago

This pages summarizes the steps to install the latest version 2.4.3 of Apache Spark on Windows 10 via Windows Subsystem for Linux (WSL). Follow either of the following pages to install WSL in a system or non-system drive on your Windows 10. Install Windows Subsystem for Linux on a Non-System ...

.NET for Apache Spark v1.0.0 Released
visibility 26
thumb_up 0
access_time 4 months ago

.NET for Apache Spark v1.0.0 was released officially on 2020-10-14. This page summarizes some important resources for you to get started on .NET for Spark. *Image credit: https://github.com/dotnet/spark/raw/master/docs/img/dotnetsparklogo-6.png Release Notes on GitHub ...