In the article Connect to SQL Server in Spark (PySpark), I showed how to connect to SQL Server in PySpark. Data can be loaded via JDBC, ODBC, or Python drivers. In this article, I will use the JDBC driver directly to load data from SQL Server with Scala.
Download JDBC driver
Download the Microsoft JDBC Driver for SQL Server from the following website:
Copy the driver into the folder where you are going to run the Scala scripts. For this demo, the driver path is 'sqljdbc_7.2/enu/mssql-jdbc-7.2.1.jre8.jar'.
Connect to SQL Server
Follow these steps to set up the Spark session and then read the data via JDBC.
- Add the JDBC JAR to the Java classpath
The sample code runs in Spark Shell, so the JAR file is specified in the spark-shell command. In a Scala script file, JAR files can instead be added when creating the Spark session.
spark-shell --jars F:\lib\sqljdbc_7.2\enu\mssql-jdbc-7.2.1.jre8.jar --conf spark.ui.port=11111
Note that --jars expects the path to the JAR file itself, not its containing folder. Path 'F:\lib\sqljdbc_7.2\enu\mssql-jdbc-7.2.1.jre8.jar' is where the SQL Server JDBC driver is stored on my PC.
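As an alternative to the --jars flag, a standalone Scala script can register the driver JAR when building the Spark session. This is a minimal sketch; the JAR path and app name are assumptions matching the demo layout above.

```scala
import org.apache.spark.sql.SparkSession

// Register the JDBC driver JAR via spark.jars when building the session
// (assumed path matching the demo layout; adjust for your machine).
val spark = SparkSession.builder()
  .appName("SQLServerJDBCDemo")
  .master("local[*]")
  .config("spark.jars", "F:\\lib\\sqljdbc_7.2\\enu\\mssql-jdbc-7.2.1.jre8.jar")
  .getOrCreate()
```

In Spark Shell this step is unnecessary, since the spark session is created for you before the prompt appears.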
- Set up database connection variables
val database = "test"
val table = "dbo.Employees"
val user = "zeppelin"
val password = "zeppelin"
- Load data via JDBC
val jdbcDF = (spark.read.format("jdbc")
.option("url", "jdbc:sqlserver://localhost:1433;databaseName="+database)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.load())
The above code snippet reads data from table Employees in database test on a local SQL Server instance. The authentication method is SQL Server authentication.
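When only part of a table is needed, the filtering can be pushed down to SQL Server by passing a subquery (with an alias) as the dbtable option instead of a table name. A sketch, run in the same Spark Shell session with the connection variables defined above:

```scala
// Push the WHERE clause down to SQL Server instead of loading the whole table.
// The subquery must be parenthesized and aliased ("emp" here is arbitrary).
val subquery =
  "(SELECT EmployeeID, EmployeeName, Position FROM dbo.Employees WHERE Position = 'CTO') AS emp"

val filteredDF = (spark.read.format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=" + database)
  .option("dbtable", subquery)
  .option("user", user)
  .option("password", password)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load())
```

Only the rows matching the subquery are transferred over JDBC, which can matter for large tables.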
- Show the DataFrame
scala> jdbcDF.show()
+----------+------------+--------+--------------------+
|EmployeeID|EmployeeName|Position| Salary|
+----------+------------+--------+--------------------+
| 1| Marcus| CTO|[00 C2 DC 57 1B 7...|
| 2| Christopher| CIO|[00 C2 DC 57 1B 7...|
| 3| Isabelle| CEO|[00 C2 DC 57 1B 7...|
+----------+------------+--------+--------------------+
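Writing a DataFrame back to SQL Server uses the same options on the write side. A sketch, assuming the same session and variables as above and a hypothetical target table dbo.EmployeesCopy:

```scala
// Write the DataFrame back over JDBC. mode("append") inserts rows into an
// existing table; mode("overwrite") drops and recreates it.
// dbo.EmployeesCopy is a hypothetical target table for illustration.
(jdbcDF.write.format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=" + database)
  .option("dbtable", "dbo.EmployeesCopy")
  .option("user", user)
  .option("password", password)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .mode("append")
  .save())
```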