In the article Connect to SQL Server in Spark (PySpark), I showed how to connect to SQL Server in PySpark. Data can be loaded via JDBC, ODBC, or Python drivers. In this article, I will use the JDBC driver directly to load data from SQL Server with Scala.
Download JDBC driver
Download the Microsoft JDBC Driver for SQL Server from the following website:
Copy the driver into the folder where you are going to run the Scala scripts. For this demo, the driver path is 'sqljdbc_7.2/enu/mssql-jdbc-7.2.1.jre8.jar'.
Connect to SQL Server
Follow these steps to set up the Spark session and then read the data via JDBC.
- Add the JDBC JAR to the Java classpath
The sample code runs in Spark Shell, so the JAR file is specified in the spark-shell command. In a Scala script file, JAR files can instead be added when creating the Spark session.
spark-shell --jars F:\lib\sqljdbc_7.2\enu\mssql-jdbc-7.2.1.jre8.jar --conf spark.ui.port=11111
Note that --jars expects the path to the JAR file itself, not its containing folder. Path 'F:\lib\sqljdbc_7.2\enu\mssql-jdbc-7.2.1.jre8.jar' is where the SQL Server JDBC driver is stored on my PC.
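As an alternative to the --jars flag, a standalone Scala script can register the driver JAR when building the Spark session. This is a minimal sketch; the JAR path and app name are assumptions matching the demo layout above.

```scala
import org.apache.spark.sql.SparkSession

// Register the JDBC driver JAR via spark.jars when building the session
// (assumed path matching the demo layout; adjust for your machine).
val spark = SparkSession.builder()
  .appName("SQLServerJDBCDemo")
  .master("local[*]")
  .config("spark.jars", "F:\\lib\\sqljdbc_7.2\\enu\\mssql-jdbc-7.2.1.jre8.jar")
  .getOrCreate()
```

In Spark Shell this step is unnecessary, since the spark session is created for you before the prompt appears.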
- Set up database connection variables
val database = "test"
val table = "dbo.Employees"
val user = "zeppelin"
val password = "zeppelin"
- Load data via JDBC
val jdbcDF = (spark.read.format("jdbc")
.option("url", "jdbc:sqlserver://localhost:1433;databaseName="+database)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.load())
The above code snippet reads data from table Employees in database test on a local SQL Server instance. The authentication method is SQL Server authentication.
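When only part of a table is needed, the filtering can be pushed down to SQL Server by passing a subquery (with an alias) as the dbtable option instead of a table name. A sketch, run in the same Spark Shell session with the connection variables defined above:

```scala
// Push the WHERE clause down to SQL Server instead of loading the whole table.
// The subquery must be parenthesized and aliased ("emp" here is arbitrary).
val subquery =
  "(SELECT EmployeeID, EmployeeName, Position FROM dbo.Employees WHERE Position = 'CTO') AS emp"

val filteredDF = (spark.read.format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=" + database)
  .option("dbtable", subquery)
  .option("user", user)
  .option("password", password)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load())
```

Only the rows matching the subquery are transferred over JDBC, which can matter for large tables.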
- Show the DataFrame
scala> jdbcDF.show()
+----------+------------+--------+--------------------+
|EmployeeID|EmployeeName|Position| Salary|
+----------+------------+--------+--------------------+
| 1| Marcus| CTO|[00 C2 DC 57 1B 7...|
| 2| Christopher| CIO|[00 C2 DC 57 1B 7...|
| 3| Isabelle| CEO|[00 C2 DC 57 1B 7...|
+----------+------------+--------+--------------------+
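Writing a DataFrame back to SQL Server uses the same options on the write side. A sketch, assuming the same session and variables as above and a hypothetical target table dbo.EmployeesCopy:

```scala
// Write the DataFrame back over JDBC. mode("append") inserts rows into an
// existing table; mode("overwrite") drops and recreates it.
// dbo.EmployeesCopy is a hypothetical target table for illustration.
(jdbcDF.write.format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=" + database)
  .option("dbtable", "dbo.EmployeesCopy")
  .option("user", user)
  .option("password", password)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .mode("append")
  .save())
```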