Spark Scala: Load Data from SQL Server
In the article Connect to SQL Server in Spark (PySpark), I showed how to connect to SQL Server from PySpark, where data can be loaded via JDBC, ODBC, or Python drivers. In this article, I will use the JDBC driver directly to load data from SQL Server with Scala.
Download JDBC driver
Download the Microsoft JDBC Driver for SQL Server from Microsoft's download page.
Copy the driver into the folder where you are going to run the Scala scripts. For this demo, the driver path is ‘sqljdbc_7.2/enu/mssql-jdbc-7.2.1.jre8.jar’.
Connect to SQL Server
Follow these steps to set up a Spark session and then read the data via JDBC.
1) Add the JDBC jar to the Java classpath
The sample code runs in Spark Shell, where the JAR file is specified on the spark-shell command line. In a standalone Scala application, the JAR can instead be added when creating the Spark session.
spark-shell --jars F:\lib\sqljdbc_7.2\enu\mssql-jdbc-7.2.1.jre8.jar -c spark.ui.port=11111
Folder 'F:\lib\sqljdbc_7.2\enu' is where the SQL Server JDBC driver is stored on my PC; note that --jars must point to the JAR file itself, not to the folder.
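If you run the code as a standalone Scala application rather than in Spark Shell, the driver JAR can be attached while building the session instead. A minimal sketch, assuming the same local driver path as above (the app name and master are illustrative choices, not from the original):

```scala
import org.apache.spark.sql.SparkSession

// The path below is illustrative; point spark.jars at your copy of the driver JAR.
val spark = SparkSession.builder()
  .appName("mssql-jdbc-demo")
  .master("local[*]")
  .config("spark.jars", "F:\\lib\\sqljdbc_7.2\\enu\\mssql-jdbc-7.2.1.jre8.jar")
  .getOrCreate()
```

This only applies when the application itself creates the session; in Spark Shell the session already exists, so the --jars flag is the right place for the driver.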
2) Setup database connection variables
val database = "test"
val table = "dbo.Employees"
val user = "zeppelin"
val password = "zeppelin"
3) Load data via JDBC
val jdbcDF = (spark.read.format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=" + database)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load())
The code snippet above reads data from table dbo.Employees in database test on a local SQL Server instance, using SQL Server authentication.
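The dbtable option can also take a parenthesized subquery, which lets SQL Server do the filtering before any rows reach Spark. A hedged sketch reusing the connection variables above (the column names come from the sample table; the alias emp is required by the JDBC source and is my own choice):

```scala
// Push a query down to SQL Server instead of pulling the whole table.
val ctoDF = spark.read.format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=" + database)
  .option("dbtable",
    "(SELECT EmployeeID, EmployeeName, Position FROM dbo.Employees WHERE Position = 'CTO') AS emp")
  .option("user", user)
  .option("password", password)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load()
```

This can cut transfer time substantially when you only need a slice of a large table.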
4) Show the DataFrame
scala> jdbcDF.show()
+----------+------------+--------+--------------------+
|EmployeeID|EmployeeName|Position|              Salary|
+----------+------------+--------+--------------------+
|         1|      Marcus|     CTO|[00 C2 DC 57 1B 7...|
|         2| Christopher|     CIO|[00 C2 DC 57 1B 7...|
|         3|    Isabelle|     CEO|[00 C2 DC 57 1B 7...|
+----------+------------+--------+--------------------+
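The same JDBC options work in the other direction: a DataFrame can be written back to SQL Server with the write API. A minimal sketch, assuming a hypothetical target table dbo.EmployeesCopy that the zeppelin login is allowed to create:

```scala
// Write the DataFrame back to SQL Server; "overwrite" drops and recreates the table.
jdbcDF.write.format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=" + database)
  .option("dbtable", "dbo.EmployeesCopy")  // hypothetical target table
  .option("user", user)
  .option("password", password)
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .mode("overwrite")
  .save()
```

Use mode("append") instead if the table already exists and you only want to add rows.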