Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
When installing a vanilla Spark distribution on Windows or Linux, you may encounter the following error when invoking the spark-sql command:
Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
This error usually occurs when you install a Spark build without built-in Hadoop libraries (the headless, or "Hadoop free", build), because that build does not include the spark-hive and spark-hive-thriftserver packages.
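To confirm the cause, you can check whether these two JARs are present in $SPARK_HOME/jars. The following is a minimal POSIX shell sketch; missing_hive_jars is a hypothetical helper name, and it assumes the JARs follow the <artifact>_<scala>-<spark>.jar naming used by the files in this article.

```shell
#!/bin/sh
# missing_hive_jars DIR (hypothetical helper): print the name of each
# Hive-related Spark module that has no JAR in DIR, one per line.
# Assumes JAR names of the form <artifact>_<scala>-<spark>.jar.
missing_hive_jars() {
  jars_dir="$1"
  for artifact in spark-hive spark-hive-thriftserver; do
    found=no
    # If nothing matches, the glob stays literal and the -e test fails.
    # Note: "spark-hive_*" does not match "spark-hive-thriftserver_*".
    for f in "$jars_dir/${artifact}_"*.jar; do
      if [ -e "$f" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      echo "$artifact"
    fi
  done
}

# Example (not run automatically):
#   missing_hive_jars "$SPARK_HOME/jars"
```

Empty output means both JARs are present; each name printed is a package that is still missing.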
There are two ways to fix this issue.
Solution 1 - Install Spark with Hadoop built-in
When downloading Spark, choose a package type that is pre-built for Apache Hadoop rather than the one pre-built with user-provided Apache Hadoop.
In the jars folder ($SPARK_HOME/jars), you should then find a JAR file named spark-hive-thriftserver_2.12-3.0.1.jar (for Spark 3.0.1).
The spark-sql command will now run without errors.
Solution 2 - Download the missing JAR file manually
Another approach is to download the missing packages manually. For Spark 3.0.1, the missing JAR file is available on Maven Repository: org.apache.spark » spark-hive-thriftserver_2.12 » 3.0.1. If you are installing a different version of Spark, download the package that matches your Spark and Scala versions.
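To pick the right package for another Spark version, you can read the Scala and Spark versions from the spark-core JAR in the jars folder and assemble the Maven Central URL from them, following the same pattern as the download links in this article. This is a minimal POSIX shell sketch; detect_spark_versions and spark_jar_url are hypothetical helper names, and it assumes your distribution ships a file named spark-core_<scala>-<spark>.jar in $SPARK_HOME/jars.

```shell
#!/bin/sh
# Hypothetical helpers; the URL pattern mirrors the Maven Central links in this article.

# detect_spark_versions DIR: print "<scala> <spark>" parsed from the
# spark-core_<scala>-<spark>.jar file in DIR; return 1 if there is none.
detect_spark_versions() {
  for f in "$1"/spark-core_*.jar; do
    [ -e "$f" ] || return 1        # glob did not match anything
    name=$(basename "$f" .jar)     # e.g. spark-core_2.12-3.0.1
    rest=${name#spark-core_}       # e.g. 2.12-3.0.1
    echo "${rest%%-*} ${rest#*-}"  # e.g. 2.12 3.0.1
    return 0
  done
}

# spark_jar_url ARTIFACT SCALA SPARK: print the Maven Central URL of a Spark module JAR.
spark_jar_url() {
  echo "https://repo1.maven.org/maven2/org/apache/spark/${1}_${2}/${3}/${1}_${2}-${3}.jar"
}

# Example (not run automatically):
#   versions=$(detect_spark_versions "$SPARK_HOME/jars")
#   wget "$(spark_jar_url spark-hive-thriftserver "${versions% *}" "${versions#* }")"
```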
The following steps are for Spark 3.0.1.
- Download the package:
wget https://repo1.maven.org/maven2/org/apache/spark/spark-hive-thriftserver_2.12/3.0.1/spark-hive-thriftserver_2.12-3.0.1.jar
- Move the package to the $SPARK_HOME/jars folder:
mv spark-hive-thriftserver_2.12-3.0.1.jar $SPARK_HOME/jars/
- Download the spark-hive package (spark-hive_2.12-3.0.1.jar) as well; it is also required but missing from the headless build:
wget https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.12/3.0.1/spark-hive_2.12-3.0.1.jar
- Move the package to the $SPARK_HOME/jars folder:
mv spark-hive_2.12-3.0.1.jar $SPARK_HOME/jars/
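The manual steps above can also be scripted. The sketch below is an illustration, not part of the original instructions: it assumes curl is installed (instead of wget), downloads straight into the jars folder instead of downloading and then moving, needs write access to that folder, and defaults to Scala 2.12 and Spark 3.0.1. The names install_hive_jars and fetch are hypothetical.

```shell
#!/bin/sh
# fetch URL DEST (hypothetical): download one file with curl.
# -f makes curl fail on HTTP errors, -L follows redirects.
fetch() {
  curl -fL -o "$2" "$1"
}

# install_hive_jars DIR [SCALA] [SPARK] (hypothetical): download the
# spark-hive and spark-hive-thriftserver JARs into DIR unless already present.
install_hive_jars() {
  jars_dir="$1"
  scala_version="${2:-2.12}"
  spark_version="${3:-3.0.1}"
  base="https://repo1.maven.org/maven2/org/apache/spark"
  for artifact in spark-hive spark-hive-thriftserver; do
    jar="${artifact}_${scala_version}-${spark_version}.jar"
    if [ -e "$jars_dir/$jar" ]; then
      echo "already present: $jar"
      continue
    fi
    # Remove any partial file if the download fails.
    fetch "$base/${artifact}_${scala_version}/${spark_version}/$jar" "$jars_dir/$jar" \
      || { rm -f "$jars_dir/$jar"; return 1; }
    echo "installed: $jar"
  done
}

# Example (not run automatically):
#   install_hive_jars "$SPARK_HOME/jars" 2.12 3.0.1
```

Running it a second time is harmless: JARs that already exist are skipped.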