Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

Raymond, 2020-12-27 (Spark & PySpark)

When installing a vanilla Spark distribution on Windows or Linux, you may encounter the following error when invoking the spark-sql command:

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

This error usually occurs when you install a Spark package built without Hadoop libraries (the "Pre-built with user-provided Apache Hadoop" package, referred to here as the headless version), because that package does not include the Spark Hive and Hive Thrift Server modules.
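
To confirm that this is the cause, you can check whether the two Spark Hive modules are present in your installation. The Bash function below is a minimal sketch: the name check_hive_jars is hypothetical, and it assumes the standard binary layout in which all JARs live directly in $SPARK_HOME/jars.

```shell
#!/usr/bin/env bash
# Sketch: check whether a Spark jars directory contains the Spark Hive modules.
# Assumption: standard binary layout, i.e. all JARs live directly in $SPARK_HOME/jars.
check_hive_jars() {
  local jars_dir="$1" status=0 module jar found
  # The trailing underscore keeps "spark-hive_" from matching "spark-hive-thriftserver_"
  for module in spark-hive_ spark-hive-thriftserver_; do
    found=no
    for jar in "${jars_dir}/${module}"*.jar; do
      # An unmatched glob is left as a literal string, so test that the file exists
      if [ -e "$jar" ]; then found=yes; fi
    done
    if [ "$found" = yes ]; then
      echo "found:   ${module}*.jar"
    else
      echo "missing: ${module}*.jar"
      status=1
    fi
  done
  return "$status"
}
```

Calling check_hive_jars "$SPARK_HOME/jars" prints one line per module and returns a non-zero status if either JAR is missing.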

There are two ways to fix this issue.

Solution 1 - Install Spark with Hadoop built-in

When downloading Spark, choose a package type that is pre-built for Apache Hadoop rather than the one with user-provided Hadoop.

In the jars folder ($SPARK_HOME/jars), you should find a JAR file named spark-hive-thriftserver_2.12-3.0.1.jar.

The spark-sql command should then run without this error.

Warning: Make sure the Hadoop version in the Spark binary package is consistent with your Hadoop installation; otherwise you may run into JAR version conflicts. The dependent JAR versions also need to be consistent with your Hive installation. When Spark is compiled with Hive enabled, there are three compile dependencies from Hive: hive-cli, hive-jdbc and hive-beeline. Spark 3.0.1 is built against Hive 2.3.7 by default and can connect to Hive metastores up to version 3.1.2.

Solution 2 - Download the missing JAR file manually

Warning: This approach has not been fully verified yet.

Another approach is to download the missing packages manually. For Spark 3.0.1, for example, the missing JAR file is available on Maven Repository: org.apache.spark » spark-hive-thriftserver_2.12 » 3.0.1. If you are installing another version of Spark, download the matching package instead.

The following steps are for Spark 3.0.1.

  1. Download the package:
    wget https://repo1.maven.org/maven2/org/apache/spark/spark-hive-thriftserver_2.12/3.0.1/spark-hive-thriftserver_2.12-3.0.1.jar
  2. Move the package to the $SPARK_HOME/jars folder:
    mv spark-hive-thriftserver_2.12-3.0.1.jar "$SPARK_HOME/jars/"
  3. Download spark-hive_2.12-3.0.1.jar, which is also required but missing from the headless version:
    wget https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.12/3.0.1/spark-hive_2.12-3.0.1.jar
  4. Move the package to the $SPARK_HOME/jars folder:
    mv spark-hive_2.12-3.0.1.jar "$SPARK_HOME/jars/"
Now your spark-sql command should work properly.
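
If you need to repeat these steps for other versions, they can be wrapped in a small script. This is a sketch rather than part of the original procedure, and it carries the same "not fully verified" caveat: the function names spark_jar_url and install_hive_jars are hypothetical, it assumes other versions follow the same Maven Central URL pattern as the 3.0.1 URLs above, and the Scala version you pass must match your Spark build (2.12 for Spark 3.0.1).

```shell
#!/usr/bin/env bash
# Sketch: download the two Spark Hive JARs into a jars directory for a chosen version.
# Assumption: artifacts follow the Maven Central layout used in the steps above.

# Build the Maven Central URL for org.apache.spark:<artifact>_<scala>:<version>.
spark_jar_url() {
  local artifact="$1" scala="$2" version="$3"
  local name="${artifact}_${scala}"
  echo "https://repo1.maven.org/maven2/org/apache/spark/${name}/${version}/${name}-${version}.jar"
}

# Download spark-hive and spark-hive-thriftserver into the given jars directory.
install_hive_jars() {
  local jars_dir="$1" scala="$2" version="$3" artifact
  for artifact in spark-hive spark-hive-thriftserver; do
    # -P sets the target directory; -nc skips a file that already exists there
    wget -nc -P "$jars_dir" "$(spark_jar_url "$artifact" "$scala" "$version")" || return 1
  done
}
```

For Spark 3.0.1 the call would be install_hive_jars "$SPARK_HOME/jars" 2.12 3.0.1. Downloading with -P writes straight into the jars folder, which replaces the separate mv steps. Like the manual steps, it only fetches these two JARs.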

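Warning: spark-sql runs queries against the Hive metastore configured in hive-site.xml (or a local embedded Derby metastore if none is configured), so make sure that metastore is reachable before starting spark-sql. The Spark SQL CLI does not connect to HiveServer2.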