Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

When installing vanilla Spark on Windows or Linux, you may encounter the following error when invoking the spark-sql command:

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

This error usually occurs when you install a Spark build without built-in Hadoop libraries (the headless version), because the Spark Hive and Hive Thrift Server packages are not included in that build.

There are two ways to fix this issue.

Solution 1 - Install Spark with Hadoop built-in

When downloading Spark, choose the version that has built-in Hadoop.

In the library folder ($SPARK_HOME/jars), you should be able to find a JAR file named spark-hive-thriftserver_2.12-3.0.1.jar (for Spark 3.0.1 built with Scala 2.12).

You can then run the spark-sql command without this error.

warning Make sure the Hadoop version in the Spark binary package is consistent with your Hadoop version; otherwise you may encounter JAR file version issues. The dependent JAR package versions also need to be consistent with your Hive installation. When Spark is compiled with Hive enabled, there are three compile dependencies from Hive: hive-cli, hive-jdbc and hive-beeline. Spark 3.0.1 is built against Hive 2.3.7 by default; Hive 3.1.2 is the newest Hive metastore version it can connect to.
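
One way to see which Hive version a Spark package bundles is to read it off the JAR file names. The following is a minimal sketch, not an official Spark tool: bundled_hive_versions is a hypothetical helper, /opt/spark is only a placeholder default, and it assumes the JARs follow the usual name-version.jar naming.

```shell
# Print "<name> <version>" for each hive-cli, hive-jdbc and hive-beeline JAR
# found in the given folder. bundled_hive_versions is a hypothetical helper.
bundled_hive_versions() {
  jars_dir="$1"
  for f in "$jars_dir"/hive-cli-*.jar "$jars_dir"/hive-jdbc-*.jar "$jars_dir"/hive-beeline-*.jar; do
    # Skip patterns that matched nothing (the glob is left unexpanded).
    [ -e "$f" ] || continue
    base=$(basename "$f" .jar)
    # Split at the last "-<digit>..." segment: name before it, version after it.
    echo "$base" | sed -E 's/^(.*)-([0-9][^-]*)$/\1 \2/'
  done
}

# /opt/spark is a placeholder; set SPARK_HOME to your own installation.
bundled_hive_versions "${SPARK_HOME:-/opt/spark}/jars"
```

Compare the printed versions with the output of your Hive installation before relying on the combination.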

Solution 2 - Download the missing JAR file manually

warning This approach is not fully verified yet.

Another approach is to download the packages manually. For example, for Spark 3.0.1 the missing JAR file is available on Maven Repository: org.apache.spark » spark-hive-thriftserver_2.12 » 3.0.1. If you are installing another version of Spark, download the package that matches it.
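
For other versions, the download URL can be derived from the artifact name, the Scala version and the Spark version, which `spark-submit --version` prints. The sketch below only mirrors the URL pattern of the 3.0.1 links in this article; maven_jar_url is a hypothetical helper, and you should confirm that the resulting URL exists before using it.

```shell
# Build the Maven Central URL for a Spark artifact.
# maven_jar_url is a hypothetical helper.
# Arguments: artifact name, Scala version, Spark version.
maven_jar_url() {
  artifact="$1"
  scala="$2"
  spark="$3"
  echo "https://repo1.maven.org/maven2/org/apache/spark/${artifact}_${scala}/${spark}/${artifact}_${scala}-${spark}.jar"
}

# The two packages discussed in this article, for Spark 3.0.1 / Scala 2.12.
maven_jar_url spark-hive-thriftserver 2.12 3.0.1
maven_jar_url spark-hive 2.12 3.0.1
```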

The following steps are for Spark 3.0.1.

  1. Download the package:
    wget https://repo1.maven.org/maven2/org/apache/spark/spark-hive-thriftserver_2.12/3.0.1/spark-hive-thriftserver_2.12-3.0.1.jar
  2. Move the package to the $SPARK_HOME/jars folder:
    mv spark-hive-thriftserver_2.12-3.0.1.jar $SPARK_HOME/jars/
  3. Download another package, spark-hive_2.12-3.0.1.jar, as it is also required but missing in the headless version:
    wget https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.12/3.0.1/spark-hive_2.12-3.0.1.jar
  4. Move the package to the $SPARK_HOME/jars folder:
    mv spark-hive_2.12-3.0.1.jar $SPARK_HOME/jars/
Now your spark-sql command should work properly.
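
If the error persists, it may help to confirm that both JAR files actually landed in the jars folder. This is a rough sketch under the same Spark 3.0.1 / Scala 2.12 assumption as the steps above; missing_hive_jars is a hypothetical helper and /opt/spark is only a placeholder default.

```shell
# Print the name of each expected JAR that is absent from the jars folder.
# missing_hive_jars is a hypothetical helper.
# Arguments: jars folder, Scala version, Spark version.
missing_hive_jars() {
  jars_dir="$1"
  scala="$2"
  spark="$3"
  for artifact in spark-hive-thriftserver spark-hive; do
    jar="${artifact}_${scala}-${spark}.jar"
    # Report the file name only when the file is not there.
    [ -f "$jars_dir/$jar" ] || echo "$jar"
  done
}

# /opt/spark is a placeholder; set SPARK_HOME to your own installation.
missing_hive_jars "${SPARK_HOME:-/opt/spark}/jars" 2.12 3.0.1
```

No output means both files are present.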

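warning If spark-sql is configured to use a remote Hive metastore, make sure the metastore service is running before starting spark-sql. The Spark SQL CLI connects to the metastore directly and does not go through HiveServer2.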
Last modified by Raymond 20 days ago.