When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set


Context

When submitting a Spark application to run in a Hadoop YARN cluster, it may fail with the following error:

Exception in thread "main" org.apache.spark.SparkException: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:630)
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:270)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:233)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:119)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:990)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:990)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This usually occurs when Spark cannot locate the Hadoop/YARN configuration files (core-site.xml, yarn-site.xml, etc.) because neither of the two environment variables is set.

Fix the issue

The error message itself points to the fix: we just need to set one of these two environment variables. The following sections show two common approaches.

System level environment variables

On Windows, add a machine- or user-level environment variable; on Linux, add an export statement to ~/.bashrc (or your shell's equivalent profile file):

  • Variable name: HADOOP_CONF_DIR
  • Variable value: %HADOOP_HOME%\etc\hadoop (Windows) or $HADOOP_HOME/etc/hadoop (Linux).
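On Linux, the export can be sketched as follows. The /opt/hadoop install location is a hypothetical example; substitute your own Hadoop directory, and append the same lines to ~/.bashrc so new shells pick them up.

```shell
# Hypothetical Hadoop install location -- replace with your own path.
export HADOOP_HOME=/opt/hadoop

# Point Spark at the directory holding core-site.xml, yarn-site.xml, etc.
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"

echo "$HADOOP_CONF_DIR"
```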

Spark environment script file

Alternatively, we can add the variable to Spark's environment setup script.

On Windows, open the file load-spark-env.cmd in Spark's bin folder and add the following line:

set HADOOP_CONF_DIR=%HADOOP_HOME%\etc\hadoop

On Linux, open the file load-spark-env.sh in Spark's bin folder and add the following line:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
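After restarting the shell (or re-sourcing the script), a quick sanity check before resubmitting can look like the sketch below. The export at the top only simulates the configured environment with a hypothetical path; the check mirrors the validation spark-submit performs when --master yarn is used.

```shell
# Simulate the configured environment (hypothetical path).
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

# spark-submit requires this (or YARN_CONF_DIR) to be set for --master yarn.
if [ -n "$HADOOP_CONF_DIR" ]; then
  echo "ready to submit with --master yarn"
else
  echo "HADOOP_CONF_DIR is still unset" >&2
fi

# With the variable visible, the original submission should pass
# validation, e.g. (SparkPi jar name varies by Spark version):
# spark-submit --master yarn --deploy-mode cluster \
#   --class org.apache.spark.examples.SparkPi \
#   "$SPARK_HOME"/examples/jars/spark-examples_*.jar 10
```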