When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set
Context
When submitting a Spark application to run on a Hadoop YARN cluster, spark-submit may fail with the following error:
Exception in thread "main" org.apache.spark.SparkException: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
	at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:630)
	at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:270)
	at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:233)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:119)
	at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:990)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:990)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This usually occurs because neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set in the environment that launches spark-submit, so Spark cannot locate the Hadoop/YARN configuration files.
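For reference, a Linux-style submission like the following (the application jar and class name are placeholders) trips this validation as soon as spark-submit parses its arguments, before anything reaches the cluster:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.example.MyApp \
  my-app.jar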
Fix the issue
The error message itself points to the fix: set one of these two environment variables. The following sections show two common approaches.
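Before changing anything, it is worth confirming that the variable really is missing in the shell that runs spark-submit; for example:

# Linux: prints an empty line when the variable is unset
echo $HADOOP_CONF_DIR

rem Windows (cmd): echoes the literal text back when the variable is unset
echo %HADOOP_CONF_DIR%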
System-level environment variables
On Windows, add a machine-level or user-level environment variable; on Linux, add an export statement to ~/.bashrc, as shown in the example after this list.
- Variable name: HADOOP_CONF_DIR
- Variable value: %HADOOP_HOME%\etc\hadoop (Windows) or $HADOOP_HOME/etc/hadoop (Linux).
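On Linux, for example, this translates to the line below in ~/.bashrc (assuming HADOOP_HOME already points at the Hadoop installation; only one of the two variables is required):

# Tell Spark where to find the Hadoop/YARN configuration files
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Run source ~/.bashrc (or open a new terminal) for the change to take effect.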
Spark environment script file
Alternatively, we can add the variable to Spark's environment setup script file.
On Windows, open the file load-spark-env.cmd in Spark's bin folder and add the following line:
set HADOOP_CONF_DIR=%HADOOP_HOME%\etc\hadoop
On Linux, open the file load-spark-env.sh in Spark's bin folder and add the following line:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
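To verify either fix, launching any Spark entry point against YARN should now get past the validation step; for instance:

spark-shell --master yarn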