Diagnostics: Container is running beyond physical memory limits



Scenario

Recently I created an Oozie workflow that contains one Spark action. The Spark action's master is yarn and its deploy mode is cluster.

Each time, after running for about 30 minutes, the application failed with errors like the following:

Application application_** failed 2 times due to AM Container for appattempt_** exited with exitCode: -104.

Diagnostics: Container is running beyond physical memory limits. Current usage: ** GB of ** GB physical memory used; ** GB of ** GB virtual memory used. Killing container.

The error message is self-explanatory: one of the containers ran out of memory. However, it took me a while to find out which container was killed and why, due to my limited knowledge of Hadoop and YARN. I also found that many other people have encountered similar problems in their big data environments (CDH, Hortonworks, or other distributions). To save you some time, I will document the approaches I used to resolve the problem. Your problem may be different; for example, it might occur in a Sqoop action or another action, or in a MapReduce/Spark/Java application without Oozie. The resolution will vary, but the ways to debug should be similar.
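
If you are not sure which container triggered the error, the aggregated application logs usually contain the exact container ID in the diagnostics. A minimal sketch, assuming YARN log aggregation is enabled (the application ID below is a placeholder):

    # Fetch the aggregated logs for the failed application and search for the
    # container that YARN killed for exceeding its memory limit.
    yarn logs -applicationId application_1563328823742_0001 \
      | grep -B 2 -A 4 "running beyond physical memory limits"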

Find out the culprit container

In my scenario, two applications are created:

  • Oozie launcher MapReduce application
  • Spark application for the action

The first application is a MapReduce application (with only a map task) that submits the Spark job; the second one is the main Spark application (a CLI sketch for confirming both application IDs follows the container list below). So the error can happen in any of the following containers:

  • Oozie launcher MapReduce application master container
  • The application container that executes the Map task
  • Spark driver container / application master container
  • Any of the Spark executor containers
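
Before digging into configuration, it can help to confirm the IDs of both applications through the YARN CLI, so that you know which application's containers to inspect. A minimal sketch (the type and state filters are optional):

    # List the applications submitted by the workflow run; both the Oozie launcher
    # (MAPREDUCE type) and the Spark job (SPARK type) should show up with their IDs.
    yarn application -list -appTypes MAPREDUCE,SPARK -appStates ALL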

This was the first time I actually needed to debug these problems, as the configured Spark/YARN/Oozie arguments and JVM arguments are usually good enough and I had seldom encountered these issues. Due to my limited knowledge of debugging memory-related issues in YARN/Spark, I had to try the candidates one by one.

Oozie launcher containers

It's possible that the Oozie launcher application master (AM) container or the map task container ran out of memory. However, it is unlikely, as the default values for the memory-related arguments should be sufficient: the heavy data processing happens in the Spark application, not the launcher application. Nevertheless, I still looked into it.

There are a few primary properties I looked into:

  • oozie.launcher.mapreduce.map.memory.mb: container memory, in MB, for the launcher job's map task (the task that submits the Spark job).
  • oozie.launcher.mapreduce.map.java.opts: JVM options (e.g. heap size) for the launcher job's map task.
  • oozie.launcher.yarn.app.mapreduce.am.resource.mb: container memory, in MB, for the launcher job's MapReduce application master.
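
For reference, here is a hedged sketch of how these overrides can be applied. The property names are the ones from the list above; the values, the Oozie server URL, and the properties file name are illustrative only:

    # Illustrative values only: these properties belong in the Spark action's
    # <configuration> block in workflow.xml (or the Hue editor's properties list).
    #
    #   oozie.launcher.mapreduce.map.memory.mb            4096
    #   oozie.launcher.mapreduce.map.java.opts            -Xmx3276m
    #   oozie.launcher.yarn.app.mapreduce.am.resource.mb  4096
    #
    # After updating the workflow, resubmit it with the Oozie CLI
    # (the Oozie URL and properties file are placeholders):
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run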

There is a good article about these arguments if you want to learn more about them:

Memory allocation for Oozie Launcher job

So I increased the memory for the above properties but they didn’t resolve my problem.

You may ask why there are no reduce-related properties; it's because the launcher application only has a map task, as I mentioned earlier.

Later on, I realized it had nothing to do with the launcher application at all, because the failed application was the Spark application, not the launcher.

Spark executor containers

Then I suspected that one of the Spark executor containers had run out of memory, so I looked into the following properties/arguments:

  • yarn.scheduler.minimum-allocation-mb: the minimum memory allocation, in MB, that YARN grants to any container; requests smaller than this are rounded up to it. For example, if yarn.nodemanager.resource.memory-mb is configured as 150 GB on a node, each executor container on that node is allocated at least this minimum amount out of the available 150 GB.
  • --executor-memory: spark-submit command argument that specifies the memory for each Spark executor container.
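
If the executors are the suspects, the relevant knobs are the executor heap size and the off-heap overhead. A minimal spark-submit sketch (the values and the application jar are placeholders):

    # Illustrative values only: give each executor 8 GB of heap plus 1 GB of
    # off-heap overhead; YARN sizes the executor container to roughly the sum.
    # (On Spark versions before 2.3 the overhead property is spark.yarn.executor.memoryOverhead.)
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 10 \
      --executor-memory 8g \
      --conf spark.executor.memoryOverhead=1g \
      your-application.jar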

The following diagram is a good illustration of the memory-related properties of a Spark application running on YARN. On one node, there can be many executor containers.

[Diagram: memory layout of Spark executor containers within a YARN node. Image from the Internet.]

I changed these parameters, but they didn't help either. When I looked into the Spark application master web UI, I found that the memory I had allocated to each executor container was far more than sufficient. However, the driver container had much less memory allocated.

Spark driver container

My Spark application has thousands of tasks to execute, and I started to think it was probably the driver container that had run out of memory. I am running Spark in YARN cluster mode, so the driver container is instantiated on one of the cluster nodes.

Thus I looked into the following arguments:

  • --driver-memory: spark-submit command argument that specifies the memory for the Spark driver (application master) container. For more details, refer to this page: https://spark.apache.org/docs/latest/configuration.html
  • spark.driver.maxResultSize: limit of the total size of serialized results of all partitions for each Spark action (e.g. collect), in bytes. Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size exceeds this limit. Setting a high limit may cause out-of-memory errors in the driver (depending on spark.driver.memory and the memory overhead of objects in the JVM), while setting a proper limit can protect the driver from such errors. Note that this limit can also surface as a problem if you collect data to the driver; in some cases, joins will collect data to the driver to perform a broadcast join.

I increased the driver memory argument in the Spark action configuration (the Options list in the Hue Oozie Editor UI) of the Oozie workflow, and the problem was finally resolved! Note that if your deploy mode is client, the driver runs on the machine that submits the job rather than in a YARN container, so a driver container hitting this YARN memory limit would not be the issue there; in client mode the YARN application master memory is controlled by spark.yarn.am.memory instead.
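
For reference, the Options list of the Spark action maps to spark-submit arguments, so the fix is roughly equivalent to the following invocation (the values and the application jar are placeholders; spark.driver.maxResultSize is shown only to illustrate capping collected results):

    # Illustrative values only: give the driver container 8 GB of memory and cap
    # the total serialized results collected back to the driver at 4 GB.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 8g \
      --conf spark.driver.maxResultSize=4g \
      your-application.jar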

For more details about running Spark jobs in YARN, refer to the following official page:

Running Spark on YARN

Container naming standard

It took me a while to figure these things out, as there are so many configurations and properties you can specify when running Spark jobs in yarn-cluster mode.

There is a quick way to figure out which container errored out.

The naming standard for containers in YARN is: container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId}. If the epoch is 0, it is omitted.

For example, the container name container_e39_241090117234543_0241_02_000001 belongs to the second application attempt of application ID 241090117234543_0241, and container ID 000001 within an attempt is the application master container. If you look at the executors page of the Spark application master UI, you will find that the ID of the driver container is usually 1 (at least in my environment). Had I known this earlier, I would have looked directly into the driver memory configurations or properties, which would have saved me a lot of time.

Alternatively, you can also find the container list through the YARN CLI commands:

  • Run yarn applicationattempt -list <Application Id> to get the attempt ID
  • Run yarn container -list <Application Attempt Id> to get the container IDs

The commands also print out tracking URLs from which you can view the logs. The two commands above only produce output while the application is running; once it has completed, you can look into the history server logs to find the execution logs.
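
Putting it together, a minimal sketch of the CLI workflow (all IDs below are placeholders):

    # 1. Find the attempt ID(s) of the application (works while it is running).
    yarn applicationattempt -list application_1563328823742_0001

    # 2. List the containers of a given attempt, including node and log URLs.
    yarn container -list appattempt_1563328823742_0001_000002

    # 3. Once the application has finished, pull the aggregated logs for a
    #    specific container (requires YARN log aggregation).
    yarn logs -applicationId application_1563328823742_0001 \
      -containerId container_e39_1563328823742_0001_02_000001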

Summary

This was the first time I had debugged these problems, and even now I am not 100% sure my understanding is entirely correct. So if you are an expert in these areas, i.e. a Hadoop/YARN/Spark cluster administrator, please correct anything I got wrong. For example, I am still not quite sure whether I understand the problem thoroughly, and I have one question in mind:

  • The driver container is created dynamically each time I run the Oozie workflow, yet each run failed with the same errors. Instead of specifying driver memory, is there another way to configure YARN-related memory properties to achieve the same effect?

If you run into the same error, please use the approaches above to verify whether they work for your problem. As mentioned earlier, depending on which container type runs out of memory, your resolution may differ from mine. For example, if a Spark executor container errored out, you may need to look into the executor memory related properties; if a MapReduce container errored out, you will then look into the MapReduce-related YARN properties/configurations. Any feedback is appreciated so that we can help each other and learn from each other.
