This diagram gives an overview of Spark memory management when running on YARN. It helps you understand how Spark memory is allocated and how it is used.
In a Spark executor, memory is used for two main purposes:
- Execution memory - memory used for computation in shuffles, joins, sorts, and aggregations;
- Storage memory - memory used for caching and for propagating internal data across the cluster.
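Execution and storage share a single unified region. When no storage memory is in use, execution can use all of the available memory in that region, and vice versa. Execution may evict cached blocks when it needs space, but only until storage usage falls to a protected threshold.
The sizes of the unified region and of the protected storage threshold are controlled by two configuration properties: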
- spark.memory.fraction sets the size of M, the unified region shared by execution and storage, as a fraction of (JVM heap space - 300 MiB). The default is 0.6. The remaining 40% is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors.
- spark.memory.storageFraction sets the size of R, the part of M in which cached blocks are immune to eviction by execution, as a fraction of M. The default is 0.5.
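To make the arithmetic concrete, here is a minimal sketch that computes M and R from the two settings above. The function name `unified_memory_regions` and the 4096 MiB heap are hypothetical and chosen only for illustration. The sketch looks at on-heap memory only and ignores YARN memory overhead and off-heap settings, so treat the numbers as approximate.

```python
# Fixed amount subtracted from the JVM heap before spark.memory.fraction is applied.
RESERVED_MIB = 300


def unified_memory_regions(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate the on-heap regions (in MiB) for a given JVM heap size.

    memory_fraction  corresponds to spark.memory.fraction
    storage_fraction corresponds to spark.memory.storageFraction
    """
    usable = heap_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("heap must be larger than the 300 MiB reservation")

    m = usable * memory_fraction   # M: unified execution + storage region
    r = m * storage_fraction       # R: storage protected from eviction
    user = usable - m              # remainder for user data structures, metadata

    return {"M": m, "R": r, "user": user}


# Hypothetical executor with a 4096 MiB heap and default settings.
regions = unified_memory_regions(4096)
for name, size in regions.items():
    print(f"{name} = {size:.1f} MiB")
# M = 2277.6 MiB
# R = 1138.8 MiB
# user = 1518.4 MiB
```

With the defaults, a little over half of the heap ends up in the unified region, and half of that is protected for cached blocks. Raising spark.memory.storageFraction protects more cached data but leaves execution less room to reclaim.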