
When configuring YARN and MapReduce in a Hadoop cluster, it is very important to configure the memory and virtual CPU cores correctly.

If the configurations are incorrect, nodes may fail to start properly and applications may fail to run successfully.

For example, in the following post, a MapReduce application failed because the submitted job needed more memory than the maximum available:

Container is running beyond memory limits

Thus, it is critical to configure these values correctly.

YARN-related configurations

Documentation about YARN configurations in Hadoop 3.1.0

Refer to the following page for all available YARN configuration entries.

https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Resource-related properties

The following are the critical properties for memory and virtual CPU cores.

yarn.scheduler.minimum-allocation-mb

The minimum allocation of memory (in MB) for every container request at the Resource Manager.

A node manager that is configured to have less memory than this value will be shut down by the resource manager.

yarn.scheduler.maximum-allocation-mb

The maximum allocation of memory (in MB) for every container request at the Resource Manager.

Memory requests higher than this will throw an InvalidResourceRequestException.

yarn.scheduler.minimum-allocation-vcores

The minimum allocation of virtual CPU cores for every container request at the Resource Manager.

A node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.

yarn.scheduler.maximum-allocation-vcores

The maximum allocation of virtual CPU cores for every container request at the Resource Manager.

Requests higher than this will throw an InvalidResourceRequestException.

yarn.nodemanager.resource.memory-mb

Amount of physical memory, in MB, that can be allocated for containers.

If the value is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated on Windows and Linux.

The default value is 8192 MB (8 GB).

yarn.nodemanager.vmem-pmem-ratio

Ratio of virtual memory to physical memory, which will be used to set memory limits for containers.

Container allocations are expressed in terms of physical memory; virtual memory usage is allowed to exceed this allocation by this ratio.

The default value is 2.1.
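For example, with the default ratio of 2.1, a container allocated 1024 MB of physical memory may use up to 1024 × 2.1 ≈ 2150 MB of virtual memory before the node manager kills it for exceeding the limit.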

To summarize, requests below the configured minimums will be set to the minimum values (for example, a 400 MB request is allocated 526 MB under the sample configuration below), while requests above the configured maximums will cause an InvalidResourceRequestException to be thrown.

Sample configuration in yarn-site.xml

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>526</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4086</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.1</value>
</property>

In the above configuration, the Resource Manager will allocate between 526 MB and 4086 MB of memory for each container request. On each node, the node manager can allocate at most 2048 MB of physical memory to containers.

If a MapReduce application requests 5000 MB of memory for a container, an InvalidResourceRequestException will be thrown; if it requests 20500 MB in total for 20 map tasks while there are only 10 node managers (10 × 2048 MB = 20480 MB), memory will be insufficient and errors will be thrown.
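For instance, a job-level setting like the following snippet (an illustrative sketch using the 5000 MB figure from the scenario above) would exceed the 4086 MB scheduler maximum and be rejected with an InvalidResourceRequestException:

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>5000</value>
</property>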

Sample errors

If the site is configured with the following values, applications will fail whenever the memory requested for a container is greater than 1024 MB, since 1024 MB is the capacity of each node manager.

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>526</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4086</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.1</value>
</property>

For example, in the following error log, the application requested 1536 MB of memory from the scheduler while the maximum allowed allocation is 1024 MB on each node; meanwhile, the total memory is 1024 × 2 (two nodes) = 2048 MB, which is also less than the configured maximum of 4086 MB.

2018-05-13 17:03:08,989 ERROR tool.ImportTool: Import failed: java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource type=[memory-mb] < 0 or greater than maximum allowed allocation. Requested resource=<memory:1536, vCores:1>, maximum allowed allocation=<memory:1024, vCores:2>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:4086, vCores:2>
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:208)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:542)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:390)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641)
         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:345)
         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
         at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
         at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
         at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
         at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
         at org.apache.sqoop.manager.SQLServerManager.importQuery(SQLServerManager.java:405)
         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

If we increase the value of 'yarn.nodemanager.resource.memory-mb' to 2048, the issue is resolved, since the requested 1536 MB is less than 2048 MB.
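In yarn-site.xml, the fix would look like the following (a minimal sketch based on the value mentioned above; the node managers need to be restarted for the change to take effect):

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>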

MapReduce-related configurations

Documentation about MapReduce configurations in Hadoop 3.1.0

Refer to the following page for all available MapReduce configuration entries.

https://hadoop.apache.org/docs/r3.1.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Resource-related properties

The following are the critical properties for the memory configuration of map and reduce tasks.

mapreduce.map.memory.mb

The amount of memory to request from the scheduler for each map task.

mapreduce.reduce.memory.mb

The amount of memory to request from the scheduler for each reduce task.

mapreduce.job.heap.memory-mb.ratio

The ratio of heap size to container size. The default value is 0.8.

mapred.child.java.opts

Java opts for the task processes.

For example, a value of '-Xmx1024m' will limit the JVM heap size to 1024 MB.

If no heap size is specified in this value, it is inferred from the first two configurations in this table, multiplied by mapreduce.job.heap.memory-mb.ratio.
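Putting these together, the following mapred-site.xml snippet is an illustrative sketch (the values are assumptions chosen to fit within the 526 MB~4086 MB scheduler range configured earlier, not taken from the original post):

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <!-- Optional: if no -Xmx is given here, the heap size is derived from the
         container size multiplied by mapreduce.job.heap.memory-mb.ratio -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx819m</value>
</property>

With the default heap ratio of 0.8, omitting the explicit -Xmx would give map tasks a heap of roughly 0.8 × 1024 ≈ 819 MB, which is what the snippet sets explicitly.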
