When configuring YARN and MapReduce in Hadoop cluster, it is very important to configure the memory and virtual processors correctly.

If the configurations are incorrect, the nodes may not be able to start properly and the applications may not be able to run successfully.

For example, in the following post, MapReduce application failed because the submitted jobs needs more memory application compared with the maximum available.

Container is running beyond memory limits

Thus, it is critical to configure these figures correctly.

YARN related configurations

Documentation about YARN configurations in Hadoop 3.1.0

Refer to the following page about YARN available configuration entries.

https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Resources related properties

The following are the critical configurations about memory and virtual CPU cores.

Property Name Description
yarn.scheduler.minimum-allocation-mb

The minimum allocation of memory (MBs) for every container request at the Resource Manager.

If a  node manager that is configured to have less memory than this value will be shut down by the resource manager.

yarn.scheduler.maximum-allocation-mb

The maximum allocation of memory (MBs) for every container request at the Resource Manager.

Memory requests higher than this will throw an InvalidResourceRequestException.

yarn.scheduler.minimum-allocation-vcores

The minimum allocation of virtual CPU cores for every container request at the Resource Manager.

If a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.

yarn.scheduler.maximum-allocation-vcores

The maximum allocation of virtual CPU cores for every container request at the Resource Manager.

Requests higher than this will throw an InvalidResourceRequestException.

yarn.nodemanager.resource.memory-mb

Amount of physical memory (MBs), that can be allocated for containers.

If the value is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated in Windows and Linux.

The default value is 8GB.

yarn.nodemanager.vmem-pmem-ratio

Ratio between virtual memory to physical memory, which will be used to set memory limits for containers.

Container allocations are expressed in terms of physical memory; virtual memory usage is allowed to exceed this allocation by this ratio.

The default value is 2.1.

To summarize, all requests with values less than the minimum values will be set to the minimum values of the properties; others exceptions will be thrown out if the requested values are more than the maximum configured values.

Sample configuration in yarn-site.xml

<property>
       <name>yarn.scheduler.minimum-allocation-mb</name>
       <value>526</value>
   </property>
   <property>
       <name>yarn.scheduler.maximum-allocation-mb</name>
       <value>4086</value>
   </property>
   <property>
       <name>yarn.scheduler.minimum-allocation-vcores</name>
       <value>1</value>
   </property>
   <property>
       <name>yarn.scheduler.maximum-allocation-vcores</name>
       <value>2</value>
   </property>
   <property>
     <name>yarn.nodemanager.resource.memory-mb</name>
     <value>2048</value>
   </property>
   <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
     <value>4.1</value>
   </property>

In the above configuration, the Resource Manager will allocate 526MB~4086MB memory for container requests.  In each node server, the physical memory is configured as 2048MB.

If a MapRequest application requests for 5000MB memory, InvalidResourceRequestException will be thrown; if it requests for 20500MB total memory for 20 map tasks while there are only 10 node managers, memory won’t be sufficient and errors will be thrown out.

Sample errors

If the site is configured with the following values, the application will fail if request memory for each container is greater than 1024MB.

<property>
       <name>yarn.scheduler.minimum-allocation-mb</name>
       <value>526</value>
   </property>
   <property>
       <name>yarn.scheduler.maximum-allocation-mb</name>
       <value>4096</value>
   </property>
   <property>
       <name>yarn.scheduler.minimum-allocation-vcores</name>
       <value>1</value>
   </property>
   <property>
       <name>yarn.scheduler.maximum-allocation-vcores</name>
       <value>2</value>
   </property>
   <property>
     <name>yarn.nodemanager.resource.memory-mb</name>
     <value>1024</value>
   </property>
   <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
     <value>4.1</value>
   </property>

For example, in the following error log, the application was requesting 1536MB memory from scheduler while the maximum allowed allocation is 1024MB in each node; in the meanwhile, the total memory is 1024*2(two nodes) = 2048, which is also less than the configured maximum 4086.

2018-05-13 17:03:08,989 ERROR tool.ImportTool: Import failed: java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource type=[memory-mb] < 0 or greater than maximum allowed allocation. Requested resource=<memory:1536, vCores:1>, maximum allowed allocation=<memory:1024, vCores:2>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:4086, vCores:2>
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:208)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:542)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:390)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641)
         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:345)
         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
         at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
         at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
         at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
         at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
         at org.apache.sqoop.manager.SQLServerManager.importQuery(SQLServerManager.java:405)
         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

If we increase the value of ‘yarn.nodemanager.resource.memory-mb’ to 2048, the issue will then be resolved since 1536 is less than 2048.

MapReduce related configurations

Documentation about MapReduce configurations in Hadoop 3.1.0

Refer to the following page about MapReduce available configuration entries.

https://hadoop.apache.org/docs/r3.1.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Resources related properties

The following are the critical configurations about memory and virtual CPU cores.

Name Description
mapreduce.map.memory.mb

The amount of memory to request from the scheduler for each map task.

mapreduce.reduce.memory.mb

The amount of memory to request from the scheduler for each reduce task.

mapreduce.job.heap.memory-mb.ratio

The ratio of heap-size to container-size.

mapred.child.java.opts

Java opts for the task processes.

For example, a value ‘-Xmx1024m’  will limit the JVM heap size as 1024MB.

If this value is not set, the value is inferred from the fist two configurations in this table.


info Last modified by Raymond at 3 years ago copyright This page is subject to Site terms.

More from Kontext

local_offer linux local_offer hadoop local_offer hdfs local_offer yarn

visibility 48
thumb_up 0
access_time 5 days ago

This article provides step-by-step guidance to install Hadoop 3.3.0 on Linux such as Debian, Ubuntu, Red Hat, openSUSE, etc.  Hadoop 3.3.0 was released on July 14 2020. It is the first release of Apache Hadoop 3.3...

open_in_new Hadoop

Install Hadoop 3.3.0 on Windows 10 Step by Step Guide

local_offer windows10 local_offer hadoop local_offer yarn local_offer hdfs

visibility 186
thumb_up 0
access_time 7 days ago

This detailed step-by-step guide shows you how to install the latest Hadoop v3.3.0 on Windows 10. It leverages Hadoop 3.3.0 winutils tool and WSL is not required. This version was released on July 14 2020. It is the first release of Apache Hadoop 3.3 line. There are significant changes compared with Hadoop 3.2.0, such as Java 11 runtime support, protobuf upgrade to 3.7.1, scheduling of opportunistic containers, non-volatile SCM support in HDFS cache directives, etc.

open_in_new Hadoop

local_offer hadoop local_offer windows10

visibility 47
thumb_up 0
access_time 7 days ago

Winutils is required when installing Hadoop on Windows environment. Hadoop 3.3.0 winutils I've compiled Hadoop 3.3.0 on Windows 10 using CMake and Visual Studio (MSVC x64). Follow these two steps to download it: ...

open_in_new Hadoop

Install Hadoop 3.3.0 on Windows 10 using WSL

local_offer linux local_offer hadoop local_offer WSL

visibility 75
thumb_up 0
access_time 8 days ago

Hadoop 3.3.0 was released on July 14 2020. It is the first release of Apache Hadoop 3.3 line. There are significant changes compared with Hadoop 3.2.0, such as Java 11 runtime support, protobuf upgrade to 3.7.1, scheduling of opportunistic containers, non-volatile SCM support in HDFS ca...

open_in_new Hadoop

comment Comments (0)

comment Add comment

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

No comments yet.

Kontext Column

Created for everyone to publish data, programming and cloud related articles. Follow three steps to create your columns.


Learn more arrow_forward