Configure YARN and MapReduce Resources in Hadoop Cluster


When configuring YARN and MapReduce in a Hadoop cluster, it is very important to configure memory and virtual CPU cores correctly.

If these configurations are incorrect, nodes may fail to start properly and applications may fail to run successfully.

For example, in the following post, a MapReduce application failed because the submitted job requested more memory than the maximum available:

Container is running beyond memory limits

Thus, it is critical to configure these values correctly.

YARN-related configurations

Documentation about YARN configurations in Hadoop 3.1.0

Refer to the following page for the available YARN configuration entries.

https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Resource-related properties

The following are the critical configurations for memory and virtual CPU cores.

yarn.scheduler.minimum-allocation-mb

The minimum allocation, in MB, for every container request at the Resource Manager. A Node Manager that is configured with less memory than this value will be shut down by the Resource Manager.

yarn.scheduler.maximum-allocation-mb

The maximum allocation, in MB, for every container request at the Resource Manager. Memory requests higher than this will throw an InvalidResourceRequestException.

yarn.scheduler.minimum-allocation-vcores

The minimum allocation of virtual CPU cores for every container request at the Resource Manager. A Node Manager that is configured with fewer virtual cores than this value will be shut down by the Resource Manager.

yarn.scheduler.maximum-allocation-vcores

The maximum allocation of virtual CPU cores for every container request at the Resource Manager. Requests higher than this will throw an InvalidResourceRequestException.

yarn.nodemanager.resource.memory-mb

Amount of physical memory, in MB, that can be allocated for containers. If the value is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated on Windows and Linux (see the sketch after this list). Otherwise, the default value is 8192MB (8GB).

yarn.nodemanager.vmem-pmem-ratio

The ratio of virtual memory to physical memory, used to set virtual memory limits for containers. Container allocations are expressed in terms of physical memory; virtual memory usage is allowed to exceed the allocation by this ratio. The default value is 2.1, so a container allocated 1024MB of physical memory may use up to about 2150MB of virtual memory.
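As noted for yarn.nodemanager.resource.memory-mb above, the Node Manager can also detect its capacity automatically. A minimal yarn-site.xml sketch of that setup (both property names come from yarn-default.xml):

<property>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>true</value>
</property>
<property>
    <!-- -1 means: calculate the container memory from the detected hardware. -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>-1</value>
</property>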

To summarize, requests below the minimum values are normalized up to the minimums (for example, a 400MB memory request would be rounded up to yarn.scheduler.minimum-allocation-mb); requests above the maximum values cause exceptions to be thrown.

Sample configuration in yarn-site.xml

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>526</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4086</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.1</value>
</property>

In the above configuration, the Resource Manager will allocate between 526MB and 4086MB of memory for each container request. On each node, the physical memory available for containers is configured as 2048MB.

If a MapReduce application requests 5000MB of memory for a container, an InvalidResourceRequestException will be thrown; if it requests 20500MB in total for 20 map tasks while there are only 10 node managers (2048MB each, 20480MB in total), memory will not be sufficient and errors will be thrown.
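For illustration, the first failure above could be triggered by a per-task request such as the following in mapred-site.xml (a hypothetical setting; 5000MB exceeds the 4086MB scheduler maximum):

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>5000</value>
</property>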

Sample errors

If the site is configured with the following values, an application will fail whenever its requested memory per container is greater than 1024MB, the capacity of each node.

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>526</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4086</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.1</value>
</property>

For example, in the following error log, the application requested 1536MB of memory from the scheduler while the maximum allowed allocation was 1024MB per node; meanwhile, the cluster total was 1024MB × 2 (two nodes) = 2048MB, which is also less than the configured maximum of 4086MB.

2018-05-13 17:03:08,989 ERROR tool.ImportTool: Import failed: java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource type=[memory-mb] < 0 or greater than maximum allowed allocation. Requested resource=<memory:1536, vCores:1>, maximum allowed allocation=<memory:1024, vCores:2>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:4086, vCores:2>
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:208)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:542)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:390)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641)
         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:345)
         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
         at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
         at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
         at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
         at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
         at org.apache.sqoop.manager.SQLServerManager.importQuery(SQLServerManager.java:405)
         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

If we increase the value of yarn.nodemanager.resource.memory-mb to 2048, the issue is resolved, since 1536MB is less than 2048MB.
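Expressed as a yarn-site.xml change, the fix only touches one property (the rest of the sample above stays the same):

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>

Note that the Node Managers must be restarted so the new capacity is registered with the Resource Manager.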

MapReduce-related configurations

Documentation about MapReduce configurations in Hadoop 3.1.0

Refer to the following page for the available MapReduce configuration entries.

https://hadoop.apache.org/docs/r3.1.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Resource-related properties

The following are the critical configurations for task memory and JVM heap size.

mapreduce.map.memory.mb

The amount of memory to request from the scheduler for each map task.

mapreduce.reduce.memory.mb

The amount of memory to request from the scheduler for each reduce task.

mapreduce.job.heap.memory-mb.ratio

The ratio of JVM heap size to container size. The default value is 0.8.

mapred.child.java.opts

Java options for the task JVM processes. For example, a value of '-Xmx1024m' limits the JVM heap size to 1024MB. If no heap size is specified here, it is inferred from the task memory settings and the heap ratio above.
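To tie the two halves together, here is a minimal mapred-site.xml sketch that fits within the yarn-site.xml sample earlier; the specific values are illustrative assumptions, not recommendations:

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <!-- Roughly 0.8 x the map container size, leaving headroom for non-heap JVM memory. -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx820m</value>
</property>

Both memory requests fall between yarn.scheduler.minimum-allocation-mb (526) and yarn.scheduler.maximum-allocation-mb (4086), so the scheduler will accept and normalize them.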
