
Configure YARN and MapReduce Resources in Hadoop Cluster

Raymond Tang


When configuring YARN and MapReduce in a Hadoop cluster, it is very important to configure the memory and virtual CPU cores correctly.

If the configurations are incorrect, the nodes may fail to start properly and applications may fail to run.

For example, in the following post, a MapReduce application failed because the submitted job requested more memory than the maximum available:

Container is running beyond memory limits

Thus, it is critical to configure these values correctly.

YARN-related configurations

Documentation about YARN configurations in Hadoop 3.1.0

Refer to the following page for the available YARN configuration entries.

https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Resource-related properties

The following are the critical configurations for memory and virtual CPU cores.

yarn.scheduler.minimum-allocation-mb

The minimum allocation (in MB) for every container request at the Resource Manager. Memory requests lower than this value will be rounded up to it. In addition, a node manager that is configured with less memory than this value will be shut down by the resource manager.

yarn.scheduler.maximum-allocation-mb

The maximum allocation (in MB) for every container request at the Resource Manager. Memory requests higher than this value will throw an InvalidResourceRequestException.

yarn.scheduler.minimum-allocation-vcores

The minimum allocation of virtual CPU cores for every container request at the Resource Manager. A node manager that is configured with fewer virtual cores than this value will be shut down by the resource manager.

yarn.scheduler.maximum-allocation-vcores

The maximum allocation of virtual CPU cores for every container request at the Resource Manager. Requests higher than this value will throw an InvalidResourceRequestException.

yarn.nodemanager.resource.memory-mb

Amount of physical memory (in MB) that can be allocated for containers on a node. If the value is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated on Windows and Linux, as shown in the snippet after this list. The default value is 8192 MB (8 GB).

yarn.nodemanager.vmem-pmem-ratio

Ratio of virtual memory to physical memory, used to set memory limits for containers. Container allocations are expressed in terms of physical memory; virtual memory usage is allowed to exceed the allocation by this ratio. The default value is 2.1, so a container allocated 2048 MB of physical memory may use up to 2048 × 2.1 ≈ 4301 MB of virtual memory before it is killed.

To summarize: requests below the minimum values will be rounded up to the configured minimums, while exceptions will be thrown if the requested values exceed the configured maximums.
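For example, based on the property descriptions above, the node manager can be left to detect its resources from the host hardware. A minimal yarn-site.xml snippet illustrating only these two properties (not tuning advice):

<property>
    <!-- Let the node manager detect memory and CPU cores from the host hardware -->
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>true</value>
</property>
<property>
    <!-- -1 means the value is calculated automatically when detection is enabled -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>-1</value>
</property>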

Sample configuration in yarn-site.xml

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>526</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4086</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.1</value>
</property>

In the above configuration, the Resource Manager will allocate between 526 MB and 4086 MB of memory for each container request. On each node, the physical memory available for containers is configured as 2048 MB.

If a MapReduce application requests 5000 MB of memory for a container, an InvalidResourceRequestException will be thrown; if it requests 20500 MB in total for 20 map tasks while there are only 10 node managers (2048 MB each, 20480 MB in total), memory will not be sufficient and errors will be thrown.

Sample errors

If the site is configured with the following values, an application will fail whenever the memory requested for a container is greater than 1024 MB, the memory available on each node manager.

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>526</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.1</value>
</property>

For example, in the following error log, the application requested 1536 MB of memory from the scheduler while the maximum allowed allocation is 1024 MB on each node (two nodes of 1024 MB each in this cluster). As the message notes, the maximum allowed allocation is calculated by the scheduler from the maximum resource of the registered node managers, which can be less than the configured maximum of 4086 MB.

2018-05-13 17:03:08,989 ERROR tool.ImportTool: Import failed: java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource type=[memory-mb] < 0 or greater than maximum allowed allocation. Requested resource=<memory:1536, vCores:1>, maximum allowed allocation=<memory:1024, vCores:2>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:4086, vCores:2>
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:208)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:542)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:390)
         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641)
         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:345)
         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
         at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
         at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
         at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
         at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
         at org.apache.sqoop.manager.SQLServerManager.importQuery(SQLServerManager.java:405)
         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

If we increase the value of ‘yarn.nodemanager.resource.memory-mb’ to 2048, the issue is resolved, since 1536 MB is less than 2048 MB.
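The fix is a single property change in yarn-site.xml on each node, followed by a node manager restart:

<property>
    <!-- Increase per-node container memory so that the 1536 MB request fits -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>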

MapReduce-related configurations

Documentation about MapReduce configurations in Hadoop 3.1.0

Refer to the following page for the available MapReduce configuration entries.

https://hadoop.apache.org/docs/r3.1.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Resource-related properties

The following are the critical configurations for memory and virtual CPU cores.

mapreduce.map.memory.mb

The amount of memory to request from the scheduler for each map task.

mapreduce.reduce.memory.mb

The amount of memory to request from the scheduler for each reduce task.

mapreduce.job.heap.memory-mb.ratio

The ratio of heap size to container size. The default value is 0.8.

mapred.child.java.opts

Java options for the task processes. For example, the value ‘-Xmx1024m’ limits the JVM heap size to 1024 MB. If no heap size is set here, it is inferred from the first two configurations in this table multiplied by mapreduce.job.heap.memory-mb.ratio.
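As an illustration, the following mapred-site.xml snippet sizes the tasks to fit within the YARN limits configured earlier; the values are assumptions for this example rather than recommendations:

<property>
    <!-- Each map task container requests 1024 MB from the scheduler -->
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <!-- Each reduce task container requests 2048 MB from the scheduler -->
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <!-- Cap the task JVM heap at roughly 0.8 of the smaller container size -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx819m</value>
</property>

Both container sizes fall between the scheduler minimum (526 MB) and maximum (4086 MB) from the earlier yarn-site.xml sample, so the requests will be accepted as long as the node managers have enough memory configured.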

