Install Hadoop 3.2.1 on Windows 10 Step by Step Guide

Comments
Raymond #343 (4 years ago)

I'm glad it helped you. I don't think the issue you mentioned is related to this tutorial. If you follow all the steps here, you should not hit it: the steps have been tested by quite a few different people, and I've also tested them in a fresh Windows 10 environment.

It seems you may have mixed different versions of Hadoop libraries when doing the installation.

This installation guide is only for Hadoop 3.2.1. Can you confirm that you followed every step in this guide exactly and used the Hadoop 3.2.1 release for the installation?
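
One rough way to check for mixed library versions is to scan the JAR names under the Hadoop installation. The following is a minimal Python sketch, not part of the guide: the helper names are hypothetical, and it assumes Hadoop's own JARs follow the hadoop-<module>-<version>.jar naming pattern under share\hadoop. Third-party JARs are ignored.

```python
import re
from pathlib import Path

# Assumed naming pattern, e.g. hadoop-common-3.2.1.jar or
# hadoop-yarn-api-3.2.1-tests.jar. JARs that do not start with "hadoop-" are skipped.
HADOOP_JAR = re.compile(
    r"^(hadoop-[a-z0-9-]+?)-(\d+\.\d+\.\d+)(?:-[A-Za-z0-9.-]+)?\.jar$"
)


def mismatched_jars(jar_names, expected="3.2.1"):
    """Return (file name, version) pairs for Hadoop JARs not at `expected`."""
    found = []
    for name in jar_names:
        match = HADOOP_JAR.match(name)
        if match and match.group(2) != expected:
            found.append((name, match.group(2)))
    return found


def scan(hadoop_home, expected="3.2.1"):
    """Scan <hadoop_home>/share/hadoop recursively for mismatched Hadoop JARs."""
    share = Path(hadoop_home) / "share" / "hadoop"
    return mismatched_jars((p.name for p in share.rglob("*.jar")), expected)
```

Calling scan with the HADOOP_HOME folder returns an empty list when every matching JAR is at the expected version. Any entries it returns are candidates for a mixed installation, not proof of one.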



Ankit Tiwari #342 (4 years ago)

Thank you so much.

I was struggling to install Hadoop on Windows, but your article helped me install it properly.

I just faced a small issue with the ResourceManager, which I fixed by referring to the link below.

If possible, could you please update your blog with this entry as well? That should make your article exhaustive :)

Thanks again :)

https://stackoverflow.com/questions/51118358/noclassdeffounderror-org-apache-hadoop-yarn-server-timelineservice-collector-tim


Andika Syahputra #326 (4 years ago)

Alright. Cool. Thanks for replying.


Raymond #323 (4 years ago)

Further update:

I could not compile all the Hadoop 3.3.0 projects on Windows 10, but I was able to compile the winutils project successfully.

You can follow this guide to install Hadoop 3.3.0 on Windows 10 without using WSL:

Install Hadoop 3.3.0 on Windows 10 Step by Step Guide



Raymond #322 (4 years ago)

Hi Andika,

I have just published the WSL version for Hadoop 3.3.0 installation on Windows 10:

Install Hadoop 3.3.0 on Windows 10 using WSL

The one you mentioned is about building Hadoop 3.3.0 on Windows 10. I have temporarily deleted it because I found some unexpected issues with building the HDFS C/C++ project using CMake.

I'm still working on it and will republish the guide once I have resolved them.

Please stay tuned.

BTW, the Hadoop 3.2.1 build instructions are fully tested if you want to test something now. 


Regards,

Raymond


Andika Syahputra #321 (4 years ago)

Hey, I saw an article about how to install Hadoop 3.3.0 on Windows on this website a couple of days ago, but now it's gone. Was it deleted, or is there a way for me to find it?

Raymond #314 (4 years ago)

Hi Tim,

Just an update on the previous issue (I understand you have fixed it, but I'd like to post my findings here too in case other people are interested).

I went through the following steps to see whether I can run the Hadoop daemons without Administrator rights.

  • Create a local computer account named hadoop.
  • Set up the environment variables for this account.
  • Reconfigure the HDFS dfs locations for both data and namespace.
  • Format the namenode again using this local account:
    hadoop namenode -format
  • Start the HDFS daemons:
    start-dfs.cmd

The commands start successfully without any errors.

  • Start the YARN daemons:
    start-yarn.cmd

Interestingly, this time the NodeManager starts successfully while the ResourceManager does not, due to the following error:

org.apache.hadoop.service.ServiceStateException: java.io.IOException: Mkdirs failed to create file:/tmp/hadoop-yarn-hadoop/node-attribute

For the YARN tmp folder, my configuration is as follows:

<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///F:/tmp</value>
</property>

So I then tried the following steps:

  • Stop all the running Hadoop daemons.
  • Delete the existing tmp folder and recreate it using the hadoop local account.
  • Delete the DFS folder and recreate it.
  • Reformat the namenode.
  • Restart HDFS: the services started successfully, as the following screenshot shows.
  • Start the YARN daemons.

This time the services all started successfully without any errors.

I could verify that through the ResourceManager UI too.

So to summarize:

  • You don't necessarily need to create the tmp folder under your user directory.
  • You can run Hadoop services without Administrator privileges on Windows, as long as the HDFS directories and the tmp directories are set up correctly using the Windows account that runs the Hadoop daemons.

Hope the above helps.
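
To check quickly whether the account that runs the daemons can actually write to those folders, something like the following Python sketch could be run under that account. It is only an illustration: the function name is hypothetical, and the folders to test are whatever your own configuration points to.

```python
import os
import tempfile


def can_write(directory):
    """Return True if the current user can create and delete a file in `directory`.

    The directory is created first if it does not exist yet.
    """
    try:
        os.makedirs(directory, exist_ok=True)
        # The temporary file is removed again when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False
```

For the configuration shown above, can_write(r"F:\tmp") would be the folder to test. A False result suggests the folder was created by a different account or has restrictive permissions.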

Quoted comment from Tim (4 years ago):

Hello,

So far I seem to have been able to get around my need for admin by changing my config so that the tmp-nm folder is in my Documents folder rather than directly under tmp on the C drive.

However, it seems I still have some issues. Two of them seem to point to the wrong version of winutils.exe. I am running Windows 10 64-bit and am trying to get Hadoop 3.2.1 running. One symptom of the wrong version is this warning, repeated over and over in the YARN NodeManager window:

WARN util.SysInfoWindows: Expected split length of sysInfo to be 11. Got 7

Another was the failure of a job I submitted from the Hive prompt to insert data into a table. The job details were found in the Hadoop cluster local UI:

Application application_1589548856723_0001 failed 2 times due to AM Container for appattempt_1589548856723_0001_000002 exited with exitCode: 1639

Failing this attempt.Diagnostics: [2020-05-15 09:53:23.804]Exception from container-launch.

Container id: container_1589548856723_0001_02_000001

Exit code: 1639

Exception message: Incorrect command line arguments.

Shell output: Usage: task create [TASKNAME] [COMMAND_LINE] |

task isAlive [TASKNAME] |

task kill [TASKNAME]

task processList [TASKNAME]

Creates a new task jobobject with taskname

Checks if task jobobject is alive

Kills task jobobject

Prints to stdout a list of processes in the task

along with their resource usage. One process per line

and comma separated info per process

ProcessId,VirtualMemoryCommitted(bytes),

WorkingSetSize(bytes),CpuTime(Millisec,Kernel+User)

[2020-05-15 09:53:23.831]Container exited with a non-zero exit code 1639.


Some sites have said these two issues are symptoms of having the wrong winutils.exe.

I have some other issues that I'll wait to post until I can get these fixed.

I used the link in this article to get winutils.exe. I have also tried other winutils.exe builds I found out there. However, with those other builds, when I try to start YARN, the YARN NodeManager window is full of errors like

2020-05-15 10:12:16,444 ERROR util.SysInfoWindows: java.io.IOException: Cannot run program "C:\Users\XXX\Documents\Big-Data\Hadoop\hadoop-3.2.1\bin\winutils.exe": CreateProcess error=216, This version of %1 is not compatible with the version of Windows you're running. Check your computer's system information and then contact the software publisher

So those are worse: I can't even get YARN started with them because of that error.

With the version I am using now I can get YARN to start, although I get the warning "WARN util.SysInfoWindows: Expected split length of sysInfo to be 11. Got 7", but the actual Hive insert fails anyway.

I appreciate the help. How do I find out whether a winutils.exe is meant for Windows 10 64-bit and Hadoop 3.2.1?
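
For the architecture half of that question, the target machine type is recorded in the executable's PE/COFF header, and CreateProcess error 216 usually points to a mismatch there. The following Python sketch reads that field. The function names are hypothetical, and it assumes the PE header sits within the first 4096 bytes of the file, which is typical. It cannot tell which Hadoop release a winutils.exe was built for.

```python
import struct

# Machine type values from the PE/COFF header.
MACHINES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)", 0xAA64: "ARM64"}


def pe_machine(data):
    """Return the architecture named in the COFF header of PE file bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    header = data[pe_offset:pe_offset + 6]
    if len(header) < 6 or header[:4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 2-byte Machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", header, 4)
    return MACHINES.get(machine, hex(machine))


def exe_machine(path):
    """Read the start of the file at `path` and report its architecture."""
    with open(path, "rb") as f:
        return pe_machine(f.read(4096))
```

Running exe_machine on the winutils.exe in the Hadoop bin folder should report x64 (64-bit) for a build intended for 64-bit Windows.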

Raymond #313 (4 years ago)

Hi Tim,

On my computer, the paths for HADOOP_HOME and JAVA_HOME both point to locations without any spaces, as I was worried that spaces might cause problems in the applications.

That's also the reason most Windows Hadoop installation guides recommend a path with no spaces. This is even more important for a Hive installation.

So I think you are right that the issue is due to the space in your environment variables.

The JAVA_HOME environment variable is set in the following file:

%HADOOP_HOME%\etc\hadoop\hadoop-env.cmd

Also, in Step 6 of this page, we added the classpath entries for the JARs:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property> 
        <name>mapreduce.application.classpath</name>
        <value>%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*</value>
    </property>
</configuration>

You can try changing them to absolute paths in double quotes to see if that works.

To save yourself the trouble, I would highly recommend installing Java in a path without spaces, or creating a symbolic link to the Java folder in a location without spaces in the path.
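
A quick way to spot the problem is to look for spaces in the relevant environment variables. This is a minimal Python sketch; the function name is hypothetical, and the default list of variables covers only the two discussed here.

```python
import os


def vars_with_spaces(env, names=("JAVA_HOME", "HADOOP_HOME")):
    """Return the listed variables whose values contain a space."""
    return {name: env[name] for name in names if " " in env.get(name, "")}
```

Calling vars_with_spaces(os.environ) returns an empty dict when neither variable contains a space. Any variable it reports is a candidate for relocating or for replacing with a link from a space-free location.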

Tim Reynolds #312 (4 years ago)

Hi Raymond,

OK, I got my work IT helpdesk to add my ID to the Create Symbolic Links directory. That worked fine. So I am now past the exitCode=1: CreateSymbolicLink error (1314): A required privilege is not held by the client. error.

Now, it is throwing this error:

Application application_1589579676240_0001 failed 2 times due to AM Container for appattempt_1589579676240_0001_000002 exited with exitCode: 1

Failing this attempt.Diagnostics: [2020-05-15 17:57:08.681]Exception from container-launch.

Container id: container_1589579676240_0001_02_000001

Exit code: 1

Shell output: 1 file(s) moved.

"Setting up env variables"

"Setting up job resources"

"Copying debugging information"

C:\Users\V121119\Documents\Big-Data\tmp-nm\usercache\XXX\appcache\application_1589579676240_0001\container_1589579676240_0001_02_000001>rem Creating copy of launch script

C:\Users\V121119\Documents\Big-Data\tmp-nm\usercache\XXX\appcache\application_1589579676240_0001\container_1589579676240_0001_02_000001>copy "launch_container.cmd" "C:/Users/V121119/Documents/Big-Data/Hadoop/hadoop-3.2.1/logs/userlogs/application_1589579676240_0001/container_1589579676240_0001_02_000001/launch_container.cmd"

1 file(s) copied.

C:\Users\V121119\Documents\Big-Data\tmp-nm\usercache\XXX\appcache\application_1589579676240_0001\container_1589579676240_0001_02_000001>rem Determining directory contents

C:\Users\V121119\Documents\Big-Data\tmp-nm\usercache\XXX\appcache\application_1589579676240_0001\container_1589579676240_0001_02_000001>dir 1>>"C:/Users/XXX/Documents/Big-Data/Hadoop/hadoop-3.2.1/logs/userlogs/application_1589579676240_0001/container_1589579676240_0001_02_000001/directory.info"

"Launching container"

[2020-05-15 17:57:08.696]Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :

'C:\Program' is not recognized as an internal or external command,

operable program or batch file.

[2020-05-15 17:57:08.696]Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :

'C:\Program' is not recognized as an internal or external command,

operable program or batch file.


I think I know what is happening but don't know how to fix it. One of my first errors was in hadoop-config.cmd, where it checks if not exist %JAVA_HOME%\bin\java.exe. The problem is my Java path, C:\program files\Java\jre8: note that the "program files" directory has a space in it. I had to modify hadoop-config.cmd to put double quotes around the check, making it

if not exist "%JAVA_HOME%\bin\java.exe"

This resolved that problem. Another place where I found an issue was this line:

for /f "delims=" %%A in ('%JAVA% -Xmx32m %HADOOP_JAVA_PLATFORM_OPTS% -classpath "%CLASSPATH%" org.apache.hadoop.util.PlatformName') do set JAVA_PLATFORM=%%A

This failed for a similar reason: some values in the classpath list were under C:\program files\..., and once it hit the space it blew up. I just commented this line out with rem, since my HADOOP_JAVA_PLATFORM_OPTS is empty; I am not sure what I would have done had HADOOP_JAVA_PLATFORM_OPTS been populated. In any case, these were preliminary issues, all caused by the space in the C:\program files\... path. So when I saw this latest exception, I assumed it is also hitting the space in the Java path or in some member of the classpath, but I am not sure where to modify things to work around it.

As of 5/15 6:10pm EDT, this is my current issue. You may disregard the prior comments if you wish since they are resolved. Thanks

Tim Reynolds #311 (4 years ago)

Hello, sorry for the blow-by-blow here, but I can't find a way to update an existing comment.

I did find the correct version of winutils.exe, which solves both the WARN util.SysInfoWindows: Expected split length of sysInfo to be 11. Got 7 warning and the Exit code: 1639 Exception message: Incorrect command line arguments. issue.

(By the way, that winutils version is found here: https://github.com/cdarlint/winutils/blob/master/hadoop-3.2.1/bin/winutils.exe)

So now I'm back to the insert in Hive. When I try to run it, my job fails with Exit code: 1 Exception message: CreateSymbolicLink error (1314): A required privilege is not held by the client. In my YARN ResourceManager I can find Container id: container_1589558742717_0001_02_000001 Exit code: 1 Exception message: CreateSymbolicLink error (1314): A required privilege is not held by the client. as well. In my YARN NodeManager window I can see the same error, as well as '2020-05-15 12:12:35,690 WARN nodemanager.NMAuditLogger: USER=myid    OPERATION=Container Finished - Failed   TARGET=ContainerImpl    RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE    APPID=application_1589558742717_0001    CONTAINERID=container_1589558742717_0001_01_000001'

I am hoping I do not need admin permissions to run this, and that perhaps some config entry can help me get past it.
Any ideas?
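
A rough way to test whether the current account is allowed to create symbolic links at all is to try making one in a temporary folder. This Python sketch is only an approximation of what the YARN container launch does, and the function name is hypothetical. On Windows with Developer Mode enabled, Python may succeed where other programs still fail, so a True result is not conclusive.

```python
import os
import tempfile


def can_create_symlink():
    """Try to create a directory symlink in a temporary folder.

    Returns False if the operating system refuses, for example when the
    account lacks the privilege to create symbolic links.
    """
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        os.mkdir(target)
        try:
            os.symlink(target, os.path.join(tmp, "link"), target_is_directory=True)
        except (OSError, NotImplementedError):
            return False
        return True
```

A False result under the account that runs the NodeManager is consistent with error 1314 and suggests the account needs that privilege granted.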
