
This page summarizes the steps to install Hadoop 3.0.0 on Windows. Reference pages:

https://wiki.apache.org/hadoop/Hadoop2OnWindows

https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html

Tools and Environment

  • Git Bash
  • Command Prompt
  • Windows 10

Download Binary Package

Download the latest binary from the following site:

http://hadoop.apache.org/releases.html

In my case, I saved the file to the folder F:\DataAnalytics

Unzip binary package

Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then extract it:

$ cd /f/DataAnalytics
$ tar -xvzf hadoop-3.0.0.tar.gz

In my case, the Hadoop binary is extracted to: F:\DataAnalytics\hadoop-3.0.0

Set up environment variables

Make sure the following environment variables are set correctly:

  • JAVA_HOME: pointing to your JDK installation folder.
  • HADOOP_HOME: pointing to the Hadoop folder extracted in the previous step.

Then add %JAVA_HOME%\bin and %HADOOP_HOME%\bin to the Path environment variable.

Verify your setup

Verify your settings with the following command, which prints the Java runtime version:

F:\DataAnalytics\hadoop-3.0.0>hadoop -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
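
As an extra check that is not part of the original steps, the following Python sketch reports whether JAVA_HOME and HADOOP_HOME are set and whether their bin folders appear on Path. The function name check_hadoop_env is hypothetical, and the comparison is a simple normalized string match, so treat it as a rough sanity check rather than a definitive validation.

```python
import os
import ntpath


def check_hadoop_env(env):
    """Return a list of problems found in an environment mapping.

    `env` is any dict-like object such as os.environ. Windows path rules
    (ntpath) are used so the check behaves the same on any platform.
    """
    problems = []
    # Normalise every Path entry once: unify separators and ignore case.
    path_entries = {
        ntpath.normcase(ntpath.normpath(p))
        for p in env.get("PATH", "").split(";")
        if p
    }
    for name in ("JAVA_HOME", "HADOOP_HOME"):
        home = env.get(name)
        if not home:
            problems.append(f"{name} is not set")
            continue
        bin_dir = ntpath.normcase(ntpath.normpath(ntpath.join(home, "bin")))
        if bin_dir not in path_entries:
            problems.append(f"{name}\\bin is not on Path")
    return problems


if __name__ == "__main__":
    # On Windows, os.environ looks up "PATH" case-insensitively.
    for line in check_hadoop_env(os.environ) or ["Environment looks consistent"]:
        print(line)
```

Run it from the same Command Prompt you will use for Hadoop, so that it sees the same variables.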

HDFS configurations

Edit file hadoop-env.cmd

Open hadoop-env.cmd in the %HADOOP_HOME%\etc\hadoop directory and add the following lines at the end of the file:

set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin

Edit file core-site.xml

Make sure the following configuration exists:

<configuration>
   <property>
     <name>fs.default.name</name>
     <value>hdfs://0.0.0.0:19000</value>
   </property>
</configuration>

By default, this property is not present in the file. fs.default.name is the deprecated name of fs.defaultFS; Hadoop 3.0.0 accepts either.

Edit file hdfs-site.xml

Make sure the following configuration exists (change the file paths to your own):

<configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.name.dir</name>
     <value>file:///F:/DataAnalytics/dfs/namespace_logs</value>
   </property>
   <property>
     <name>dfs.data.dir</name>
     <value>file:///F:/DataAnalytics/dfs/data</value>
   </property>
</configuration>

The above configuration sets the HDFS locations for storing the namespace, logs and data files. dfs.name.dir and dfs.data.dir are the deprecated names of dfs.namenode.name.dir and dfs.datanode.data.dir; both forms are accepted.
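
If you prefer to generate these files rather than edit them by hand, a small Python sketch like the one below can render name/value pairs in the same layout. The helper build_site_xml is a hypothetical name, and the paths are simply the ones used in this article; adjust them to your own folders.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of name/value pairs as Hadoop <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Same properties as the hdfs-site.xml shown in this article.
hdfs_site = build_site_xml({
    "dfs.replication": 1,
    "dfs.name.dir": "file:///F:/DataAnalytics/dfs/namespace_logs",
    "dfs.data.dir": "file:///F:/DataAnalytics/dfs/data",
})
print(hdfs_site)
```

The printed text can be saved as hdfs-site.xml under %HADOOP_HOME%\etc\hadoop; the same helper works for the other *-site.xml files.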

Edit file workers

Ensure the file contains the following:

localhost

YARN configurations

Edit file mapred-site.xml

Edit mapred-site.xml under %HADOOP_HOME%\etc\hadoop and add the following configuration, replacing %USERNAME% with your Windows user name.

<configuration>
   <property>
     <name>mapreduce.job.user.name</name>
     <value>%USERNAME%</value>
   </property>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
   <property>
     <name>yarn.apps.stagingDir</name>
     <value>/user/%USERNAME%/staging</value>
   </property>
   <property>
     <name>mapreduce.jobtracker.address</name>
     <value>local</value>
   </property>
</configuration>

Edit file yarn-site.xml

Make sure the following entries exist:

<configuration>
   <property>
     <name>yarn.server.resourcemanager.address</name>
     <value>0.0.0.0:8020</value>
   </property>
  <property>
     <name>yarn.server.resourcemanager.application.expiry.interval</name>
     <value>60000</value>
   </property>
  <property>
     <name>yarn.server.nodemanager.address</name>
     <value>0.0.0.0:45454</value>
   </property>
  <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
   </property>
  <property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
  <property>
     <name>yarn.server.nodemanager.remote-app-log-dir</name>
     <value>/app-logs</value>
   </property>
  <property>
     <name>yarn.nodemanager.log-dirs</name>
     <value>/dep/logs/userlogs</value>
   </property>
  <property>
     <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
     <value>0.0.0.0</value>
   </property>
  <property>
     <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
     <value>0.0.0.0</value>
   </property>
  <property>
     <name>yarn.log-aggregation-enable</name>
     <value>true</value>
   </property>
  <property>
     <name>yarn.log-aggregation.retain-seconds</name>
     <value>-1</value>
   </property>
  <property>
     <name>yarn.application.classpath</name>
     <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
   </property>
</configuration>
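
With this many entries it is easy to mistype a property name. The sketch below, offered as an optional aid, parses a *-site.xml into a dictionary and lists any aux-service that has no matching class entry. It assumes the yarn.nodemanager.aux-services.<name>.class naming pattern; the function names are hypothetical. A missing entry is not necessarily fatal, because Hadoop's built-in defaults already cover mapreduce_shuffle.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop *-site.xml text into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


def missing_aux_service_classes(props):
    """List aux-services that have no matching '.class' property."""
    services = props.get("yarn.nodemanager.aux-services", "")
    return [
        s.strip() for s in services.split(",")
        if s.strip()
        and f"yarn.nodemanager.aux-services.{s.strip()}.class" not in props
    ]


# A cut-down sample; in practice read the text of your own yarn-site.xml.
sample = """
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
"""
print(missing_aux_service_classes(load_properties(sample)))
```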

Initialize environment variables

Run hadoop-env.cmd to set up the environment variables. In my case, the file path is:

%HADOOP_HOME%\etc\hadoop\hadoop-env.cmd

Format file system

Run the following command to format the file system (hdfs namenode -format is the equivalent, non-deprecated form):

hadoop namenode -format

The command should print logs like the following (the storage directory path will vary based on your HDFS configuration):

2018-02-18 21:29:41,501 INFO namenode.FSImage: Allocated new BlockPoolId: BP-353327356-172.24.144.1-1518949781495
2018-02-18 21:29:41,817 INFO common.Storage: Storage directory F:\DataAnalytics\dfs\namespace_logs has been successfully formatted.
2018-02-18 21:29:41,826 INFO namenode.FSImageFormatProtobuf: Saving image file F:\DataAnalytics\dfs\namespace_logs\current\fsimage.ckpt_0000000000000000000 using no compression
2018-02-18 21:29:41,934 INFO namenode.FSImageFormatProtobuf: Image file F:\DataAnalytics\dfs\namespace_logs\current\fsimage.ckpt_0000000000000000000 of size 390 bytes saved in 0 seconds.
2018-02-18 21:29:41,969 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

Start HDFS daemons

Run the following command to start the NameNode and DataNode on localhost.

%HADOOP_HOME%\sbin\start-dfs.cmd

The above command opens two Command Prompt windows: one for the NameNode and another for the DataNode.

To verify, copy a file to HDFS:

%HADOOP_HOME%\bin\hdfs dfs -put file:///F:/DataAnalytics/test.txt /

Then list the files in HDFS:

%HADOOP_HOME%\bin\hdfs dfs -ls /

The listing should include /test.txt.

Start YARN daemons

Start YARN through the following command:

%HADOOP_HOME%\sbin\start-yarn.cmd

Similar to HDFS, two windows will open: one for the ResourceManager and one for the NodeManager.

To verify, run the following sample word count job:

%HADOOP_HOME%\bin\yarn jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.0.0.jar wordcount /test.txt /out
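
To cross-check the job output on a small file, the Python sketch below computes the same kind of result locally. It assumes the example job splits on whitespace and writes tab-separated word and count pairs; word_count is a hypothetical helper, not part of Hadoop.

```python
from collections import Counter


def word_count(text):
    """Count whitespace-separated tokens, like the wordcount example job."""
    return Counter(text.split())


counts = word_count("hello hadoop hello yarn")
for word in sorted(counts):
    # One "word<TAB>count" line per distinct word.
    print(f"{word}\t{counts[word]}")
```

Feed it the contents of your test.txt and compare with the files the job writes under /out.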


Web UIs

Resource manager

You can also view your job status through the YARN web UI. The default URL is http://localhost:8088

NameNode UI

Default URL: http://localhost:9870


DataNode UI

Through the NameNode UI, you can find all the DataNodes. In my case, I have only a single DataNode, with its UI at http://localhost:9864
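
If a page does not load, it helps to know whether anything is listening on the port at all. The following Python sketch tries a plain TCP connection to each of the default ports mentioned above. It assumes you have not changed those ports in your configuration, and is_listening is a hypothetical helper name.

```python
import socket

# Default ports mentioned in this article (assumed unchanged in your config).
WEB_UIS = {"ResourceManager": 8088, "NameNode": 9870, "DataNode": 9864}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in WEB_UIS.items():
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} UI on port {port}: {state}")
```

A port that is not reachable usually means the corresponding daemon window has closed or failed to start, so check its console output first.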

Errors and fixes

java.io.FileNotFoundException: Could not locate Hadoop executable: … \hadoop-3.0.0\bin\winutils.exe

Refer to the following page to fix the problem:

https://wiki.apache.org/hadoop/WindowsProblems

java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)

This error has the same cause as the one above.

Refer to 'Windows binaries for Hadoop versions (built from the git commit ID used for the ASF release)':

https://github.com/steveloughran/winutils 

For this example, I am using Hadoop 3.0.0.

https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin

To fix it, copy the contents of the above directory into %HADOOP_HOME%\bin.
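
After copying, you can confirm the files landed in the right place with a short Python sketch. It assumes that winutils.exe and hadoop.dll are the two files that matter for the errors above; missing_native_files is a hypothetical helper name.

```python
import os


def missing_native_files(hadoop_home, required=("winutils.exe", "hadoop.dll")):
    """Return the required files that are absent from <hadoop_home>/bin."""
    bin_dir = os.path.join(hadoop_home, "bin")
    return [f for f in required if not os.path.isfile(os.path.join(bin_dir, f))]


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME", "")
    missing = missing_native_files(home) if home else ["HADOOP_HOME is not set"]
    print("Missing:", ", ".join(missing) if missing else "nothing")
```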

Last modified by Raymond 2 months ago.


Comments (29)

Raymond

Apologies for the late reply. Have you got your problem resolved?

Quoting U-7e9qo64lwkem90f8 (6 months ago), Re: Install Hadoop 3.0.0 in Windows (Single Node):

@Raymond Tang I ran it in the console. I did not double-click the cmd file.

Raymond

There are many ways to do it:

  • hadoop fs commands to copy files from the local file system to HDFS
  • Spark or any other framework that can talk to HDFS
  • Sqoop (SQL to Hadoop)

Quoting Swati Agarwal (5 months ago):

Hi there,

I have installed Hadoop 3 as per the instructions above. Please suggest the steps to load data into Hadoop through cmd on Windows 10 and to perform operations on it.

Regards,

Swati

Raymond

When you ran the cmd script, did you open the script file directly or run it from the Command Prompt?

Quoting David Serrano (6 months ago):

Hi,
I saw your tutorial about installing Hadoop on Windows. However, I am getting this error when I try to run the YARN daemons with start-yarn.cmd:

This file does not have an app associated with it for performing this action. Please install an app or, if one is already installed, create an association in the default apps settings page.

Do you know of a solution for that?

Thanks in advance.

Raymond

Did you follow the exact steps in my post? It seems that the Java path (configured in the environment variables) doesn't include some of the jar files Hadoop is using. However, it's hard to debug without access to your environment.

Quoting XO3 ZDT (6 months ago):

I'm using JDK 8.

I tried setting yarn-nodemanager-opts and yarn-resourcemanager-opts as in the link you gave me, but no luck; the error is still there.

Raymond

Hi,

It seems your problem is similar to the following: https://issues.apache.org/jira/browse/HADOOP-14978

Are you using JDK 9 or above?

Can you try with JDK 8? I have not tried JDK 9 or above as it was not fully supported. It may work now, but Java 8 is the version recommended on the official website.

Quoting XO3 ZDT (6 months ago):

I can't start up the resource manager and node manager.

I got the error:

 WARN webapp.WebAppContext: Failed startup of context o.e.j.w.WebAppContext@53830483{/,file:///C:/Users/ASUS/AppData/Local/Temp/jetty-0.0.0.0-8088-cluster-_-any-16441570173546812728.dir/webapp/,UNAVAILABLE}{/cluster}

com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Error injecting constructor, java.lang.NoClassDefFoundError: javax/activation/DataSource at org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver.<init>(JAXBContextResolver.java:41)

  at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebApp.setup(RMWebApp.java:54)

  while locating org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver

Mohammad

Thanks again. I finally installed Hadoop on WSL, but I have a problem with this article. At the step

%HADOOP_HOME%\sbin\start-dfs.cmd

I get the error "datanode.DataNode: Exception in secureMain" and I could not find a resolution on the web :(

Raymond

Hi, this post is about installing 3.0.0, and I have not tried 3.2.0 using this approach yet. Some configuration properties have changed.

Your error message suggests that the user name was not configured correctly for some reason.

However, since you are using Windows 10, I'd suggest following the post below to install 3.2.0 in Windows Subsystem for Linux (WSL).

Install Hadoop 3.2.0 on Windows 10 using Windows Subsystem for Linux (WSL)

Quoting Muhammad Adnan (10 months ago):

Hi,
I tried your guidelines for installing Hadoop 3.2.0 on my Windows 10 but unfortunately I am unable to start it. I am stuck at the point where you stated the command hadoop namenode -format. I am getting this message at the end of the command: Shutting down NameNode at {Username}/192.168.0.200.
But there is no descriptive error other than that. Can you help me please?

Exit Condition

Very well written guide. Thanks for posting this. Installing Hadoop on Windows can be a daunting task. I had several issues while installing Hadoop 2.9 on my machine. It encouraged me to document the working steps, and I ended up writing this blog post:

https://exitcondition.com/install-hadoop-windows/

Thanks again for writing this. It helps a lot of learners out there.

Raymond

Is the JAVA_HOME environment variable configured correctly on your system? The value should point to your JDK location.

Quoting tharun (10 months ago):

The system cannot find the path specified.

Error: JAVA_HOME is incorrectly set.

       Please update C:xxx\hadoop-2.9.1\etc\hadoop\hadoop-env.cmd

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

The system cannot find the path specified.

Error: JAVA_HOME is incorrectly set.

       Please update C:\xxxxxxx \etc\hadoop\hadoop-env.cmd

 Please update C:\xxxxxxx\hadoop-env.cmd

please help

Raymond

No worries, I'm glad it worked. :)

Anonymous

I am sorry. It works!

Anonymous

Sorry, it's me again. When I run the command %HADOOP_HOME%\bin\hdfs dfs -put file:///G:/DataAnalytics/test.txt / I get an error: put: `G:/DataAnalytics/test.txt': No such file or directory. But I followed the configuration step by step and my DataAnalytics folder is on G:.

Anonymous

Hi, thank you for your answer. I found a solution to my problem. In case it helps someone: the problem was related to my system user name, which contains a space. To fix it, edit etc/hadoop/hadoop-env.cmd; at the end of this file you will find set HADOOP_IDENT_STRING=%USERNAME%. Change the value to any string without a space, for example set HADOOP_IDENT_STRING=myuser, and the problem will be fixed.

Raymond

Did you follow all the steps in this post? For example, you need to ensure that the winutils tool is installed:

Overwrite your bin folder (%HADOOP_HOME%\bin) with the files from this link:

https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin

The currently available version is 3.0.0. I am not sure whether it fixes the issue for 3.0.1, but it is worth a try.

Quoting U-n5aypxvh9wpx8qwp (2 years ago):

Good morning, I am trying to install Hadoop 3.0.1 on Windows, but when I test my configuration I get this error: Error: Can not find or load the main class. Can someone help me please?

Thank you

Raymond

@Neil S

No need to apologize. I don't have a VM with Windows 7, so I could not verify it for you.

I was about to suggest that you create an issue on GitHub, and then I noticed you had already done that.

https://github.com/steveloughran/winutils/issues/9

Let's see how the author responds.

The WinUtils project source code is available here:

https://github.com/apache/hadoop/tree/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-common-project/hadoop-common/src/main/winutils

You may want to debug it on your system (if you have Visual Studio installed). More specifically, the source code for the command you are invoking is available here:

https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-common-project/hadoop-common/src/main/winutils/systeminfo.c

If I were using Windows 7, I could help you debug.

Neil S

@Raymond Tang

I have to apologize. I missed that you mentioned Windows 10 as the working environment at the beginning of the tutorial, so the tutorial does not apply to Windows 7 64-bit.

I'm really sorry about my mistake.

Neil S

@Raymond Tang

If you ran the command on Windows 10 and didn't find any problems, that doesn't contradict what I wrote, and the tutorial is valid for Windows 10. But if the command "winutils systeminfo" returns the error on Windows 7 64-bit, it may be worth noting in the tutorial's requirements that Hadoop 3.0.0 can currently only be installed on Windows 10, until the winutils.exe issue is solved for Windows 7 64-bit, because winutils.exe for Hadoop 3.0.0 still doesn't work on Windows 7 64-bit.

Raymond

@Neil S

I have run the command on my system. I didn't get any issue.

Neil S

@Raymond Tang

I followed all the setup steps. The problem is reproduced on Windows 7 64-bit. If you issue the command "winutils systeminfo" in the directory where winutils.exe for Hadoop 3.0.0 is located, you will receive an error response. This does not happen if you do the same with winutils for Hadoop 2.6.4 or 2.7.x, for instance, but unfortunately those versions of the executable are not suitable for Hadoop 3.0.

Raymond Tang

@Neil S.

The machine I am using runs Windows 10 64-bit, and it has been working properly all the time.

Did you follow all the steps to set it up?

Quoting Neil S. (2 years ago):

Unfortunately the issue with the missing winutils.exe in the binary distribution of Hadoop 3.0.0 cannot be solved. The file winutils.exe downloaded from the above-mentioned resource causes an error when run with the parameter systeminfo on Windows 7 64-bit. The text of the error is:

PdhAddCounter \Network Interface(*)\Bytes Received/Sec failed with 0xc0000bb8.
Error in GetDiskAndNetwork. Err:1