Install Hadoop 3.0.0 in Windows (Single Node)

1067 views 8 comments last modified about 4 months ago Raymond Tang

hadoop yarn hdfs

This page summarizes the steps to install Hadoop 3.0.0 as a single node on Windows. Reference pages:

https://wiki.apache.org/hadoop/Hadoop2OnWindows

https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html

Tools and Environment

  • Git Bash
  • Command Prompt
  • Windows 10

Download Binary Package

Download the latest binary from the following site:

http://hadoop.apache.org/releases.html

In my case, I am saving the file to the folder F:\DataAnalytics

Unzip binary package

Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then unzip it. Git Bash expects a POSIX-style path, so the F: drive is written as /f:

$ cd /f/DataAnalytics

fahao@Raymond-Alienware MINGW64 /f/DataAnalytics
$ tar -xvzf hadoop-3.0.0.tar.gz

In my case, the Hadoop binary is extracted to: F:\DataAnalytics\hadoop-3.0.0

Set up environment variables

Make sure the following environment variables are set correctly:

  • JAVA_HOME: points to your JDK installation folder.
  • HADOOP_HOME: points to the Hadoop folder extracted in the previous step.

image

Then add '%JAVA_HOME%\bin' and '%HADOOP_HOME%\bin' to the Path environment variable, as in the following screenshot:

image
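The two steps above can also be checked with a small script. This is only a minimal Python sketch, assuming Python 3 is available (it is not part of the original setup); the function name check_hadoop_env and the sample JDK path in the usage note are my own placeholders.

```python
import ntpath
import os


def check_hadoop_env(env=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    env = os.environ if env is None else env
    # Windows separates Path entries with ';' and compares paths case-insensitively.
    raw_path = env.get("PATH", env.get("Path", ""))
    path_entries = [
        ntpath.normcase(ntpath.normpath(entry))
        for entry in raw_path.split(";")
        if entry
    ]
    problems = []
    for name in ("JAVA_HOME", "HADOOP_HOME"):
        home = env.get(name)
        if not home:
            problems.append(f"{name} is not set")
            continue
        bin_dir = ntpath.normcase(ntpath.normpath(ntpath.join(home, "bin")))
        if bin_dir not in path_entries:
            problems.append(f"{name}\\bin is not on Path")
    return problems
```

Calling check_hadoop_env() with no argument inspects the current process environment. You can also pass a dictionary, for example {"JAVA_HOME": r"C:\Java\jdk1.8.0_161", ...}, to try it out without touching the real settings.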

Verify your setup

You should be able to verify your settings via the following command, which prints the version of the Java runtime that Hadoop picks up:

F:\DataAnalytics\hadoop-3.0.0>hadoop -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

HDFS configurations

Edit file hadoop-env.cmd

Open this file in the %HADOOP_HOME%\etc\hadoop directory and add the following lines at the end of the file:

set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin

Edit file core-site.xml

Make sure the following configuration exists:

<configuration>
   <property>
     <name>fs.default.name</name>
     <value>hdfs://0.0.0.0:19000</value>
   </property>
</configuration>

By default, this property is not present in the file. Note that fs.default.name is the deprecated name of fs.defaultFS; Hadoop 3.0.0 still accepts it.
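To double-check what a *-site.xml file actually contains after editing, it can be read back as name/value pairs. The following is a minimal Python sketch using only the standard library; parse_hadoop_conf is a hypothetical helper name, not a Hadoop tool.

```python
import xml.etree.ElementTree as ET


def parse_hadoop_conf(xml_text):
    """Return the <property> entries of a Hadoop *-site.xml document as a dict."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no <name>
        properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties
```

For the core-site.xml shown above, the result would contain a single entry mapping fs.default.name to hdfs://0.0.0.0:19000.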

Edit file hdfs-site.xml

Make sure the following configuration exists (you can change the file paths to your own):

<configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.name.dir</name>
     <value>file:///F:/DataAnalytics/dfs/namespace_logs</value>
   </property>
   <property>
     <name>dfs.data.dir</name>
     <value>file:///F:/DataAnalytics/dfs/data</value>
   </property>
</configuration>

The above configuration sets up the HDFS locations for storing the namespace, logs and data files.
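The file:/// values above are URIs rather than Windows paths. As a rough illustration of how they correspond to folders on disk, here is a small Python sketch; file_uri_to_windows_path is my own helper name, and it only handles the simple drive-letter form used in this page.

```python
from urllib.parse import unquote, urlparse


def file_uri_to_windows_path(uri):
    """Convert a URI such as file:///F:/dir/sub to the Windows path F:\\dir\\sub."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri}")
    path = unquote(parsed.path)
    # urlparse returns "/F:/dir"; drop the slash in front of the drive letter.
    if len(path) >= 3 and path[0] == "/" and path[2] == ":":
        path = path[1:]
    return path.replace("/", "\\")
```

With the values above, the data directory resolves to F:\DataAnalytics\dfs\data and the namespace directory to F:\DataAnalytics\dfs\namespace_logs.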

Edit file workers

Make sure the file contains the following:

localhost

YARN configurations

Edit file mapred-site.xml

Edit mapred-site.xml under %HADOOP_HOME%\etc\hadoop and add the following configuration, replacing %USERNAME% with your Windows user name.

<configuration>
   <property>
     <name>mapreduce.job.user.name</name>
     <value>%USERNAME%</value>
   </property>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
   <property>
     <name>yarn.apps.stagingDir</name>
     <value>/user/%USERNAME%/staging</value>
   </property>
   <property>
     <name>mapreduce.jobtracker.address</name>
     <value>local</value>
   </property>
</configuration>
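The %USERNAME% placeholder has to be replaced by hand, which is easy to forget in one of the two places. A minimal Python sketch of the substitution follows; fill_username is a hypothetical helper, and falling back to getpass.getuser() for the current user is my assumption, not something the original setup requires.

```python
import getpass


def fill_username(template, username=None):
    """Replace every %USERNAME% placeholder in the template text with a user name."""
    if username is None:
        # Fall back to the name of the user running the script.
        username = getpass.getuser()
    return template.replace("%USERNAME%", username)
```

For example, fill_username("/user/%USERNAME%/staging", "fahao") returns /user/fahao/staging.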

Edit file yarn-site.xml

Make sure the following entries exist:

<configuration>
   <property>
     <name>yarn.server.resourcemanager.address</name>
     <value>0.0.0.0:8020</value>
   </property>

  <property>
     <name>yarn.server.resourcemanager.application.expiry.interval</name>
     <value>60000</value>
   </property>

  <property>
     <name>yarn.server.nodemanager.address</name>
     <value>0.0.0.0:45454</value>
   </property>

  <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
   </property>

  <property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>

  <property>
     <name>yarn.server.nodemanager.remote-app-log-dir</name>
     <value>/app-logs</value>
   </property>

  <property>
     <name>yarn.nodemanager.log-dirs</name>
     <value>/dep/logs/userlogs</value>
   </property>

  <property>
     <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
     <value>0.0.0.0</value>
   </property>

  <property>
     <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
     <value>0.0.0.0</value>
   </property>

  <property>
     <name>yarn.log-aggregation-enable</name>
     <value>true</value>
   </property>

  <property>
     <name>yarn.log-aggregation.retain-seconds</name>
     <value>-1</value>
   </property>

  <property>
     <name>yarn.application.classpath</name>
     <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
   </property>
</configuration>
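The yarn.application.classpath value is a comma-separated list that relies on several environment variables. To spot entries whose variable is not defined, the list can be expanded by hand. The sketch below is plain Python written for illustration, with expand_classpath as a made-up name; it does not reproduce how YARN itself resolves the value.

```python
import re


def expand_classpath(value, env):
    """Split a comma-separated classpath and substitute %VAR% references from env."""

    def substitute(match):
        # Leave unknown variables untouched so they are easy to spot in the result.
        return env.get(match.group(1), match.group(0))

    return [
        re.sub(r"%([^%]+)%", substitute, entry.strip())
        for entry in value.split(",")
        if entry.strip()
    ]
```

Any entry in the result that still contains a %NAME% token points at a variable that is missing from the environment you passed in.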

Initialize environment variables

Run hadoop-env.cmd to set up the environment variables. In my case, the file path is:

%HADOOP_HOME%\etc\hadoop\hadoop-env.cmd

Format file system

Run the following command to format the file system:

hadoop namenode -format

The command should print logs like the following (the storage directory path may vary based on your HDFS configuration):

2018-02-18 21:29:41,501 INFO namenode.FSImage: Allocated new BlockPoolId: BP-353327356-172.24.144.1-1518949781495
2018-02-18 21:29:41,817 INFO common.Storage: Storage directory F:\DataAnalytics\dfs\namespace_logs has been successfully formatted.
2018-02-18 21:29:41,826 INFO namenode.FSImageFormatProtobuf: Saving image file F:\DataAnalytics\dfs\namespace_logs\current\fsimage.ckpt_0000000000000000000 using no compression
2018-02-18 21:29:41,934 INFO namenode.FSImageFormatProtobuf: Image file F:\DataAnalytics\dfs\namespace_logs\current\fsimage.ckpt_0000000000000000000 of size 390 bytes saved in 0 seconds.
2018-02-18 21:29:41,969 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
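If the format output is long, the line that matters is the one reporting the storage directory. A minimal Python sketch that pulls it out of captured log text is shown here; the function name formatted_directories is my own, and the pattern is based only on the log line shown above.

```python
import re

# Matches the "successfully formatted" message from the sample log output above.
FORMATTED = re.compile(r"Storage directory (.+) has been successfully formatted")


def formatted_directories(log_text):
    """Return every storage directory reported as successfully formatted."""
    return FORMATTED.findall(log_text)
```

An empty result suggests the format step did not complete and the rest of the log is worth reading.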

Start HDFS daemons

Run the following command to start the NameNode and DataNode on localhost.

%HADOOP_HOME%\sbin\start-dfs.cmd

The above command opens two Command Prompt windows: one for the NameNode and another for the DataNode.

image

To verify, let’s copy a file to HDFS:

%HADOOP_HOME%\bin\hdfs dfs -put file:///F:/DataAnalytics/test.txt /

And then list the files in HDFS:

%HADOOP_HOME%\bin\hdfs dfs -ls /

You should get a result similar to the following screenshot:

image

Start YARN daemons

Start YARN through the following command:

%HADOOP_HOME%\sbin\start-yarn.cmd

Similar to HDFS, two windows will open, one for the ResourceManager and one for the NodeManager:

image

To verify, run the following sample word count job against the test.txt file uploaded earlier, writing the results to /out:

%HADOOP_HOME%\bin\yarn jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.0.0.jar wordcount /test.txt /out

image
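For a small input it can help to know what counts to expect from the job. The following plain Python sketch computes the same kind of result locally; it is not the MapReduce implementation, and it assumes words are separated by whitespace and that results are listed as word, tab, count in sorted order.

```python
from collections import Counter


def word_count(text):
    """Count whitespace-separated words, as a local stand-in for the sample job."""
    return Counter(text.split())


def format_counts(counts):
    """Render the counts one per line as 'word<TAB>count', sorted by word."""
    return "\n".join(f"{word}\t{count}" for word, count in sorted(counts.items()))
```

Running format_counts(word_count(open("test.txt").read())) on a local copy of the input gives a reference to compare against the job output under /out.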

Web UIs

Resource manager

You can also view your job status through the YARN ResourceManager web UI. The default URL is http://localhost:8088

image

image

NameNode UI

Default URL: http://localhost:9870

image

image
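The same port also serves the WebHDFS REST API, so the root listing can be fetched without the command line. This is a minimal Python sketch, assuming WebHDFS is reachable on the NameNode HTTP port; the helper names are mine, and list_hdfs needs the NameNode to be running.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def liststatus_url(path="/", host="localhost", port=9870):
    """Build the WebHDFS URL that lists the given HDFS directory."""
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?op=LISTSTATUS"


def parse_liststatus(body):
    """Extract (name, type, length) tuples from a LISTSTATUS JSON response."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


def list_hdfs(path="/", host="localhost", port=9870):
    """Fetch and parse the listing; requires a running NameNode."""
    with urlopen(liststatus_url(path, host, port), timeout=10) as response:
        return parse_liststatus(response.read().decode("utf-8"))
```

After the upload step earlier in this page, list_hdfs("/") should include an entry for test.txt.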

DataNode UI

Through the NameNode UI, you can find all the DataNodes. In my case, I only have a single DataNode, with its UI at http://localhost:9864

image
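If one of these pages does not load, a quick first check is whether anything is listening on the port at all. Here is a minimal Python sketch using the three default ports mentioned in this section; the names WEB_UIS, is_port_open and ui_status are my own.

```python
import socket

# Default web UI ports listed in this section.
WEB_UIS = {"ResourceManager": 8088, "NameNode": 9870, "DataNode": 9864}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ui_status(host="localhost"):
    """Map each web UI name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in WEB_UIS.items()}
```

A False value for a daemon usually means its Command Prompt window has closed or logged an error during startup.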

Errors and fixes

java.io.FileNotFoundException: Could not locate Hadoop executable: … \hadoop-3.0.0\bin\winutils.exe

Refer to the following page to fix the problem:

https://wiki.apache.org/hadoop/WindowsProblems

java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)

This error has the same cause as the one above: the Windows native binaries are missing from %HADOOP_HOME%\bin.

Refer to 'Windows binaries for Hadoop versions (built from the git commit ID used for the ASF release)':

https://github.com/steveloughran/winutils 

For this example, I am using Hadoop 3.0.0.

https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin

To fix it, copy the contents of the above directory into %HADOOP_HOME%\bin.
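After copying, it is worth confirming that the files landed in the right folder. The following is a minimal Python sketch; missing_native_files is a hypothetical helper, and the list of required files (winutils.exe and hadoop.dll) reflects the two errors above rather than an official checklist.

```python
from pathlib import Path

# Files whose absence leads to the two errors described above.
REQUIRED = ("winutils.exe", "hadoop.dll")


def missing_native_files(bin_dir):
    """Return the required Windows binaries that are not present in bin_dir."""
    bin_dir = Path(bin_dir)
    return [name for name in REQUIRED if not (bin_dir / name).is_file()]
```

For the setup in this page, the call would be missing_native_files(r"F:\DataAnalytics\hadoop-3.0.0\bin"); an empty list means both files are in place.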

Comments (7)

RT Re: Install Hadoop 3.0.0 in Windows (Single Node)

Raym*** about 3 months ago

@Neil S

No need to apologize. I don't have a VM with Windows 7, so I could not verify this for you.

I was about to suggest that you create an issue on GitHub, and then I noticed you had already done so:

https://github.com/steveloughran/winutils/issues/9

Let's see how the author responds.

The WinUtils project source code is available here:

https://github.com/apache/hadoop/tree/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-common-project/hadoop-common/src/main/winutils 

You may want to debug it in your system (if you have Visual Studio installed). More specifically, the source code for the command you are invoking is available here:

https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-common-project/hadoop-common/src/main/winutils/systeminfo.c

If I were using Windows 7, I could actually help you debug it.

Ne*** about 3 months ago

@Raymond Tang

I have to apologize: I missed that you mentioned Windows 10 as the work environment at the beginning of the tutorial, so the tutorial does not apply to Windows 7 64-bit.

I'm really sorry about my mistake.

Ne*** about 3 months ago

@Raymond Tang

If you ran the command on Windows 10 and didn't find any problems, that doesn't contradict what I wrote, and the tutorial is relevant for Windows 10. But if the command "winutils systeminfo" returns the error on Windows 7 64-bit, it may be worth noting in the tutorial's requirements that Hadoop 3.0.0 can currently only be installed on Windows 10, until the winutils.exe issue is solved for Windows 7 64-bit, because winutils.exe for Hadoop 3.0.0 still doesn't work on Windows 7 64-bit.

Raym*** about 3 months ago

@Neil S

I have run the command on my system and didn't get any error.


Ne*** about 3 months ago

@Raymond Tang

I followed all the setup steps. The problem is reproduced on Windows 7 64-bit. If you issue the command "winutils systeminfo" in the directory where winutils.exe for Hadoop 3.0.0 is located, you will receive an error response. This does not happen with winutils for Hadoop 2.6.4 or 2.7.x, for instance, but unfortunately those versions of the executable are not suitable for Hadoop 3.0.

Raym*** about 3 months ago

@Neil S.

The machine I am using is Windows 10 64 bit and it is working properly all the time.

Did you follow all the setup steps?

Ne*** about 3 months ago

Unfortunately, the issue with the missing winutils.exe in the Hadoop 3.0.0 binary distribution cannot be solved. The winutils.exe file downloaded from the above-mentioned resource causes an error when run with the systeminfo parameter on Windows 7 64-bit. The text of the error is: PdhAddCounter \Network Interface(*)\Bytes Received/Sec failed with 0xc0000bb8.
Error in GetDiskAndNetwork. Err:1
