Install Hadoop 3.0.0 on Windows (Single Node)
- Tools and Environment
- Download Binary Package
- UnZip binary package
- Setup environment variables
- Verify your setup
- HDFS configurations
- Edit file hadoop-env.cmd
- Edit file core-site.xml
- Edit file hdfs-site.xml
- Edit file workers
- YARN configurations
- Edit file mapred-site.xml
- Edit file yarn-site.xml
- Initialize environment variables
- Format file system
- Start HDFS daemons
- Start YARN daemons
- Web UIs
- Resource manager
- NameNode UI
- DataNode UI
- Errors and fixes
- java.io.FileNotFoundException: Could not locate Hadoop executable: … \hadoop-3.0.0\bin\winutils.exe
- java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)
This page summarizes the steps to install Hadoop 3.0.0 on your Windows environment. Reference page:
Install Latest Hadoop 3.2.1 on Windows 10 Step by Step Guide
Tools and Environment
- Git Bash
- Command Prompt
- Windows 10
Download Binary Package
Download the Hadoop 3.0.0 binary package (hadoop-3.0.0.tar.gz) from the Apache Hadoop release site.
In my case, I am saving the file to the folder: F:\DataAnalytics
UnZip binary package
Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then unzip:
$ cd F:\DataAnalytics
fahao@Raymond-Alienware MINGW64 /f/DataAnalytics
$ tar -xvzf hadoop-3.0.0.tar.gz
In my case, the Hadoop binary is extracted to: F:\DataAnalytics\hadoop-3.0.0
Setup environment variables
Make sure the following environment variables are set correctly:
- JAVA_HOME: pointing to your Java JDK installation folder.
- HADOOP_HOME: pointing to your Hadoop folder in the previous step.
Then add %JAVA_HOME%\bin and %HADOOP_HOME%\bin to the Path environment variable.
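If you prefer the Command Prompt to the System Properties dialog, the same settings can be applied with setx; the JDK and Hadoop paths below are only examples, and setx only takes effect in newly opened Command Prompt windows:
rem Example paths only - replace with your actual JDK and Hadoop folders.
setx JAVA_HOME "C:\Java\jdk1.8.0_161"
setx HADOOP_HOME "F:\DataAnalytics\hadoop-3.0.0"
rem Literal paths are used for Path here because the variables above are not visible in the current session yet.
setx PATH "%PATH%;C:\Java\jdk1.8.0_161\bin;F:\DataAnalytics\hadoop-3.0.0\bin"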
Verify your setup
You should be able to verify your settings via the following command:
F:\DataAnalytics\hadoop-3.0.0>hadoop -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
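Note that hadoop -version prints the Java version, as shown above. To see the Hadoop build information itself, run the command without the leading dash:
F:\DataAnalytics\hadoop-3.0.0>hadoop version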
HDFS configurations
Edit file hadoop-env.cmd
Edit this file in the %HADOOP_HOME%\etc\hadoop directory and add the following lines at the end of the file:
set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
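One common pitfall on Windows: if JAVA_HOME points to a folder containing spaces (for example under Program Files), the Hadoop scripts may fail to resolve it. A workaround is to set JAVA_HOME in this same file using the 8.3 short name; the JDK folder below is only an example:
rem PROGRA~1 is the 8.3 short name for "Program Files"; adjust the JDK folder to your own installation.
set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_161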
Edit file core-site.xml
Make sure the following configuration exists:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:19000</value>
</property>
</configuration>
By default, the above property configuration doesn’t exist.
Edit file hdfs-site.xml
Make sure the following configurations exist (you can change the file paths to your own):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///F:/DataAnalytics/dfs/namespace_logs</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///F:/DataAnalytics/dfs/data</value>
</property>
</configuration>
The above configurations set up the HDFS locations for storing the namespace, logs and data files.
Edit file workers
Ensure the file contains the following content:
localhost
YARN configurations
Edit file mapred-site.xml
Edit mapred-site.xml under %HADOOP_HOME%\etc\hadoop and add the following configuration, replacing %USERNAME% with your Windows user name.
<configuration>
<property>
<name>mapreduce.job.user.name</name>
<value>%USERNAME%</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.apps.stagingDir</name>
<value>/user/%USERNAME%/staging</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>local</value>
</property>
</configuration>
Edit file yarn-site.xml
Make sure the following entries exist:
<configuration>
<property>
<name>yarn.server.resourcemanager.address</name>
<value>0.0.0.0:8020</value>
</property>
<property>
<name>yarn.server.resourcemanager.application.expiry.interval</name>
<value>60000</value>
</property>
<property>
<name>yarn.server.nodemanager.address</name>
<value>0.0.0.0:45454</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.server.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/dep/logs/userlogs</value>
</property>
<property>
<name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
</property>
</configuration>
Initialize environment variables
Run hadoop-env.cmd to set up the environment variables. In my case, the file path is:
%HADOOP_HOME%\etc\hadoop\hadoop-env.cmd
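After running it in the current Command Prompt session, you can quickly confirm the variables were picked up, for example by echoing one of them:
%HADOOP_HOME%\etc\hadoop\hadoop-env.cmd
echo %HADOOP_CONF_DIR%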
Format file system
Run the following command to format the file system:
hadoop namenode -format
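Note: hadoop namenode -format still works in Hadoop 3 but is reported as deprecated; the equivalent form through the hdfs command is:
hdfs namenode -format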
The command should print out some logs like the following (the storage directory path may vary based on your HDFS configurations):
2018-02-18 21:29:41,501 INFO namenode.FSImage: Allocated new BlockPoolId: BP-353327356-172.24.144.1-1518949781495
2018-02-18 21:29:41,817 INFO common.Storage: Storage directory F:\DataAnalytics\dfs\namespace_logs has been successfully formatted.
2018-02-18 21:29:41,826 INFO namenode.FSImageFormatProtobuf: Saving image file F:\DataAnalytics\dfs\namespace_logs\current\fsimage.ckpt_0000000000000000000 using no compression
2018-02-18 21:29:41,934 INFO namenode.FSImageFormatProtobuf: Image file F:\DataAnalytics\dfs\namespace_logs\current\fsimage.ckpt_0000000000000000000 of size 390 bytes saved in 0 seconds.
2018-02-18 21:29:41,969 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
Start HDFS daemons
Run the following command to start the NameNode and DataNode on localhost.
%HADOOP_HOME%\sbin\start-dfs.cmd
The above command will open two Command Prompt windows: one for the namenode and one for the datanode.
To verify, let’s copy a file to HDFS:
%HADOOP_HOME%\bin\hdfs dfs -put file:///F:/DataAnalytics/test.txt /
And then list the files in HDFS:
%HADOOP_HOME%\bin\hdfs dfs -ls /
You should see the file you just copied listed in the output.
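If the put command complains that the local file does not exist, simply create a small test file first and rerun it; you can also read the file back from HDFS to double-check (the local path is only an example):
echo hello hadoop > F:\DataAnalytics\test.txt
%HADOOP_HOME%\bin\hdfs dfs -cat /test.txt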
Start YARN daemons
Start YARN through the following command:
%HADOOP_HOME%\sbin\start-yarn.cmd
Similar to HDFS, two windows will open: one for the resource manager and one for the node manager.
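If the JDK's jps tool is on your Path, you can also list the running Hadoop daemons; once both start-dfs.cmd and start-yarn.cmd have run, you should see NameNode, DataNode, ResourceManager and NodeManager among the Java processes:
jps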
To verify, we can run the following sample wordcount job against the file we copied earlier:
%HADOOP_HOME%\bin\yarn jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.0.0.jar wordcount /test.txt /out
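Once the job completes, the word counts are written to the /out directory given above; the reducer output file is typically named part-r-00000, but listing the directory first confirms the exact name:
%HADOOP_HOME%\bin\hdfs dfs -ls /out
%HADOOP_HOME%\bin\hdfs dfs -cat /out/part-r-00000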
Web UIs
Resource manager
You can also view your job status through the YARN web UI. The default URL is http://localhost:8088
NameNode UI
Default URL: http://localhost:9870
DataNode UI
Through the NameNode UI, you can find all the data nodes. In my case, there is only a single data node, with UI URL http://localhost:9864
Errors and fixes
java.io.FileNotFoundException: Could not locate Hadoop executable: … \hadoop-3.0.0\bin\winutils.exe
Refer to the following page to fix the problem:
https://wiki.apache.org/hadoop/WindowsProblems
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)
This error has the same root cause as the one above (missing Windows native binaries).
Refer to 'Windows binaries for Hadoop versions (built from the git commit ID used for the ASF release)':
https://github.com/steveloughran/winutils
For this example, I am using Hadoop 3.0.0.
https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin
To fix it, copy the files from the above bin directory into %HADOOP_HOME%\bin.
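For example, assuming the winutils repository was downloaded to F:\winutils (the local path is only an assumption), the files can be copied with:
xcopy /Y F:\winutils\hadoop-3.0.0\bin\* %HADOOP_HOME%\bin\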
Raymond (5 years ago)
Hi Chong,
I assume you followed all the steps exactly?
Also, are the DFS services running successfully? You should see two Command Prompt windows (one for the name node and another for the data node) if they started successfully. The two windows print live logs.
Using the following command, you should also be able to see the listener on port 19000.
netstat -an | grep 19000
TCP    0.0.0.0:19000          0.0.0.0:0              LISTENING
TCP    192.168.56.1:1427      192.168.56.1:19000     ESTABLISHED
TCP    192.168.56.1:19000     192.168.56.1:1427      ESTABLISHED
You can also view the portal if it is started: http://localhost:9870.
If not, that means the services did not start successfully.
I don't have a 3.0.0 environment on my system, so it is hard to diagnose. However, I do have the latest 3.2.1 environment and it is working properly:
Install Latest Hadoop 3.2.1 on Windows 10 Step by Step Guide
Maybe you can try that one; it would also give me a better chance of finding the issue, as I would have the same environment as you.
Chong (5 years ago)
Hi Raymond,
My core-site.xml has changed to
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:19000</value>
</property>
</configuration>
but the error still persists.
Please let me know if you need more information.
Raymond (5 years ago)
Hi Chong,
Can you show me your core-site.xml configuration file?
My version works with the following settings; however, fs.default.name was deprecated in 3.0.0, so it is better to use fs.defaultFS.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:19000</value>
</property>
</configuration>
Can you try changing it to localhost?
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:19000</value>
</property>
</configuration>
Yes, I did start the hdfs daemon by using the following command.
start-dfs.cmd
Chong (5 years ago)
Hi Raymond,
Thanks for the blog! It is very useful!
I have an issue at the "copy a file to HDFS" step. The error states that "your endpoint configuration is wrong".
May I know if there is any way to resolve this issue?
Such a very useful post. Very interesting to read this blog. I would like to thank you for the effort you put into writing this awesome blog.
Apologies for the late reply. Have you got your problem resolved?
U-7e9qo64lwkem90f8 (6 years ago)
There are many ways to do it:
- hadoop fs commands to copy files from local to HDFS (see the example after this list)
- Spark or any other frameworks that can talk to HDFS...
- Sqoop (SQL to Hadoop)
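For the first option, a minimal sketch from a Windows Command Prompt (the local file and target directory are just examples):
%HADOOP_HOME%\bin\hadoop fs -mkdir -p /user/%USERNAME%/data
%HADOOP_HOME%\bin\hadoop fs -put file:///F:/DataAnalytics/test.txt /user/%USERNAME%/data/
%HADOOP_HOME%\bin\hadoop fs -ls /user/%USERNAME%/data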
Swati Agarwal (6 years ago)
Hi there,
I have installed Hadoop 3 as per the instructions above. Please suggest the steps to load data into Hadoop through cmd in Windows 10 and to perform operations on it.
Regards,
Swati
Hi Raymond,
Yes, the DFS services are running successfully.
Alright! I will proceed with Hadoop 3.2.1 and will let you know in that post if I run into any errors.
Thanks a lot for your help!