Build Latest Hadoop on Windows 10 natively via Docker

Published: 2022-12-11

In my previous post, Compile and Build Hadoop 3.2.1 on Windows 10 Guide, I documented the steps to build Hadoop natively on a Windows 10 machine. Those steps are quite complicated and can easily go wrong. This article summarizes how to build Hadoop on Windows 10 via Docker Desktop for Windows instead. Once the image is built, you can use it to build different versions of Hadoop (as long as the prerequisites don't change). This article builds the Hadoop trunk as at 11 Dec 2022.

Prerequisites

  1. Windows 10 Pro (my version is 10.0.19045 N/A Build 19045)
  2. Docker Desktop for Windows. I use dockerd and Docker CLI directly. The version I am using is v4.15.0.
  3. Git Bash installed on your Windows machine.
  4. At least 50 GB of free storage (we will use a Windows base image and also install VS and many other tools).
Alert - The whole build process can take hours.
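
The tool and disk-space prerequisites can be checked up front. The following is a minimal sketch meant to be run from Git Bash. The function names (`free_gb`, `have`, `check_prereqs`) are illustrative and not part of Hadoop or Docker, and the `/c` default assumes the Docker data root lives on the C: drive, which may not match your setup.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Hadoop or Docker.

# Print the free space, in whole GB, on the drive holding the given path.
free_gb() {
  # df -Pk prints one POSIX-format line per filesystem in 1 KiB blocks.
  # "Available" is the third field from the end, which stays correct even
  # if the filesystem name in the first column contains spaces.
  df -Pk "$1" | awk 'NR == 2 { print int($(NF - 2) / 1024 / 1024) }'
}

# Succeed if the named command is on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Report missing tools and insufficient disk space; succeed only if all pass.
# Usage: check_prereqs [path] [required_gb]
check_prereqs() {
  local path="${1:-/c}" need_gb="${2:-50}" status=0 tool free
  for tool in git docker; do
    if ! have "$tool"; then
      echo "missing: $tool"
      status=1
    fi
  done
  free="$(free_gb "$path" 2>/dev/null)"
  if [ -z "$free" ] || [ "$free" -lt "$need_gb" ]; then
    echo "need ${need_gb} GB free on $path, found ${free:-unknown}"
    status=1
  fi
  return "$status"
}
```

After sourcing the file, `check_prereqs /c 50` prints one line per problem and returns a non-zero status if anything is missing. It does not check the Windows edition or the Docker Desktop version.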

1. Ensure Windows containers

After you install Docker Desktop for Windows, make sure you switch to Windows containers. Follow the article How to Change Docker Data Root Path on Windows 10 if you don't know how to do that.

The main steps are:

  • Start the dockerd.exe process if it is not started automatically via Services.
  • Run the switch command (the path contains a space, so quote it; in PowerShell, prefix the quoted path with &):
    "C:\Program Files\Docker\Docker\DockerCli.exe" -SwitchDaemon
  • Verify the result: docker version
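
To make the verification step less dependent on reading the full `docker version` output, the daemon's OS can be queried directly. This is a minimal sketch for Git Bash, assuming `docker` is on PATH. The names `engine_os` and `require_windows_engine` are illustrative. It uses the `--format` option of `docker version` with the `{{.Server.Os}}` template field, which reports `windows` or `linux`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Print the operating system the Docker daemon runs containers for.
engine_os() {
  docker version --format '{{.Server.Os}}'
}

# Succeed only when the given engine OS string is "windows".
require_windows_engine() {
  if [ "$1" = "windows" ]; then
    echo "Docker is using Windows containers"
  else
    echo "Docker reports '$1' containers; switch with DockerCli.exe -SwitchDaemon" >&2
    return 1
  fi
}
```

A typical call is `require_windows_engine "$(engine_os)"`. Splitting the query from the check keeps the check usable without a running daemon.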

2. Clone Hadoop source code

Check out the Hadoop source code (trunk branch) from GitHub by running the following commands on your Windows 10 host machine:

cd C:\
git clone -c core.longpaths=true https://github.com/apache/hadoop.git

The above command clones the Hadoop source code to C:/hadoop. If you use a different path, remember to change it accordingly in the following steps.

3. Run docker build command

First, change directory to C:/hadoop.

cd C:/hadoop

Run the following command to start the build:

docker build -t hadoop-build-windows-10 -f .\dev-support\docker\Dockerfile_windows_10 .\dev-support\docker\

Wait until the build finishes. It can take hours.
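
Because the build runs for hours, it can help to confirm the Dockerfile is present before starting and to keep a log of the output. The sketch below, intended for Git Bash, wraps the same `docker build` command. The helper names and the `docker-build.log` file name are assumptions, and it relies on the Docker CLI accepting forward-slash relative paths on Windows.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the log file name are hypothetical.

# Path of the Windows build Dockerfile, relative to the Hadoop source root.
DOCKERFILE_REL="dev-support/docker/Dockerfile_windows_10"

# Succeed if the given source tree contains the Windows Dockerfile.
has_windows_dockerfile() {
  [ -f "$1/$DOCKERFILE_REL" ]
}

# Build the image from the given source tree, keeping a log of the output.
# Usage: build_image <source_root> [tag]
build_image() {
  local src="$1" tag="${2:-hadoop-build-windows-10}"
  if ! has_windows_dockerfile "$src"; then
    echo "no $DOCKERFILE_REL under $src; is this a Hadoop trunk checkout?" >&2
    return 1
  fi
  (
    # pipefail makes the subshell fail when docker build fails, not just tee.
    set -o pipefail
    cd "$src" &&
      docker build -t "$tag" -f "$DOCKERFILE_REL" dev-support/docker/ 2>&1 |
      tee docker-build.log
  )
}
```

For the checkout used in this article, the call would be `build_image /c/hadoop`. The log is written to the source root, outside the build context, so it does not affect the image.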

4. Verify the built image

Once the build is completed, you can use the following command to verify:

docker image ls

You should be able to find something like the following in the output:

docker image ls
REPOSITORY                  TAG        IMAGE ID       CREATED        SIZE
hadoop-build-windows-10     latest     d54c13837078   2 hours ago    28.1GB

As you can see, the image is large (about 28 GB here).
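
To check for the image from a script instead of scanning the table by eye, the listing can be narrowed to three columns and filtered. This is a minimal sketch for Git Bash. `list_images` and `image_size` are illustrative names, and the `--format` template uses the `.Repository`, `.Tag` and `.Size` fields of `docker image ls`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# List local images as "repository tag size", one per line.
list_images() {
  docker image ls --format '{{.Repository}} {{.Tag}} {{.Size}}'
}

# Read "repository tag size" lines on stdin and print the size of repo:tag.
# Exits non-zero when no matching line is found.
# Usage: list_images | image_size <repository> [tag]
image_size() {
  awk -v repo="$1" -v tag="${2:-latest}" \
    '$1 == repo && $2 == tag { print $3; found = 1 } END { exit !found }'
}
```

Running `list_images | image_size hadoop-build-windows-10` prints the size if the image exists and returns a non-zero status otherwise.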

5. Build Hadoop

With the image built successfully, we can now start building Hadoop using the image.

5.1 Run a container using the image

We start a Docker container by using the following command:

docker run --rm -it hadoop-build-windows-10

The output looks like the following screenshot:

(Screenshot: 20221206111746-image.png)

If you want to use your Windows host machine's Maven repo local cache, you can start the container with the following command:

docker run --rm -v D:\Packages\mvn-repo:C:\Users\ContainerAdministrator\.m2\repository -it hadoop-build-windows-10

Note - D:\Packages\mvn-repo is the path of my local Maven repo. Please change it accordingly.

This saves the time otherwise spent downloading packages from the Internet each time you run the build.
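
Since the two `docker run` variants differ only in the volume mapping, the command line can be composed from an optional host path. The sketch below, for Git Bash, only prints the command so it can be pasted into Command Prompt or PowerShell; it does not start the container. The helper names are assumptions, and host paths containing spaces would need extra quoting that this sketch does not add.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Maven's local repository inside the container, as used in this article.
CONTAINER_M2='C:\Users\ContainerAdministrator\.m2\repository'

# Print the -v argument that maps a host Maven repo into the container.
maven_mount_arg() {
  printf '%s:%s\n' "$1" "$CONTAINER_M2"
}

# Print the docker run command line, adding the Maven mount only when a
# host directory is given. Usage: run_command [host_maven_repo]
run_command() {
  local image="hadoop-build-windows-10"
  if [ -n "${1:-}" ]; then
    printf 'docker run --rm -v %s -it %s\n' "$(maven_mount_arg "$1")" "$image"
  else
    printf 'docker run --rm -it %s\n' "$image"
  fi
}
```

For example, `run_command 'D:\Packages\mvn-repo'` prints the second command shown above, and `run_command` with no argument prints the first.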

5.2 Checkout source code

You can download the source code from the container's command prompt (the container's entry point is the Command Prompt):

git clone -c core.longpaths=true https://github.com/apache/hadoop.git

5.3 'Fix' Maven blocked http repo issue

Last modified by Raymond 2 months ago.