Build Latest Hadoop on Windows 10 natively via Docker
In my previous post, Compile and Build Hadoop 3.2.1 on Windows 10 Guide, I documented the steps to build Hadoop natively on a Windows 10 machine. The steps are quite complicated and can easily go wrong. This article summarizes the steps to build Hadoop on Windows 10 via Docker Desktop for Windows. Once the image is built, you can use it to build different versions of Hadoop (as long as the prerequisites don't change). This article builds the Hadoop trunk as of 11 Dec 2022.
Prerequisites
- Windows 10 Pro (my version is 10.0.19045 N/A Build 19045)
- Docker Desktop for Windows. I use dockerd and Docker CLI directly. The version I am using is v4.15.0.
- Git Bash installed on your Windows machine.
- At least 50GB of free storage (as we will use a Windows base image and also install Visual Studio and many other tools).
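The free-space requirement can be checked before you start. The following is a minimal sketch, assuming it runs in Git Bash, where drive C: is normally mounted at /c. The function names has_enough_space and free_kib are illustrative and not part of any tool mentioned in this article.

```shell
# Succeeds when the available space (in KiB) covers the required size (in GiB).
has_enough_space() {
  local available_kib="$1"
  local required_gib="$2"
  [ "$available_kib" -ge $(( required_gib * 1024 * 1024 )) ]
}

# Prints the free space in KiB on the filesystem that holds the given path.
# Assumes the POSIX df output format, where column 4 is the available space.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For example, has_enough_space "$(free_kib /c)" 50 && echo "Enough space" prints the message only when drive C: has at least 50 GiB free, which is slightly stricter than 50GB.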
1. Ensure Windows containers
After you install Docker Desktop for Windows, make sure you switch to Windows containers. Follow the article How to Change Docker Data Root Path on Windows 10 if you don't know how to do that.
The main steps are:
- Start the dockerd.exe process if it is not started automatically via Services.
- Run the switch command (the quotes are required because the path contains spaces):
"C:\Program Files\Docker\Docker\DockerCli.exe" -SwitchDaemon
- Verify the result:
docker version
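To check the container mode from a script rather than reading the docker version output by eye, a small sketch like the following may help. It assumes a Bash shell such as Git Bash and relies on docker info --format '{{.OSType}}', which reports the daemon's OS type. The helper names are hypothetical.

```shell
# Maps the daemon's reported OS type to a short status message.
describe_daemon_mode() {
  case "$1" in
    windows) echo "Windows containers: OK" ;;
    linux)   echo "Linux containers: switch the daemon first" ;;
    *)       echo "Unknown daemon OS type: $1" ;;
  esac
}

# Asks the running daemon for its OS type (requires dockerd to be running).
daemon_os_type() {
  docker info --format '{{.OSType}}'
}
```

With the daemon running, describe_daemon_mode "$(daemon_os_type)" should report whether the switch took effect.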
2. Clone Hadoop source code
Check out the Hadoop source code (trunk branch) from GitHub by running the following commands on your Windows 10 machine (host machine):
cd C:\
git clone -c core.longpaths=true https://github.com/apache/hadoop.git
The above command clones the Hadoop source code to C:/hadoop. If you use a different path, remember to change it accordingly.
3. Run docker build command
First, change directory to C:/hadoop.
cd C:/hadoop
Run the following command to start the build:
docker build -t hadoop-build-windows-10 -f .\dev-support\docker\Dockerfile_windows_10 .\dev-support\docker\
Wait until the build finishes. It can take hours.
4. Verify the built image
Once the build is completed, you can use the following command to verify:
docker image ls
You should be able to find something like the following in the output:
REPOSITORY                TAG      IMAGE ID       CREATED       SIZE
hadoop-build-windows-10   latest   d54c13837078   2 hours ago   28.1GB
As you can see, the image is large (28.1GB).
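The image size can also be read programmatically. As a rough sketch in Bash (the helper names bytes_to_gb and image_size_bytes are made up for illustration), docker image inspect prints the size in bytes, which can then be converted to decimal gigabytes:

```shell
# Converts a byte count to decimal gigabytes with one decimal place.
bytes_to_gb() {
  awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / 1000000000 }'
}

# Prints the image size in bytes (requires the image to exist locally).
image_size_bytes() {
  docker image inspect --format '{{.Size}}' "$1"
}
```

Running bytes_to_gb "$(image_size_bytes hadoop-build-windows-10)" after the build gives a number you can compare against your free disk space.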
5. Build Hadoop
With the image built successfully, we can now start building Hadoop using the image.
5.1 Run a container using the image
We start a Docker container by using the following command:
docker run --rm -it hadoop-build-windows-10
The output looks like the following screenshot:
If you want to reuse the local Maven repository cache of your Windows host machine, you can start the container with the following command:
docker run --rm -v D:\Packages\mvn-repo:C:\Users\ContainerAdministrator\.m2\repository -it hadoop-build-windows-10
Note - D:\Packages\mvn-repo is the path of my local Maven repo. Please change it accordingly.
This saves time because packages do not need to be downloaded from the Internet every time you run the build.
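If your Maven repository lives somewhere else, the command can be generated instead of edited by hand. This is only a sketch in Bash. build_run_command and CONTAINER_M2 are hypothetical names, and the function just prints the command so that you can paste it into Command Prompt.

```shell
# Container-side Maven repository path, as used in this article.
CONTAINER_M2='C:\Users\ContainerAdministrator\.m2\repository'

# Prints the docker run command for a given host-side Maven repo path.
# The second argument (image name) is optional and defaults to the image built above.
build_run_command() {
  local host_repo="$1"
  local image="${2:-hadoop-build-windows-10}"
  printf 'docker run --rm -v %s:%s -it %s\n' "$host_repo" "$CONTAINER_M2" "$image"
}
```

For example, build_run_command 'E:\cache\m2' prints the same command as above with E:\cache\m2 as the host path. Keep the single quotes so that Bash leaves the backslashes untouched.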
5.2 Checkout source code
You can download the source code from the container's command prompt (the container's entry point is the Command Prompt):
git clone -c core.longpaths=true https://github.com/apache/hadoop.gi