Compile and Build Hadoop 3.2.1 on Windows 10 Guide

By Raymond, 2020-01-19

This article provides detailed steps about how to compile and build Hadoop (incl. native libs) on Windows 10. The following guide is based on Hadoop release 3.2.1.

*The yellow elephant logo is a registered trademark of Apache Hadoop; the blue window logo is a registered trademark of Microsoft.

Prerequisites

In the Hadoop repository on GitHub, the BUILDING.txt file provides the high-level steps to build Hadoop in different environments. It also lists the prerequisites for each release, which may differ between releases. For example, Hadoop 3.2.2 requires ProtocolBuffer 3.0 while Hadoop 3.2.1 requires version 2.5.0.

This guide targets the Hadoop 3.2.1 build. The source code is taken from rel/release-3.2.1, and the build instructions are in the BUILDING.txt file of that release.

To summarise, the following are the requirements for building on Windows 10 (I will demonstrate how to install each of them in the following sections):

Requirements and comments:

  • Windows 10: I'm using a Windows 10 x64 virtual machine for this guide. It is a brand-new machine with no other programs installed. The relevant systeminfo output is:
      OS Version: 10.0.10240 N/A Build 10240
      System Type: x64-based PC
  • Internet connection: Required to download the dependent software and source code, and to fetch Maven/Hadoop dependencies from Maven Central or other repositories.
  • JDK 1.8: Hadoop is built on top of Java. We need JDK 1.8 to build and run Hadoop.
  • Maven 3.0 or later: Required to build Hadoop.
  • ProtocolBuffer 2.5.0: Some Java classes are generated from ProtocolBuffer definitions.
  • CMake 3.1 or newer: For compiling the native C libraries for HDFS, etc.
  • Python (optional): For generating docs using 'mvn site'. For this guide, we may ignore it.
  • zlib headers (optional): Needed if building native code bindings for zlib. For this guide, we may ignore it.
  • GnuWin32 or Git Bash: Unix command-line tools from GnuWin32 or Git Bash: sh, mkdir, rm, cp, tar, gzip. These tools must be present on your PATH. For convenience, we will use Git Bash, as we will also use it to check out the source code.
  • Windows SDK 8.1: Required if building CPU rate control for the container executor. *After completing the whole build, I don't think this is mandatory, but I will leave it here until I have verified that.
  • Visual Studio 2010 Professional or higher: Includes the C++ compilers, etc. *A later version of Visual Studio should also be okay; however, we follow the official instructions just in case.

Now let's begin to install these dependencies one by one.

Warning: In the following steps, PowerShell commands are used to add environment variables. If you already have a PATH user environment variable set up, I suggest updating it manually via the GUI tool instead, because SETX overwrites the existing value and truncates values longer than 1024 characters.
Info: You can skip any of the following steps if you already have those tools set up.

Section I Installation Guide

Step by step installation guide.

Step 1 Install JDK 1.8

1) Download JDK 1.8 from the following page:

Download JDK 1.8

2) After download, please run the executable to install JDK.

In my system, JDK is installed at this location: C:\Program Files\Java\jdk1.8.0_241.

3) Run the following commands in PowerShell to set up the JAVA_HOME environment variable (remember to replace the JDK path with your own):

SETX JAVA_HOME "C:\Program Files\Java\jdk1.8.0_241"
$NEWPATH="%JAVA_HOME%\bin;$env:PATH"
SETX PATH $NEWPATH

4) And then you can verify by running the following command in Command Prompt or PowerShell:

java -version

Your output should look like the following:

2020011960546-image.png

Step 2 Install Git Bash 

1) Download Windows Git Bash from the following website:

Git - Downloads

I am using 2.25.0 64-bit version of Git for Windows.

2) After download, click Run button to install Git. I'm installing Git to folder C:\Program Files\Git.

3) Run the following commands in PowerShell to set up the environment variables (remember to change the Git installation location to match yours):

SETX GIT_HOME "C:\Program Files\Git"
$NEWPATH="%GIT_HOME%\bin;%GIT_HOME%\usr\bin;%JAVA_HOME%\bin;$env:PATH"
SETX PATH $NEWPATH

The output looks like the following:

2020011962922-image.png

4) Open a new Command Prompt or PowerShell window to verify that the Linux command-line tools are available: sh, mkdir, rm, cp, tar, gzip.

sh -version
mkdir --help
rm --help
tar --help
gzip --help
cp --help
bash

Make sure all of the above commands run without error. The last command starts an interactive Bash session; type exit to leave it.

Step 3 Setup Maven

1) Download Maven from the following website (choose the Binary tar.gz archive from the Files section on the web page):

Downloading Apache Maven 3.6.3

2) Run the following commands in PowerShell to extract the archive to an installation folder (on my system, the folder is C:\maven\):

mkdir C:\maven
tar -xvf /c/Users/Raymond/Downloads/apache-maven-3.6.3-bin.tar.gz -C /c/maven

Remember to replace the archive path with your own. Because tar here is the Linux tool that ships with Git Bash, the Windows paths need to be written in Linux form.

For example:

Windows path -> Linux path
C:\Users\Raymond\Downloads\apache-maven-3.6.3-bin.tar.gz -> /c/Users/Raymond/Downloads/apache-maven-3.6.3-bin.tar.gz
C:\maven -> /c/maven
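
The path mapping above is mechanical, so it can be sketched in a few lines of C. The helper below is only an illustration: win_to_msys_path is a hypothetical name, not part of Git Bash or Hadoop, and it assumes an absolute Windows path that starts with a drive letter.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite "C:\maven" as "/c/maven".
 * Returns 0 on success, -1 if the input has no drive-letter prefix
 * or the output buffer is too small.
 */
int win_to_msys_path(const char *win, char *out, size_t out_size)
{
    size_t i, len;

    if (win == NULL || out == NULL) return -1;
    len = strlen(win);
    /* Expect a drive prefix such as "C:" */
    if (len < 2 || !isalpha((unsigned char)win[0]) || win[1] != ':') return -1;
    /* "C:" becomes "/c", so the result has the same length as the input */
    if (out_size < len + 1) return -1;

    out[0] = '/';
    out[1] = (char)tolower((unsigned char)win[0]);
    for (i = 2; i < len; i++) {
        /* Backslashes become forward slashes; everything else is copied */
        out[i] = (win[i] == '\\') ? '/' : win[i];
    }
    out[len] = '\0';
    return 0;
}
```

For example, calling win_to_msys_path with the C string "C:\\maven" and a large enough buffer fills the buffer with /c/maven.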

The output looks like the following:

2020011964957-image.png

After this, Maven is installed on my system in the folder C:\maven\apache-maven-3.6.3.

3) Run the following command in PowerShell to set up the MAVEN_HOME variable (please change the path accordingly):

SETX MAVEN_HOME C:\maven\apache-maven-3.6.3

Add the following path to your user environment variable PATH:

%MAVEN_HOME%\bin;

2020011965734-image.png

By now, you should have these additional paths added to the PATH variable:

%MAVEN_HOME%\bin;%GIT_HOME%\bin;%GIT_HOME%\usr\bin;%JAVA_HOME%\bin;

4) Verify maven installation by running the following command in a new PowerShell window:

mvn --version

The output looks like the following screenshot:

2020011965926-image.png

5) You can further configure the Maven local repository folder by editing the settings.xml file in %MAVEN_HOME%\conf. By default, packages are downloaded to the folder ${user.home}/.m2/repository.

For this guide, I'll just use the default folder. In my system, the path is C:\Users\Raymond\.m2\repository.

To avoid Maven build issues like 'The command line is too long.', let's create a directory junction (mklink /J) that gives this folder a shorter path. Run the command in Command Prompt, since mklink is a cmd built-in, and remember to change the path to your own:

mklink /J C:\mrepo C:\Users\Raymond\.m2\repository

And then update settings.xml file to configure local repository path:

<localRepository>C:\mrepo</localRepository>

Step 4 Install Protocol Buffer

1) Download Protocol Buffer 2.5.0 from the following location:

protoc-2.5.0-win32.zip

2) Unzip the downloaded file to C:\protoc folder using the following commands in PowerShell:

mkdir C:\protoc
unzip /c/Users/Raymond/Downloads/protoc-2.5.0-win32.zip -d /c/protoc

Similar to the previous steps, please replace the paths accordingly to suit your environment.

There are two files added to C:\protoc folder as the following screenshot shows:

2020011971252-image.png

3) We also need to add protoc path (C:\protoc) to environment variable PATH:

2020011971420-image.png

4) Verify the environment variable by opening a new PowerShell or Command Prompt window and running the following command:

protoc --version

The output looks like the following:

2020011971610-image.png

Step 5 Install CMake

1) Download CMake from this link: cmake-3.16.2-win64-x64.msi.

2) Double click the downloaded file cmake-3.16.2-win64-x64.msi to install this application.  

Make sure you choose either 'Add CMake to the system PATH for all users' or 'Add CMake to the system PATH for the current user'.

2020011972455-image.png

Alternatively, you can manually add it to PATH environment variable:

C:\Program Files\CMake\bin;C:\protoc;%MAVEN_HOME%\bin;%GIT_HOME%\bin;%GIT_HOME%\usr\bin;%JAVA_HOME%\bin;%PATH%

3) Once you finish the installation, verify you can run cmake command in Command Prompt or PowerShell:

cmake --version

The output looks like the following screenshot:

2020011973013-image.png

Step 6 Install Python

1) Download and install Python from this web page: https://www.python.org/downloads/.

2020011973501-image.png

2) Verify installation by running the following command in Command Prompt or PowerShell:

python --version

The output looks like the following:

2020011973915-image.png

If the python command cannot be invoked directly, check the PATH environment variable to make sure the Python installation path has been added.

For example, in my environment Python is installed at the following location:

2020011974035-image.png

Thus path C:\Users\Raymond\AppData\Local\Programs\Python\Python38-32 is added to PATH variable.

2020011973809-image.png

Step 7 Install Windows SDK 8.1.0

1) Download Windows SDK 8.1.0 from this web page: http://msdn.microsoft.com/en-us/windows/bg162891.aspx.

2) Click sdksetup.exe to start the installation. Follow the wizard to complete installation. 

2020011974407-image.png

Step 8 Install Visual Studio 2010 Professional

1) Download VS2010 from the following page (login is required):

https://my.visualstudio.com/Downloads?q=visual%20studio%202010&wt.mc_id=o~msft~vscom~older-downloads

2) Install VS2010 using the setup wizard. To save space, I only chose Visual C++, as we only need the x64 compilers and tools to compile the C/C++ Windows native libraries.

2020011981831-image.png

3) After the installation is completed, a few more Command Prompt shortcuts are available. We will use the x64 command prompt in some of the following steps.

2020011983540-image.png

Step 9 Setup a few more environment variables

1) We need to set up two more variables based on the following logic:

@REM *************************************************
@REM JDK and these settings MUST MATCH
@REM
@REM 64-bit : Platform = x64, VCVARSPLAT = amd64
@REM
@REM 32-bit : Platform = Win32, VCVARSPLAT = x86
@REM

For my environment, it is 64 bit. Thus I need to run the following command in PowerShell or Command Prompt:

SETX Platform x64
SETX VCVARSPLAT amd64

2020011984240-image.png

The above screenshot is the output in my system.
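
The matching rule in the @REM block can also be written as a small check. This is a sketch only: platform_pair_is_valid is a hypothetical helper, and it assumes the two values are compared exactly as written above, including case.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 when the Platform / VCVARSPLAT pair is one
 * of the two combinations listed above, and 0 otherwise.
 */
int platform_pair_is_valid(const char *platform, const char *vcvarsplat)
{
    if (platform == NULL || vcvarsplat == NULL) return 0;

    /* 64-bit JDK: Platform = x64, VCVARSPLAT = amd64 */
    if (strcmp(platform, "x64") == 0 && strcmp(vcvarsplat, "amd64") == 0) return 1;

    /* 32-bit JDK: Platform = Win32, VCVARSPLAT = x86 */
    if (strcmp(platform, "Win32") == 0 && strcmp(vcvarsplat, "x86") == 0) return 1;

    return 0;
}
```

In a new window opened after SETX, the two arguments could come from getenv("Platform") and getenv("VCVARSPLAT") (declared in stdlib.h); a mixed pair such as x64 with x86 is reported as invalid.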

2) Set up the Visual Studio environment variable by running the following command in PowerShell or Command Prompt:

SETX MSVS "C:\Program Files (x86)\Microsoft Visual Studio 10.0"

* C:\Program Files (x86)\Microsoft Visual Studio 10.0 is the path of my Visual Studio. Please change it accordingly based on your environment setup. 

2020011984515-image.png

3) Verify the setup by running the following command in Command Prompt:

CALL "%MSVS%\VC\vcvarsall.bat" %VCVARSPLAT%

The output looks like the following:

2020011984732-image.png

Step 10 Download source code 

1) Create a new folder C:\hdp using the following command:

mkdir C:\hdp

2020011985030-image.png

2) Open Git Bash window (using Run as Administrator mode) and change directory to C:\hdp.

2020011985139-image.png

3) Run the following command in Git Bash terminal to change git settings to allow long paths:

git config --system core.longpaths true

2020011985335-image.png

4) Clone the repository by running the following command:

git clone https://github.com/apache/hadoop.git

It may take a few minutes as this repo is huge. Wait until the clone is completed successfully.

2020011985920-image.png

5) Check out rel/release-3.2.1, as we will build from this source code. It is a release tag, so Git will report a detached HEAD:

cd hadoop/
git checkout rel/release-3.2.1

6) Fix some C code for the HDFS native library.

As we are using Visual C++ 2010 to compile the C code, one source file needs to be updated manually; otherwise it won't compile. For C programs, VS 2010 supports neither declarations in for loop headers nor declarations mixed with statements, so all variables must be declared at the start of a block. Refer to Section II for more details about this issue.
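
A minimal sketch of the restriction described above, not taken from the Hadoop source (sum_values is a hypothetical function): the comment shows the C99-style constructs that the VS 2010 C compiler rejects, and the function shows the layout it accepts, with every declaration at the top of its block.

```c
#include <stddef.h>

/*
 * Rejected by the VS 2010 C compiler (C99-style):
 *
 *     int sum = 0;
 *     sum += values[0];                  <- a statement ...
 *     int count = 0;                     <- ... followed by a declaration
 *     for (size_t i = 0; i < n; i++)     <- declaration in the for header
 */

/* Accepted: all declarations come before the first statement. */
int sum_values(const int *values, size_t n)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum;
}
```

The fix in this step applies the same idea to the Hadoop test file: the declarations are moved to the top of the function and the statements are left where they were.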

Open the file src\main\native\libhdfs-tests\test_libhdfs_threaded.c (under hadoop-hdfs-project\hadoop-hdfs-native-client) in a text editor and replace the function static int doTestHdfsOperations(struct tlhThreadInfo *ti, hdfsFS fs, const struct tlhPaths *paths) with the following content:

static int doTestHdfsOperations(struct tlhThreadInfo *ti, hdfsFS fs,
                                const struct tlhPaths *paths)
{
    char tmp[4096];
    hdfsFile file;
    int ret, expected, numEntries;
    hdfsFileInfo *fileInfo;
    struct hdfsReadStatistics *readStats = NULL;
    struct hdfsHedgedReadMetrics *hedgedMetrics = NULL;
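    /* The declarations below were moved up from the function body so that
     * the VS 2010 C compiler accepts the file. */
    char invalid_path[] = "/some_invalid/path";
    hdfsFileInfo * dirList;
    char listDirTest[PATH_MAX];
    int nFile;
    char filename[PATH_MAX];
    hdfsFS fs2 = NULL;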

    if (hdfsExists(fs, paths->prefix) == 0) {
        EXPECT_ZERO(hdfsDelete(fs, paths->prefix, 1));
    }
    EXPECT_ZERO(hdfsCreateDirectory(fs, paths->prefix));

    EXPECT_ZERO(doTestGetDefaultBlockSize(fs, paths->prefix));

    /* There is no such directory.
     * Check that errno is set to ENOENT
     */
    EXPECT_NULL_WITH_ERRNO(hdfsListDirectory(fs, invalid_path, &numEntries), ENOENT);

    /* There should be no entry in the directory. */
    errno = EACCES; // see if errno is set to 0 on success
    EXPECT_NULL_WITH_ERRNO(hdfsListDirectory(fs, paths->prefix, &numEntries), 0);
    if (numEntries != 0) {
        fprintf(stderr, "hdfsListDirectory set numEntries to "
                "%d on empty directory.", numEntries);
        return EIO;
    }

    /* There should not be any file to open for reading. */
    EXPECT_NULL(hdfsOpenFile(fs, paths->file1, O_RDONLY, 0, 0, 0));

    /* Check if the exceptions are stored in the TLS */
    EXPECT_STR_CONTAINS(hdfsGetLastExceptionRootCause(),
                        "File does not exist");
    EXPECT_STR_CONTAINS(hdfsGetLastExceptionStackTrace(),
                        "java.io.FileNotFoundException");

    /* hdfsOpenFile should not accept mode = 3 */
    EXPECT_NULL(hdfsOpenFile(fs, paths->file1, 3, 0, 0, 0));

    file = hdfsOpenFile(fs, paths->file1, O_WRONLY, 0, 0, 0);
    EXPECT_NONNULL(file);

    /* TODO: implement writeFully and use it here */
    expected = (int)strlen(paths->prefix);
    ret = hdfsWrite(fs, file, paths->prefix, expected);
    if (ret < 0) {
        ret = errno;
        fprintf(stderr, "hdfsWrite failed and set errno %d\n", ret);
        return ret;
    }
    if (ret != expected) {
        fprintf(stderr, "hdfsWrite was supposed to write %d bytes, but "
                "it wrote %d\n", expected, ret);
        return EIO;
    }
    EXPECT_ZERO(hdfsFlush(fs, file));
    EXPECT_ZERO(hdfsHSync(fs, file));
    EXPECT_ZERO(hdfsCloseFile(fs, file));

    EXPECT_ZERO(doTestGetDefaultBlockSize(fs, paths->file1));

    /* There should be 1 entry in the directory. */
    dirList = hdfsListDirectory(fs, paths->prefix, &numEntries);
    EXPECT_NONNULL(dirList);
    if (numEntries != 1) {
        fprintf(stderr, "hdfsListDirectory set numEntries to "
                "%d on directory containing 1 file.", numEntries);
    }
    hdfsFreeFileInfo(dirList, numEntries);

    /* Create many files for ListDirectory to page through */
    strcpy(listDirTest, paths->prefix);
    strcat(listDirTest, "/for_list_test/");
    EXPECT_ZERO(hdfsCreateDirectory(fs, listDirTest));
    
    for (nFile = 0; nFile < 10000; nFile++) {
      snprintf(filename, PATH_MAX, "%s/many_files_%d", listDirTest, nFile);
      file = hdfsOpenFile(fs, filename, O_WRONLY, 0, 0, 0);
      EXPECT_NONNULL(file);
      EXPECT_ZERO(hdfsCloseFile(fs, file));
    }
    dirList = hdfsListDirectory(fs, listDirTest, &numEntries);
    EXPECT_NONNULL(dirList);
    hdfsFreeFileInfo(dirList, numEntries);
    if (numEntries != 10000) {
        fprintf(stderr, "hdfsListDirectory set numEntries to "
                "%d on directory containing 10000 files.", numEntries);
        return EIO;
    }

    /* Let's re-open the file for reading */
    file = hdfsOpenFile(fs, paths->file1, O_RDONLY, 0, 0, 0);
    EXPECT_NONNULL(file);

    EXPECT_ZERO(hdfsFileGetReadStatistics(file, &readStats));
    errno = 0;
    EXPECT_UINT64_EQ(UINT64_C(0), readStats->totalBytesRead);
    EXPECT_UINT64_EQ(UINT64_C(0), readStats->totalLocalBytesRead);
    EXPECT_UINT64_EQ(UINT64_C(0), readStats->totalShortCircuitBytesRead);
    hdfsFileFreeReadStatistics(readStats);

    /* Verify that we can retrieve the hedged read metrics */
    EXPECT_ZERO(hdfsGetHedgedReadMetrics(fs, &hedgedMetrics));
    errno = 0;
    EXPECT_UINT64_EQ(UINT64_C(0), hedgedMetrics->hedgedReadOps);
    EXPECT_UINT64_EQ(UINT64_C(0), hedgedMetrics->hedgedReadOpsWin);
    EXPECT_UINT64_EQ(UINT64_C(0), hedgedMetrics->hedgedReadOpsInCurThread);
    hdfsFreeHedgedReadMetrics(hedgedMetrics);

    /* TODO: implement readFully and use it here */
    ret = hdfsRead(fs, file, tmp, sizeof(tmp));
    if (ret < 0) {
        ret = errno;
        fprintf(stderr, "hdfsRead failed and set errno %d\n", ret);
        return ret;
    }
    if (ret != expected) {
        fprintf(stderr, "hdfsRead was supposed to read %d bytes, but "
                "it read %d\n", expected, ret);
        return EIO;
    }
    EXPECT_ZERO(hdfsFileGetReadStatistics(file, &readStats));
    errno = 0;
    EXPECT_UINT64_EQ((uint64_t)expected, readStats->totalBytesRead);
    hdfsFileFreeReadStatistics(readStats);
    EXPECT_ZERO(hdfsFileClearReadStatistics(file));
    EXPECT_ZERO(hdfsFileGetReadStatistics(file, &readStats));
    EXPECT_UINT64_EQ((uint64_t)0, readStats->totalBytesRead);
    hdfsFileFreeReadStatistics(readStats);
    EXPECT_ZERO(memcmp(paths->prefix, tmp, expected));
    EXPECT_ZERO(hdfsCloseFile(fs, file));

    //Non-recursive delete fails
    EXPECT_NONZERO(hdfsDelete(fs, paths->prefix, 0));
    EXPECT_ZERO(hdfsCopy(fs, paths->file1, fs, paths->file2));

    EXPECT_ZERO(hdfsChown(fs, paths->file2, NULL, NULL));
    EXPECT_ZERO(hdfsChown(fs, paths->file2, NULL, "doop"));
    fileInfo = hdfsGetPathInfo(fs, paths->file2);
    EXPECT_NONNULL(fileInfo);
    EXPECT_ZERO(strcmp("doop", fileInfo->mGroup));
    EXPECT_ZERO(hdfsFileIsEncrypted(fileInfo));
    hdfsFreeFileInfo(fileInfo, 1);

    EXPECT_ZERO(hdfsChown(fs, paths->file2, "ha", "doop2"));
    fileInfo = hdfsGetPathInfo(fs, paths->file2);
    EXPECT_NONNULL(fileInfo);
    EXPECT_ZERO(strcmp("ha", fileInfo->mOwner));
    EXPECT_ZERO(strcmp("doop2", fileInfo->mGroup));
    hdfsFreeFileInfo(fileInfo, 1);

    EXPECT_ZERO(hdfsChown(fs, paths->file2, "ha2", NULL));
    fileInfo = hdfsGetPathInfo(fs, paths->file2);
    EXPECT_NONNULL(fileInfo);
    EXPECT_ZERO(strcmp("ha2", fileInfo->mOwner));
    EXPECT_ZERO(strcmp("doop2", fileInfo->mGroup));
    hdfsFreeFileInfo(fileInfo, 1);

    snprintf(tmp, sizeof(tmp), "%s/nonexistent-file-name", paths->prefix);
    EXPECT_NEGATIVE_ONE_WITH_ERRNO(hdfsChown(fs, tmp, "ha3", NULL), ENOENT);

    //Test case: File does not exist
    EXPECT_NULL_WITH_ERRNO(hdfsGetPathInfo(fs, invalid_path), ENOENT);

    //Test case: No permission to access parent directory
    EXPECT_ZERO(hdfsChmod(fs, paths->prefix, 0));
    //reconnect as user "SomeGuy" and verify that we get permission errors
    
    EXPECT_ZERO(hdfsSingleNameNodeConnect(tlhCluster, &fs2, "SomeGuy"));
    EXPECT_NULL_WITH_ERRNO(hdfsGetPathInfo(fs2, paths->file2), EACCES);
    EXPECT_ZERO(hdfsDisconnect(fs2));
    return 0;
}

Step 11 Review before build

In the previous steps, we installed all the required tools, set up the necessary environment variables and checked out the code for the build.
The following environment variables were added:

  • GIT_HOME = C:\Program Files\Git
  • JAVA_HOME = C:\Program Files\Java\jdk1.8.0_241
  • MAVEN_HOME = C:\maven\apache-maven-3.6.3
  • MSVS = C:\Program Files (x86)\Microsoft Visual Studio 10.0
  • Platform = x64
  • VCVARSPLAT = amd64
  • PATH = C:\Users\Raymond\AppData\Local\Programs\Python\Python38-32;C:\protoc;%MAVEN_HOME%\bin;%GIT_HOME%\bin;%GIT_HOME%\usr\bin;%JAVA_HOME%\bin;%PATH%
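
One way to double-check the PATH entries listed above is to split the value on ';' and look for each folder. The sketch below uses a hypothetical helper name (path_has_entry) and compares entries ignoring ASCII case and one trailing backslash. It does not expand references such as %MAVEN_HOME%, so pass it the expanded value, for example what getenv("PATH") returns in a new window.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n characters of a and b, ignoring ASCII case. */
static int equal_ignore_case(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i])) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: 1 if dir is a whole entry of a ';'-separated PATH. */
int path_has_entry(const char *path_value, const char *dir)
{
    size_t dir_len;
    const char *start;
    const char *end;

    if (path_value == NULL || dir == NULL) return 0;
    dir_len = strlen(dir);
    if (dir_len == 0) return 0;

    start = path_value;
    for (;;) {
        size_t entry_len;

        end = strchr(start, ';');
        entry_len = (end != NULL) ? (size_t)(end - start) : strlen(start);
        /* Treat "C:\protoc\" and "C:\protoc" as the same entry */
        if (entry_len > 0 && start[entry_len - 1] == '\\') entry_len--;
        if (entry_len == dir_len && equal_ignore_case(start, dir, dir_len)) {
            return 1;
        }
        if (end == NULL) break;
        start = end + 1;   /* continue after the separator */
    }
    return 0;
}
```

Matching whole entries rather than substrings avoids false positives, such as C:\protocol being mistaken for C:\protoc.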

Now let's start building.

Step 12 Use Maven to build Hadoop 3.2.1

1) Open Visual Studio x64 Win64 Command Prompt (2010)

2020011991620-image.png

2) Change to the Hadoop source code directory by running the following command in the window opened in step 1):

cd C:/hdp/hadoop

3) Run the following Maven command to start build:

mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true

The above command line skips running tests and skips Javadoc generation to save time.

Info: Syntax reference:
Build distribution with native code: mvn package [-Pdist][-Pdocs][-Psrc][-Dtar][-Dmaven.javadoc.skip=true]
For more information about the Maven CLI, please refer to the Maven CLI Options Reference.
Warning: We cannot use the option -Dmaven.test.skip=true; otherwise the build for hadoop-yarn-project will fail. Refer to YARN-4084 for more details.

The build may take a long time, as many dependent packages need to be downloaded and many projects need to be built. The package download is only needed the first time you run the build, because the packages are cached in the local Maven repository according to your configuration. On my system, the packages are downloaded to C:\Users\Raymond\.m2\repository.

Wait until the build completes.

4) When the build completes successfully, Maven will show the summary of all the projects. 

20200119120833-image.png

5) The artefacts are published in the destination folder: C:\hdp\hadoop\hadoop-dist\target\hadoop-3.2.1.

20200119121426-image.png

Congratulations! You've now successfully built Hadoop 3.2.1 on Windows 10.

Section II Issue Fixes

This section documents some issues encountered when I was doing the build. 

Resolve issues

As part of the build process, I did encounter a few minor issues. The following sections document the resolutions for your reference.

The command line is too long

I initially ran the build with the following command:

mvn package -Pdist,native-win -DskipTests -Dmaven.test.skip=true -Dtar -Dmaven.javadoc.skip=true

The process failed at hadoop-common project build with error 'The command line is too long'.

To resolve this issue, I created a directory junction with a shorter path for the Maven local repository:

mklink /J C:\mrepo C:\Users\Raymond\.m2\repository

The details are covered in Step 3.

Apache Hadoop HDFS Native Client Build failure 

This project failed to build because of coding issues in the source file src\main\native\libhdfs-tests\test_libhdfs_threaded.c.

Detailed errors:

     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(163): error C2143: syntax error : missing ';' before 'type' [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(164): error C2065: 'invalid_path' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(210): error C2143: syntax error : missing ';' before 'type' [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(211): error C2065: 'dirList' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(216): error C2065: 'dirList' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(219): error C2143: syntax error : missing ';' before 'type' [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(220): error C2065: 'listDirTest' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(221): error C2065: 'listDirTest' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(222): error C2065: 'listDirTest' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(223): error C2143: syntax error : missing ';' before 'type' [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(224): error C2065: 'nFile' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(224): error C2065: 'nFile' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(224): error C2065: 'nFile' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(226): error C2065: 'listDirTest' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(226): error C2065: 'nFile' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(231): error C2065: 'dirList' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(231): error C2065: 'listDirTest' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(232): error C2065: 'dirList' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(233): error C2065: 'dirList' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(312): error C2065: 'invalid_path' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(317): error C2143: syntax error : missing ';' before 'type' [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(318): error C2065: 'fs2' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(319): error C2065: 'fs2' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]
     [exec]   ..\..\..\..\..\src\main\native\libhdfs-tests\test_libhdfs_threaded.c(320): error C2065: 'fs2' : undeclared identifier [C:\hdp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\native\main\native\libhdfs\test_libhdfs_threaded_hdfs_static.vcxproj]

This issue occurred because, for C code, the VS C++ 2010 compiler doesn't support declaring variables in the middle of a function or in a for loop header.

Thus, to fix it, I moved the variable declarations to the beginning of the function:

static int doTestHdfsOperations(struct tlhThreadInfo *ti, hdfsFS fs,
                                const struct tlhPaths *paths)
{
    char tmp[4096];
    hdfsFile file;
    int ret, expected, numEntries;
    hdfsFileInfo *fileInfo;
    struct hdfsReadStatistics *readStats = NULL;
    struct hdfsHedgedReadMetrics *hedgedMetrics = NULL;
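    /* The declarations below were moved up from the function body so that
     * the VS 2010 C compiler accepts the file. */
    char invalid_path[] = "/some_invalid/path";
    hdfsFileInfo * dirList;
    char listDirTest[PATH_MAX];
    int nFile;
    char filename[PATH_MAX];
    hdfsFS fs2 = NULL;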

    if (hdfsExists(fs, paths->prefix) == 0) {
        EXPECT_ZERO(hdfsDelete(fs, paths->prefix, 1));
    }
    EXPECT_ZERO(hdfsCreateDirectory(fs, paths->prefix));

    EXPECT_ZERO(doTestGetDefaultBlockSize(fs, paths->prefix));

    /* There is no such directory.
     * Check that errno is set to ENOENT
     */
    EXPECT_NULL_WITH_ERRNO(hdfsListDirectory(fs, invalid_path, &numEntries), ENOENT);

    /* There should be no entry in the directory. */
    errno = EACCES; // see if errno is set to 0 on success
    EXPECT_NULL_WITH_ERRNO(hdfsListDirectory(fs, paths->prefix, &numEntries), 0);
    if (numEntries != 0) {
        fprintf(stderr, "hdfsListDirectory set numEntries to "
                "%d on empty directory.", numEntries);
        return EIO;
    }

    /* There should not be any file to open for reading. */
    EXPECT_NULL(hdfsOpenFile(fs, paths->file1, O_RDONLY, 0, 0, 0));

    /* Check if the exceptions are stored in the TLS */
    EXPECT_STR_CONTAINS(hdfsGetLastExceptionRootCause(),
                        "File does not exist");
    EXPECT_STR_CONTAINS(hdfsGetLastExceptionStackTrace(),
                        "java.io.FileNotFoundException");

    /* hdfsOpenFile should not accept mode = 3 */
    EXPECT_NULL(hdfsOpenFile(fs, paths->file1, 3, 0, 0, 0));

    file = hdfsOpenFile(fs, paths->file1, O_WRONLY, 0, 0, 0);
    EXPECT_NONNULL(file);

    /* TODO: implement writeFully and use it here */
    expected = (int)strlen(paths->prefix);
    ret = hdfsWrite(fs, file, paths->prefix, expected);
    if (ret < 0) {
        ret = errno;
        fprintf(stderr, "hdfsWrite failed and set errno %d\n", ret);
        return ret;
    }
    if (ret != expected) {
        fprintf(stderr, "hdfsWrite was supposed to write %d bytes, but "
                "it wrote %d\n", expected, ret);
        return EIO;
    }
    EXPECT_ZERO(hdfsFlush(fs, file));
    EXPECT_ZERO(hdfsHSync(fs, file));
    EXPECT_ZERO(hdfsCloseFile(fs, file));

    EXPECT_ZERO(doTestGetDefaultBlockSize(fs, paths->file1));

    /* There should be 1 entry in the directory. */
    dirList = hdfsListDirectory(fs, paths->prefix, &numEntries);
    EXPECT_NONNULL(dirList);
    if (numEntries != 1) {
        fprintf(stderr, "hdfsListDirectory set numEntries to "
                "%d on directory containing 1 file.", numEntries);
    }
    hdfsFreeFileInfo(dirList, numEntries);

    /* Create many files for ListDirectory to page through */
    strcpy(listDirTest, paths->prefix);
    strcat(listDirTest, "/for_list_test/");
    EXPECT_ZERO(hdfsCreateDirectory(fs, listDirTest));
    
    for (nFile = 0; nFile < 10000; nFile++) {
      snprintf(filename, PATH_MAX, "%s/many_files_%d", listDirTest, nFile);
      file = hdfsOpenFile(fs, filename, O_WRONLY, 0, 0, 0);
      EXPECT_NONNULL(file);
      EXPECT_ZERO(hdfsCloseFile(fs, file));
    }
    dirList = hdfsListDirectory(fs, listDirTest, &numEntries);
    EXPECT_NONNULL(dirList);
    hdfsFreeFileInfo(dirList, numEntries);
    if (numEntries != 10000) {
        fprintf(stderr, "hdfsListDirectory set numEntries to "
                "%d on directory containing 10000 files.", numEntries);
        return EIO;
    }

    /* Let's re-open the file for reading */
    file = hdfsOpenFile(fs, paths->file1, O_RDONLY, 0, 0, 0);
    EXPECT_NONNULL(file);

    EXPECT_ZERO(hdfsFileGetReadStatistics(file, &readStats));
    errno = 0;
    EXPECT_UINT64_EQ(UINT64_C(0), readStats->totalBytesRead);
    EXPECT_UINT64_EQ(UINT64_C(0), readStats->totalLocalBytesRead);
    EXPECT_UINT64_EQ(UINT64_C(0), readStats->totalShortCircuitBytesRead);
    hdfsFileFreeReadStatistics(readStats);

    /* Verify that we can retrieve the hedged read metrics */
    EXPECT_ZERO(hdfsGetHedgedReadMetrics(fs, &hedgedMetrics));
    errno = 0;
    EXPECT_UINT64_EQ(UINT64_C(0), hedgedMetrics->hedgedReadOps);
    EXPECT_UINT64_EQ(UINT64_C(0), hedgedMetrics->hedgedReadOpsWin);
    EXPECT_UINT64_EQ(UINT64_C(0), hedgedMetrics->hedgedReadOpsInCurThread);
    hdfsFreeHedgedReadMetrics(hedgedMetrics);

    /* TODO: implement readFully and use it here */
    ret = hdfsRead(fs, file, tmp, sizeof(tmp));
    if (ret < 0) {
        ret = errno;
        fprintf(stderr, "hdfsRead failed and set errno %d\n", ret);
        return ret;
    }
    if (ret != expected) {
        fprintf(stderr, "hdfsRead was supposed to read %d bytes, but "
                "it read %d\n", expected, ret);
        return EIO;
    }
    EXPECT_ZERO(hdfsFileGetReadStatistics(file, &readStats));
    errno = 0;
    EXPECT_UINT64_EQ((uint64_t)expected, readStats->totalBytesRead);
    hdfsFileFreeReadStatistics(readStats);
    EXPECT_ZERO(hdfsFileClearReadStatistics(file));
    EXPECT_ZERO(hdfsFileGetReadStatistics(file, &readStats));
    EXPECT_UINT64_EQ((uint64_t)0, readStats->totalBytesRead);
    hdfsFileFreeReadStatistics(readStats);
    EXPECT_ZERO(memcmp(paths->prefix, tmp, expected));
    EXPECT_ZERO(hdfsCloseFile(fs, file));

    //Non-recursive delete fails
    EXPECT_NONZERO(hdfsDelete(fs, paths->prefix, 0));
    EXPECT_ZERO(hdfsCopy(fs, paths->file1, fs, paths->file2));

    EXPECT_ZERO(hdfsChown(fs, paths->file2, NULL, NULL));
    EXPECT_ZERO(hdfsChown(fs, paths->file2, NULL, "doop"));
    fileInfo = hdfsGetPathInfo(fs, paths->file2);
    EXPECT_NONNULL(fileInfo);
    EXPECT_ZERO(strcmp("doop", fileInfo->mGroup));
    EXPECT_ZERO(hdfsFileIsEncrypted(fileInfo));
    hdfsFreeFileInfo(fileInfo, 1);

    EXPECT_ZERO(hdfsChown(fs, paths->file2, "ha", "doop2"));
    fileInfo = hdfsGetPathInfo(fs, paths->file2);
    EXPECT_NONNULL(fileInfo);
    EXPECT_ZERO(strcmp("ha", fileInfo->mOwner));
    EXPECT_ZERO(strcmp("doop2", fileInfo->mGroup));
    hdfsFreeFileInfo(fileInfo, 1);

    EXPECT_ZERO(hdfsChown(fs, paths->file2, "ha2", NULL));
    fileInfo = hdfsGetPathInfo(fs, paths->file2);
    EXPECT_NONNULL(fileInfo);
    EXPECT_ZERO(strcmp("ha2", fileInfo->mOwner));
    EXPECT_ZERO(strcmp("doop2", fileInfo->mGroup));
    hdfsFreeFileInfo(fileInfo, 1);

    snprintf(tmp, sizeof(tmp), "%s/nonexistent-file-name", paths->prefix);
    EXPECT_NEGATIVE_ONE_WITH_ERRNO(hdfsChown(fs, tmp, "ha3", NULL), ENOENT);

    //Test case: File does not exist
    EXPECT_NULL_WITH_ERRNO(hdfsGetPathInfo(fs, invalid_path), ENOENT);

    //Test case: No permission to access parent directory
    EXPECT_ZERO(hdfsChmod(fs, paths->prefix, 0));
    //reconnect as user "SomeGuy" and verify that we get permission errors
    
    EXPECT_ZERO(hdfsSingleNameNodeConnect(tlhCluster, &fs2, "SomeGuy"));
    EXPECT_NULL_WITH_ERRNO(hdfsGetPathInfo(fs2, paths->file2), EACCES);
    EXPECT_ZERO(hdfsDisconnect(fs2));
    return 0;
}

Section III Summary

Summary about this guide. 

Summary

The steps above give comprehensive instructions for building Hadoop release 3.2.1. The setup takes some time because many tools are required, though they can be categorised into the following groups:

  • Java related tools: JDK, Maven
  • C/C++ and Windows related tools: Windows SDK, Visual C++, CMake
  • Linux style CLIs: Git Bash
  • Other framework dependencies: Protocol Buffers
  • Aux tools: Python for generating documentation and running some checks (once built, we can run compatibility checks with checkcompatibility.py).
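Before starting a long build, it can help to confirm that the tools from these groups are reachable from Git Bash. The following is a minimal sketch, not part of the original steps: the function name `missing_tools` is hypothetical, and the executable names in the usage comment (`java`, `mvn`, `cmake`, `protoc`, `git`) are the usual ones and may differ on your machine.

```shell
# Report which of the given commands cannot be found on PATH.
# Prints one missing command name per line; prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call from Git Bash (names are assumptions, adjust as needed):
# missing_tools java mvn cmake protoc git
```

An empty result only means the commands were found; it does not check that their versions match what this Hadoop release requires.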

Different Hadoop versions require different versions of these frameworks and tools. With slight changes to this guide, you can:

  • build Hadoop 2.x on Windows
  • build Hadoop 3.0 on Windows 
  • build Hadoop 3.2.2/3.3.0 on Windows
  • or build with the latest code base

These approaches can also be extended to build other Apache frameworks, since many of them are Java based.

If you encounter any issue while following this guide, please post a comment here and I will try my best to help.

Comments
Juergen Zimmer, 4 years ago

Hi Raymond,

I'm trying to build Hadoop 3.2.2 winutils.
The build fails when building  Apache Hadoop HDFS Native Client (hadoop-hdfs-native-client).
Reason: Compiler issue when compiling jni_helper.c. Header 'dirent.h' cannot be found.
        [exec]   ..\..\..\..\..\src\main\native\libhdfs\jni_helper.c(28): fatal error C1083: Cannot open include file: 'dirent.h':
        No such file or directory [D:\hcp\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-clet\target\native\main\native\libhdfs-tests\native_mini_dfs.vcxproj]
The environment is set up as you've described in your blog (MS Visual Studio 2010, Win SDK 8.1, ...)

Any idea what's going wrong?

Thank you very much.
Best regards,
Juergen

Raymond, 4 years ago

Hi Juergen,

This is because the header file dirent.h doesn't exist in Visual C++, though it exists on Linux.

You can download that header file from this repository and place it into your Visual C++ include folder, for example C:\Program Files (x86)\Microsoft Visual Studio XXX\VC\include. Replace XXX with your Visual Studio version.

GitHub - tronkko/dirent: C/C++ library for retrieving information on files and directories
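
As a rough sketch of that copy step from Git Bash: the helper name `install_header` is hypothetical, the location of the header inside the repository (`include/dirent.h`) is an assumption to verify after cloning, and XXX is still the placeholder for your Visual Studio version. Writing under Program Files usually needs a shell started as administrator.

```shell
# Copy a header file into a compiler include directory.
# Refuses to overwrite an existing file so a previous copy is never clobbered.
install_header() {
    header="$1"
    include_dir="$2"
    if [ ! -f "$header" ]; then
        echo "header not found: $header" >&2
        return 1
    fi
    if [ ! -d "$include_dir" ]; then
        echo "include folder not found: $include_dir" >&2
        return 1
    fi
    target="$include_dir/$(basename "$header")"
    if [ -e "$target" ]; then
        echo "already present: $target" >&2
        return 0
    fi
    cp "$header" "$target" && echo "copied to $target"
}

# Example (paths are assumptions; replace XXX with your Visual Studio version):
# git clone https://github.com/tronkko/dirent.git
# install_header dirent/include/dirent.h "C:/Program Files (x86)/Microsoft Visual Studio XXX/VC/include"
```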

Alexander Senderovitz, 4 years ago

I am getting the hadoop-common issue, but relinking it doesn't solve it.



Raymond, 4 years ago

The screenshot you provided doesn't include the detailed error message. Can you post it here?

BTW, could you please confirm whether you have followed the exact steps in my article? Building Hadoop on Windows 10 is not easy, and I have only verified that the steps I provided work when building Hadoop 3.2.1.

Alexander Senderovitz, 4 years ago

Hi Raymond,

Sure. I followed all the steps for 3.2.1 and got everything right up until the Maven build part. Then I get this error message, which I have been getting since the very beginning (I was building Hadoop from another guide at first: https://towardsdatascience.com/installing-hadoop-3-2-1-single-node-cluster-on-windows-10-ac258dd48aef).

So I am trying many different things now, including 3.3.0 through https://chocolatey.org/packages/hadoop and also your 3.3.0 guide.

Is this enough?

EDIT: I followed your 3.2.1 and 3.3.0 guides and also the very detailed compile and build 3.2.1 guide.

Raymond, 4 years ago

The screenshot shows only the INFO lines, not the error messages. The error messages appear before the summary of build results for each package, and they will contain text such as ERROR.
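
One way to find those lines is to save the build output to a file and filter it. This is a minimal sketch under stated assumptions: the function name `show_build_errors` and the file name `build.log` are hypothetical, and the three patterns (Maven's `[ERROR]` prefix, CMake's "CMake Error", and the compiler's "fatal error") are simply the markers that appear in the logs posted in these comments.

```shell
# Print the lines of a saved build log that look like errors, with line numbers.
# Patterns: Maven's [ERROR] prefix, "CMake Error", and the compiler's "fatal error".
show_build_errors() {
    grep -n -E '\[ERROR\]|CMake Error|fatal error' "$1"
}

# Example (build.log is a hypothetical file name):
# mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true > build.log 2>&1
# show_build_errors build.log
```

The line numbers make it easier to open the log at the first match and read the surrounding context, which is usually where the real cause is reported.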

Hitendra Behera, 4 years ago

Hi,

I am getting the following error message:

main:
     [exec] CMake Warning (dev) in CMakeLists.txt:
     [exec]   No project() command is present.  The top-level CMakeLists.txt file must
     [exec]   contain a literal, direct call to the project() command.  Add a line of
     [exec]   code such as
     [exec]
     [exec]     project(ProjectName)
     [exec]
     [exec]   near the top of the file, but after cmake_minimum_required().
     [exec]
     [exec]   CMake is pretending there is a "project(Project)" command on the first
     [exec]   line.
     [exec] This warning is for project developers.  Use -Wno-dev to suppress it.
     [exec]
     [exec] -- Selecting Windows SDK version 10.0.18362.0 to target Windows 10.0.18363.
     [exec] -- The C compiler identification is MSVC 19.28.29336.0
     [exec] -- The CXX compiler identification is MSVC 19.28.29336.0
     [exec] -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe
     [exec] -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe -- works
     [exec] -- Detecting C compiler ABI info
     [exec] -- Detecting C compiler ABI info - done
     [exec] -- Detecting C compile features
     [exec] -- Detecting C compile features - done
     [exec] -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe
     [exec] -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/bin/Hostx64/x64/cl.exe -- works
     [exec] -- Detecting CXX compiler ABI info
     [exec] -- Detecting CXX compiler ABI info - done
     [exec] -- Detecting CXX compile features
     [exec] -- Detecting CXX compile features - done
     [exec] CUSTOM_OPENSSL_PREFIX =
     [exec] Cannot find a usable OpenSSL library. OPENSSL_LIBRARY=OPENSSL_LIBRARY-NOTFOUND, OPENSSL_INCLUDE_DIR=C:/Bigdata/openssl-1.1.1i/include, CUSTOM_OPENSSL_LIB=, CUSTOM_OPENSSL_PREFIX=, CUSTOM_OPENSSL_INCLUDE=
     [exec] -- Performing Test THREAD_LOCAL_SUPPORTED
     [exec] -- Performing Test THREAD_LOCAL_SUPPORTED - Success
     [exec] -- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
     [exec] CMake Error at C:/Bigdata/CMake/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
     [exec]   Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
     [exec]   system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
     [exec]   OPENSSL_INCLUDE_DIR)
     [exec] Call Stack (most recent call first):
     [exec]   C:/Bigdata/CMake/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:393 (_FPHSA_FAILURE_MESSAGE)
     [exec]   C:/Bigdata/CMake/share/cmake-3.16/Modules/FindOpenSSL.cmake:449 (find_package_handle_standard_args)
     [exec]   main/native/libhdfspp/CMakeLists.txt:44 (find_package)
     [exec]
     [exec]
     [exec] -- Configuring incomplete, errors occurred!
     [exec] See also "C:/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/CMakeFiles/CMakeOutput.log".
     [exec] See also "C:/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/CMakeFiles/CMakeError.log".

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Hadoop Main 3.2.1:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [  1.916 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [  4.382 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  1.207 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  0.918 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.309 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  2.061 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  2.690 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  0.326 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [  1.309 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  1.440 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [ 38.920 s]
[INFO] Apache Hadoop NFS .................................. SUCCESS [  1.912 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [  1.998 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.132 s]
[INFO] Apache Hadoop HDFS Client .......................... SUCCESS [  6.354 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [ 13.608 s]
[INFO] Apache Hadoop HDFS Native Client ................... FAILURE [ 12.555 s]
[INFO] Apache Hadoop HttpFS ............................... SKIPPED
[INFO] Apache Hadoop HDFS-NFS ............................. SKIPPED
[INFO] Apache Hadoop HDFS-RBF ............................. SKIPPED
[INFO] Apache Hadoop HDFS Project ......................... SKIPPED
[INFO] Apache Hadoop YARN ................................. SKIPPED
[INFO] Apache Hadoop YARN API ............................. SKIPPED
[INFO] Apache Hadoop YARN Common .......................... SKIPPED
[INFO] Apache Hadoop YARN Registry ........................ SKIPPED
[INFO] Apache Hadoop YARN Server .......................... SKIPPED
[INFO] Apache Hadoop YARN Server Common ................... SKIPPED
[INFO] Apache Hadoop YARN NodeManager ..................... SKIPPED
[INFO] Apache Hadoop YARN Web Proxy ....................... SKIPPED
[INFO] Apache Hadoop YARN ApplicationHistoryService ....... SKIPPED
[INFO] Apache Hadoop YARN Timeline Service ................ SKIPPED
[INFO] Apache Hadoop YARN ResourceManager ................. SKIPPED
[INFO] Apache Hadoop YARN Server Tests .................... SKIPPED
[INFO] Apache Hadoop YARN Client .......................... SKIPPED
[INFO] Apache Hadoop YARN SharedCacheManager .............. SKIPPED
[INFO] Apache Hadoop YARN Timeline Plugin Storage ......... SKIPPED
[INFO] Apache Hadoop YARN TimelineService HBase Backend ... SKIPPED
[INFO] Apache Hadoop YARN TimelineService HBase Common .... SKIPPED
[INFO] Apache Hadoop YARN TimelineService HBase Client .... SKIPPED
[INFO] Apache Hadoop YARN TimelineService HBase Servers ... SKIPPED
[INFO] Apache Hadoop YARN TimelineService HBase Server 1.2  SKIPPED
[INFO] Apache Hadoop YARN TimelineService HBase tests ..... SKIPPED
[INFO] Apache Hadoop YARN Router .......................... SKIPPED
[INFO] Apache Hadoop YARN Applications .................... SKIPPED
[INFO] Apache Hadoop YARN DistributedShell ................ SKIPPED
[INFO] Apache Hadoop YARN Unmanaged Am Launcher ........... SKIPPED
[INFO] Apache Hadoop MapReduce Client ..................... SKIPPED
[INFO] Apache Hadoop MapReduce Core ....................... SKIPPED
[INFO] Apache Hadoop MapReduce Common ..................... SKIPPED
[INFO] Apache Hadoop MapReduce Shuffle .................... SKIPPED
[INFO] Apache Hadoop MapReduce App ........................ SKIPPED
[INFO] Apache Hadoop MapReduce HistoryServer .............. SKIPPED
[INFO] Apache Hadoop MapReduce JobClient .................. SKIPPED
[INFO] Apache Hadoop Mini-Cluster ......................... SKIPPED
[INFO] Apache Hadoop YARN Services ........................ SKIPPED
[INFO] Apache Hadoop YARN Services Core ................... SKIPPED
[INFO] Apache Hadoop YARN Services API .................... SKIPPED
[INFO] Apache Hadoop Image Generation Tool ................ SKIPPED
[INFO] Yet Another Learning Platform ...................... SKIPPED
[INFO] Apache Hadoop YARN Site ............................ SKIPPED
[INFO] Apache Hadoop YARN UI .............................. SKIPPED
[INFO] Apache Hadoop YARN Project ......................... SKIPPED
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ...... SKIPPED
[INFO] Apache Hadoop MapReduce NativeTask ................. SKIPPED
[INFO] Apache Hadoop MapReduce Uploader ................... SKIPPED
[INFO] Apache Hadoop MapReduce Examples ................... SKIPPED
[INFO] Apache Hadoop MapReduce ............................ SKIPPED
[INFO] Apache Hadoop MapReduce Streaming .................. SKIPPED
[INFO] Apache Hadoop Distributed Copy ..................... SKIPPED
[INFO] Apache Hadoop Archives ............................. SKIPPED
[INFO] Apache Hadoop Archive Logs ......................... SKIPPED
[INFO] Apache Hadoop Rumen ................................ SKIPPED
[INFO] Apache Hadoop Gridmix .............................. SKIPPED
[INFO] Apache Hadoop Data Join ............................ SKIPPED
[INFO] Apache Hadoop Extras ............................... SKIPPED
[INFO] Apache Hadoop Pipes ................................ SKIPPED
[INFO] Apache Hadoop OpenStack support .................... SKIPPED
[INFO] Apache Hadoop Amazon Web Services support .......... SKIPPED
[INFO] Apache Hadoop Kafka Library support ................ SKIPPED
[INFO] Apache Hadoop Azure support ........................ SKIPPED
[INFO] Apache Hadoop Aliyun OSS support ................... SKIPPED
[INFO] Apache Hadoop Client Aggregator .................... SKIPPED
[INFO] Apache Hadoop Scheduler Load Simulator ............. SKIPPED
[INFO] Apache Hadoop Resource Estimator Service ........... SKIPPED
[INFO] Apache Hadoop Azure Data Lake support .............. SKIPPED
[INFO] Apache Hadoop Tools Dist ........................... SKIPPED
[INFO] Apache Hadoop Tools ................................ SKIPPED
[INFO] Apache Hadoop Client API ........................... SKIPPED
[INFO] Apache Hadoop Client Runtime ....................... SKIPPED
[INFO] Apache Hadoop Client Packaging Invariants .......... SKIPPED
[INFO] Apache Hadoop Client Test Minicluster .............. SKIPPED
[INFO] Apache Hadoop Client Packaging Invariants for Test . SKIPPED
[INFO] Apache Hadoop Client Packaging Integration Tests ... SKIPPED
[INFO] Apache Hadoop Distribution ......................... SKIPPED
[INFO] Apache Hadoop Client Modules ....................... SKIPPED
[INFO] Apache Hadoop Cloud Storage ........................ SKIPPED
[INFO] Apache Hadoop Cloud Storage Project ................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:35 min
[INFO] Finished at: 2021-01-16T15:08:15+11:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-hdfs-native-client: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="C:\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target/native" executable="cmake">... @ 5:123 in C:\hadoop\hadoop-hdfs-project\hadoop-hdfs-native-client\target\antrun\build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hadoop-hdfs-native-client

C:\hadoop>


Raymond, 4 years ago

I've replied to you in the CMake post. You need to install Microsoft vcpkg (the Visual C++ package manager) and use it to install the openssl package on your Windows system, which meets the requirements of the Hadoop HDFS C/C++ native client project. The GitHub documentation provides very detailed steps on how to install it. Let me know if anything is still unclear and I can share a detailed guide about it.
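
As a rough outline of what that could look like from Git Bash (a sketch under assumptions, not a tested recipe): the install location `C:/vcpkg`, the `x64-windows` triplet, and both function names are hypothetical choices, and running the bootstrap script through `cmd //c` is one common way to start a .bat file from Git Bash. OPENSSL_ROOT_DIR is the variable named in the CMake error message above.

```shell
# Print the directory where vcpkg places headers and libraries for a triplet.
# $1 = vcpkg checkout directory, $2 = triplet (defaults to x64-windows).
vcpkg_install_prefix() {
    printf '%s/installed/%s\n' "$1" "${2:-x64-windows}"
}

# Not invoked automatically: clones vcpkg, installs OpenSSL, and exports
# OPENSSL_ROOT_DIR so that CMake's FindOpenSSL can locate the package.
setup_openssl_with_vcpkg() {
    vcpkg_dir="${1:-C:/vcpkg}"    # hypothetical install location
    git clone https://github.com/microsoft/vcpkg.git "$vcpkg_dir" || return 1
    (cd "$vcpkg_dir" && cmd //c bootstrap-vcpkg.bat) || return 1
    "$vcpkg_dir/vcpkg.exe" install openssl:x64-windows || return 1
    OPENSSL_ROOT_DIR="$(vcpkg_install_prefix "$vcpkg_dir")"
    export OPENSSL_ROOT_DIR
}
```

After calling `setup_openssl_with_vcpkg` in the same shell, the build can be resumed from the failed module as the Maven output suggests, for example with `mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true -rf :hadoop-hdfs-native-client`.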


Jin Wrap, 4 years ago

Hi Raymond - can you advise? I got to the last step and got stuck with the build, encountering this error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.2.0:shade (default) on project hadoop-client-minicluster: Error creating shaded jar: duplicate entry: META-INF/services/org.apache.hadoop.shaded.org.eclipse.jetty.http.HttpFieldPreEncoder -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hadoop-client-minicluster

Searched online and this site said to use "clean" so I ran the following code:

mvn clean package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true

And the build did come back as successful. However, I don't see Hadoop-3.2.1 at C:/hdp/hadoop/hadoop-dist/target.

Perhaps using clean is not the right way? Any thoughts on how to fix this issue?

Thanks!

Raymond, 4 years ago

Hi Jin,

I don't think Maven clean is the right approach to resolve a build problem.

Can you clean up the generated files and rerun your build with the additional -X option?

mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true -X

This way you can see all the detailed errors. Usually the detailed error message will guide you to resolve the problem.

-Raymond
Jin Wrap, 4 years ago

Thank you for your reply! I ran mvn clean (if that's what you meant by cleaning up the generated files) and then the code you gave (with the -X switch) and this time the build was successful (so, no detailed error log popped up that could help manage this issue) but it seems the build still isn't there in hadoop-dist/target.

I followed your steps to the T. The only other thing that came up was that, during the initial build, I had a similar error ([ERROR]: Failed to execute goal org.codehaus.mojo:exec-maven-plugin) with hadoop-dist, which I addressed as described in this Stack Overflow answer, i.e., I pasted the following code into the hadoop-dist POM.

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.3.1</version>
</plugin>

Could this be why it's not working still? Thank you again for your help! This whole process is all very new to me...







Raymond, 4 years ago

I tested all the above steps in a brand new Windows 10 virtual machine without any Maven packages installed, so I don’t think the step you mentioned is necessary.

Can you please confirm the following?

  • Are you building the same Hadoop version as mentioned?
  • Did you follow the exact steps in this article? In the past, a few users didn’t follow all the instructions or were building a different Hadoop version.
  • Can you see that all the projects were built successfully, as the following screenshot shows? And does each project folder also contain successfully built artifacts?
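
For the last point, a saved build log can be checked without scrolling through the whole reactor summary. The sketch below is only an illustration: the function name `unfinished_modules` and the file name `build.log` are hypothetical, and the pattern relies on the summary layout shown in the log posted earlier in these comments (module name, dots or spaces, then the status).

```shell
# List reactor-summary lines from a saved Maven log whose status is not SUCCESS.
# The "[. ] " before the status skips the final "[INFO] BUILD FAILURE" banner.
# Prints nothing when every module in the summary built successfully.
unfinished_modules() {
    grep -E '^\[INFO\] .*[. ] (FAILURE|SKIPPED)' "$1"
}

# Example (build.log is a hypothetical file name):
# mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true > build.log 2>&1
# unfinished_modules build.log
# ls hadoop-dist/target
```

If the function prints nothing and the listing of hadoop-dist/target still lacks the distribution, the problem is more likely in the packaging step than in an individual module.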

Raymond, 5 years ago

To build Hadoop 3.3.0, follow the build instructions:

https://github.com/apache/hadoop/blob/rel/release-3.3.0/BUILDING.txt

Most of these steps are similar to this post. 

