Apache Hive 3.1.1 Installation on Windows 10 using Windows Subsystem for Linux

By Raymond, 2019-05-18

Previously, I demonstrated how to configure Apache Hive 3.0.0 on Windows 10.

On this page, I’m going to show you how to install the latest version, Apache Hive 3.1.1, on Windows 10 using the Windows Subsystem for Linux (WSL) Ubuntu distro.

Info: You can also follow these instructions to install Apache Hive 3.1.2 on WSL or any Unix-like system, including Debian, Ubuntu, openSUSE, Red Hat, macOS, etc.
Warning: Apache Hive is impacted by the Log4j vulnerabilities; refer to the page Apache Log4j Security Vulnerabilities for the fixes.

Prerequisites

Follow either of the following pages to install WSL in a system or non-system drive on your Windows 10.

Please also install Hadoop 3.2.0 on your WSL following the second page.

Now let’s start to install Apache Hive 3.1.1 in WSL.

Download binary package

Select a package from the download page:

https://hive.apache.org/downloads.html

For me, the recommended location is: http://www.strategylions.com.au/mirror/hive/hive-3.1.1/apache-hive-3.1.1-bin.tar.gz

In WSL bash terminal, run the following command to download the package:

wget http://www.strategylions.com.au/mirror/hive/hive-3.1.1/apache-hive-3.1.1-bin.tar.gz
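Before unpacking, it may be worth verifying the download. The following is a minimal sketch, assuming sha256sum is available in your WSL distro and that you have copied the expected digest from the checksum file published alongside the release on the Apache download site. verify_sha256 is a hypothetical helper name, not part of Hive.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when the SHA-256 digest of FILE equals EXPECTED_HEX, 1 otherwise.
verify_sha256() {
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha256sum "$1" | awk '{print $1}')
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)" >&2
        return 1
    fi
}
```

You would call it as verify_sha256 apache-hive-3.1.1-bin.tar.gz followed by the published digest, and only continue if it reports OK.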

Unzip binary package

If you have configured Hadoop 3.2.0 successfully, there should already be a hadoop folder in your home folder:

$ ls -lt
total 611896
drwxrwxrwx 1 tangr tangr      4096 May 16 00:32 dfs
drwxrwxrwx 1 tangr tangr      4096 May 15 23:48 hadoop
-rw-rw-rw- 1 tangr tangr  345625475 Jan 22 02:15 hadoop-3.2.0.tar.gz
-rw-rw-rw- 1 tangr tangr 280944629 Nov  1  2018 apache-hive-3.1.1-bin.tar.gz

Now unpack the Hive package using the following command:

tar -xvzf apache-hive-3.1.1-bin.tar.gz -C ~/hadoop

In the hadoop folder there are now two subfolders:

$ ls ~/hadoop
apache-hive-3.1.1-bin  hadoop-3.2.0

Setup environment variables

In the prerequisites section, we’ve already configured some environment variables like the following:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/home/tangr/hadoop/hadoop-3.2.0
export PATH=$PATH:$HADOOP_HOME/bin

*Note: your user name can be different.

Let’s open the .bashrc file to add the environment variables that Hive requires:

vi ~/.bashrc

Add the following lines to the end of the file:

export HIVE_HOME=/home/tangr/hadoop/apache-hive-3.1.1-bin
export PATH=$HIVE_HOME/bin:$PATH

Change the user name (tangr in this example) to your own.

Run the following command to source the variables:

source ~/.bashrc

Verify the environment variables:

echo $HIVE_HOME
/home/tangr/hadoop/apache-hive-3.1.1-bin
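Printing the variable only shows that it is set, not that it points at a usable installation. As a small sketch under that assumption, the hypothetical helper below (check_hive_home is not a Hive command) also checks that the bin/hive launcher exists under the directory:

```shell
# check_hive_home DIR
# Succeeds when DIR looks like an unpacked Hive binary distribution.
check_hive_home() {
    if [ -z "$1" ]; then
        echo "HIVE_HOME is empty" >&2
        return 1
    fi
    # The launcher script used later in this guide lives at bin/hive.
    if [ ! -x "$1/bin/hive" ]; then
        echo "no executable bin/hive under $1" >&2
        return 1
    fi
    echo "HIVE_HOME looks valid: $1"
}
```

Run it as check_hive_home "$HIVE_HOME" after sourcing .bashrc.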

Setup Hive HDFS folders

Start your Hadoop services (if you have not done that) by running the following command:

$HADOOP_HOME/sbin/start-all.sh

In WSL, you may need to restart your ssh service if ssh doesn’t work and you see an error like this:

localhost: ssh: connect to host localhost port 22: Connection refused

To restart the service, run the following command:

sudo service ssh restart

Run the following command (jps) to make sure all the services are running successfully.

$ jps
2306 NameNode
2786 SecondaryNameNode
3235 NodeManager
3577 Jps
2491 DataNode
3039 ResourceManager

As you can see, all the services are running successfully in my WSL.
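If you prefer not to compare the list by eye, the check can be scripted. This is a sketch only: missing_daemons is a hypothetical helper, and the list of expected daemons is taken from the jps output shown above for a single-node setup.

```shell
# missing_daemons: reads `jps` output on stdin and prints each expected
# Hadoop daemon that is absent. Returns 1 if any daemon is missing.
missing_daemons() {
    jps_output=$(cat)
    rc=0
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second field, so "NameNode" does not
        # match the "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d {found=1} END {exit !found}'; then
            echo "$daemon"
            rc=1
        fi
    done
    return $rc
}
```

Typical use is jps | missing_daemons; no output and a zero exit status mean all five daemons were found.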

Now let’s set up the HDFS folders for Hive.

Run the following commands:

hadoop fs -mkdir /tmp 
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
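The plain mkdir for /tmp reports an error if the folder already exists, which is common when re-running the guide. A re-runnable variant could look like the sketch below; ensure_hdfs_dir is a hypothetical helper name, and it relies only on the hadoop fs -test, -mkdir and -chmod subcommands.

```shell
# ensure_hdfs_dir PATH
# Creates PATH in HDFS if it is missing, then makes it group-writable.
ensure_hdfs_dir() {
    # `hadoop fs -test -d` exits 0 when the directory already exists.
    if ! hadoop fs -test -d "$1"; then
        hadoop fs -mkdir -p "$1" || return 1
    fi
    hadoop fs -chmod g+w "$1"
}
```

With Hadoop running, ensure_hdfs_dir /tmp followed by ensure_hdfs_dir /user/hive/warehouse has the same effect as the four commands above.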

Configure Hive metastore

Now we need to run schematool to set up the metastore for Hive. The command syntax looks like the following:

$HIVE_HOME/bin/schematool -dbType <db type> -initSchema

The dbType argument can be any of the following values:

derby|mysql|postgres|oracle|mssql

By default, Apache Derby is used. However, in this embedded mode it accepts only one connection at a time.

So now you have two options:

  • Option 1 (highly recommended): Initialize using a remote database. In my scenario, I use a SQL Server database as the remote store. For more details, follow this page to set up a remote database as the datastore: Configure a SQL Server Database as Remote Hive Metastore.
  • Option 2: Initialize using Derby by running the following command:
$HIVE_HOME/bin/schematool -dbType derby -initSchema
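A reader reported in the comments that, with Hive 3.1.2, the Derby initialization only succeeded after copying hive-default.xml.template to hive-site.xml and deleting a stray &#8; character reference from a description in that file. The sketch below covers the second step. strip_bad_entity is a hypothetical helper name, and whether your 3.1.1 package contains the same sequence is something to check rather than assume.

```shell
# strip_bad_entity FILE
# Removes every literal "&#8;" sequence from FILE, keeping a .bak copy.
strip_bad_entity() {
    cp "$1" "$1.bak" || return 1
    # Write to a temporary file first so the original is never half-written.
    sed 's/&#8;//g' "$1.bak" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

After cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml, you would run strip_bad_entity "$HIVE_HOME/conf/hive-site.xml" and then retry the schematool command.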

Configure Hive API authentication

Add the following property inside the <configuration> element of the $HIVE_HOME/conf/hive-site.xml file (create the file if it does not exist):

<property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
    <description>
        Should metastore do authorization against database notification related APIs such as get_next_notification.
        If set to true, then only the superusers in proxy settings have the permission
    </description>
</property>
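Several readers could not find hive-site.xml in the package. If you have not created it from the template, one option is to write a minimal file whose root element is configuration and which holds only the property above. This is a sketch under that assumption; write_minimal_hive_site is a hypothetical helper, and it deliberately refuses to overwrite an existing file.

```shell
# write_minimal_hive_site CONF_DIR
# Creates CONF_DIR/hive-site.xml containing only the notification-auth
# property. Fails if the file already exists.
write_minimal_hive_site() {
    target="$1/hive-site.xml"
    if [ -e "$target" ]; then
        echo "$target already exists; edit it by hand instead" >&2
        return 1
    fi
    cat > "$target" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
</configuration>
EOF
}
```

Call it as write_minimal_hive_site "$HIVE_HOME/conf"; all other settings then fall back to Hive's defaults.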

And then update Hadoop core-site.xml configuration file to add the following configurations:

<property>
    <name>hadoop.proxyuser.tangr.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.tangr.groups</name>
    <value>*</value>
</property>

Replace the user name (tangr in this example) with your own user name.

Now all the configurations are done.

Start HiveServer2 service

Run the commands below to start the metastore service and then the HiveServer2 service:

$HIVE_HOME/bin/hive --service metastore &
$HIVE_HOME/bin/hive --service hiveserver2 &

Wait until you can open the HiveServer2 Web UI: http://localhost:10002/.
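HiveServer2 can take a while to start listening, so instead of refreshing the browser you can poll the port. The following is a minimal sketch, assuming bash is installed (it uses bash's /dev/tcp pseudo-device); wait_for_port is a hypothetical helper name.

```shell
# wait_for_port HOST PORT TIMEOUT_SECONDS
# Polls once per second until a TCP connection to HOST:PORT succeeds.
wait_for_port() {
    elapsed=0
    while [ "$elapsed" -lt "$3" ]; do
        # bash's /dev/tcp pseudo-device attempts a TCP connection.
        if bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
            echo "$1:$2 is accepting connections"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out waiting for $1:$2" >&2
    return 1
}
```

For example, wait_for_port localhost 10002 120 waits up to two minutes for the Web UI. The same helper can be pointed at port 10000 before connecting with $HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000. If it times out, check the Hive log files for the underlying error.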

Practices

You can follow section ‘DDL practices’ in my previous post to test your Hive data warehouse.

Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

I’ll continue to publish more posts about installing the latest Hadoop ecosystem tools and frameworks in WSL. You can follow this website by subscribing to the RSS feed.

Comments
Guy A, 4 years ago:

some missing steps:


just before running the initSchema, need to do 3 things:

1. need to copy the conf file:

cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml


2. need to remove a typo from line 3215 of the file: &#8;

nano or vim /home/guy/hadoop/apache-hive-3.1.2-bin/conf/hive-site.xml and remove the characters in the middle of the line ( Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks for <<&#8;>> transactional tables.  This ensures that inserts (w/o overwrite) running concurrently)

3. also need to remove  guava-19.0.jar from   apache-hive-3.1.2/lib and copy the current one from the folder hadoop-3.3.0/share/hadoop/common/lib/


after that you can run the init schema for the derbyDB

Raymond, 4 years ago:

Thanks for summarizing this, Guy.

For the Hive 3.1.2 installation guide on this site, I've already incorporated it.

Cheers, Raymond

Guy A, 4 years ago:

thank you :)


the walk through helped me a lot.

now waiting for impala guide !


Guy A, 4 years ago:

hello

i followed your hadoop install in wsl guide and it worked fine. but this one for hive is not. i am not able to connect to local web gui and not with python . port 10000 is not listening.  i also cannot find any help about it.


thank you

Raymond, 4 years ago:

That suggests your Hive metastore and HiveServer2 services did not start successfully. Can you please check the log files to find the actual errors?

The logs are located here:

/tmp/<userid>/hive.log

/tmp/<userid>/hive.log.**

Guy A, 4 years ago:

thank you for reply !


while starting this command:

$HIVE_HOME/bin/hive --service metastore


i got this error in the log:

2021-01-29T18:10:30,565 ERROR [main] metastore.HiveMetaStore: Metastore Thrift Server threw an exception...
org.apache.hadoop.hive.metastore.api.MetaException: Version information not found in metastore.

and this:

Caused by: java.net.ConnectException: Call From guyHP/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

this command:

$HIVE_HOME/bin/hive --service hiveserver2

seems to run ok.


guya@guyHP:~$ $HIVE_HOME/bin/hive --service hiveserver2
2021-01-29 18:12:26: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/guya/hadoop/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/guya/hadoop/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 91577ee2-3d67-4369-a63d-0165f6221a62
Hive Session ID = 69299ff5-af98-44e5-a473-d102a1b28d93


Raymond, 4 years ago:

Hi Guy,

Most likely this is because your Hive metastore database schema is not correct; for example, the init schema operation was not completed properly.

Can you conduct the following two actions?

  1. Double-check that your Hive metastore configuration is correct, for example the database name, type, username and password.
  2. Check whether your metastore database is initialized successfully. For example, the following screenshot shows all the tables created by the init step in this guide. 


These tables are used by Hive to store metadata of your objects in Hive database. 

If you cannot start metastore service successfully, HiveServer2 thrift service will not be able to function properly.

Guy A, 4 years ago:

one more thing, it appears that there are two bugs in the apache-hive-3.1.2

1. there is a typo in line 3215 of file hive-site.xml ( needed to be deleted )

2. the guava jar should be the same version as the hadoop one.

after fixing those 2 things.  ( and copying the default xml file to hive-site.xml ) i was able to run the :

$HIVE_HOME/bin/schematool -dbType derby -initSchema

script successfully

so now we are back to why i cannot log into the web page:

http://localhost:10002/

as two services are running

$HIVE_HOME/bin/hive --service metastore &
$HIVE_HOME/bin/hive --service hiveserver2 &


and no error in the log files.


thank you

Raymond, 4 years ago:

Hi Guy,

Can you confirm whether you can connect to HiveServer2 or not?

It takes a little time for the web UI service to start.

Can you run the following command to see whether two RunJar processes are running (one for the metastore and another for HiveServer2)?

jps -mlv

-Raymond 

Guy A, 4 years ago:

hello

yes, after fixing some stuff .. mentioned in my previous message, i am able now to connect.

the problems ( 5 of them ) were all caused by missing information.


thank you


can i also ask if you can add a tutorial of how to implement impala on windows WSL ? 

:)

Raymond, 4 years ago:

I'm glad it worked. I will try to publish one installation guide for Impala when I get time. If you are interested, you can also try and publish your steps on Kontext too. 

Cheers,

Raymond

Guy A, 4 years ago:

yes please ! to both.

i will add the steps and fixes i did.

and would very much be happy with impala walk through.

Thank you

Guy

Raymond, 4 years ago:

I have not got time to create the Impala guide yet but I've created one for HBase just in case you are interested.

Install HBase in WSL - Pseudo-Distributed Mode


Guy A, 4 years ago:

hello


thank you sir for answer.

i found some stuff ... while doing the step-by-step i went along with derby DB .. but since it didn't work, i decided to try mssql DB .. following this link:

https://kontext.tech/column/hadoop/302/configure-a-sql-server-database-as-remote-hive-metastore


found out there are steps in this page that are not in the original page.

namely this command:

cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml

there is though a code you said to add the hive-site.xml file .. but maybe there are lines missing there ? ( this is a huge file and millions of parameters in it )


could you recheck please if there is not omitted part in the original page :

https://kontext.tech/column/hadoop/309/apache-hive-311-installation-on-windows-10-using-windows-subsystem-for-linux


thank you again for help !


Guy

Jesdin Raphael, 5 years ago:

I am unable to initialize schema

$HIVE_HOME/conf/hive-site.xml

<configuration>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
        <description>
            Should metastore do authorization against database notification related APIs such as get_next_notification.
            If set to true, then only the superusers in proxy settings have the permission
        </description>
    </property>
</configuration>


 $HADOOP_HOME/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dataflair.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dataflair.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.jesdin.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.jesdin.groups</name>
        <value>*</value>
    </property>
</configuration>

Raymond, 5 years ago:

Hi,

Sorry for the late reply.

From the screenshot, I can see you are installing Hive 3.1.2 with Hadoop 3.3.0.

This guide was only tested with Hive 3.1.1.

Can you ensure you use the 3.1.1 binary package to install?

The error you encountered seems to be related to different versions of JAR packages in your Hadoop and Hive library folder. 

Yathish K, 5 years ago:

How do we run schematool in windows.

Also failing to run with cygwin available


Raymond, 5 years ago:

This article is for Hive 3.1.1 installation on Windows 10 using WSL. All commands need to be run in a WSL bash window (not Command Prompt).

Based on your screenshot, you are trying to install it on Windows 10 directly. If that's the case, please follow this article:

Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

It has been tested by quite a few users with successful installation. 

Arun, 6 years ago:

Hi Mate, we couldn't find file "$HIVE_HOME/conf/hive-site.xml" in 3.1.1 package. Alternatively tried other versions couldn't find same file there too.

Please let me know how do I get/fix it.

Many Thanks 

Raymond, 6 years ago:

If it doesn’t exist, you can create one from the template file in the same directory: hive-default.xml.template.

If the template file doesn’t exist either, you can create this file directly. The root element for this XML file is configuration:

<configuration>

...

</configuration>
