Previously, I demonstrated how to configure Apache Hive 3.0.0 on Windows 10.
On this page, I’m going to show you how to install the latest version, Apache Hive 3.1.1, on Windows 10 using the Ubuntu distro of Windows Subsystem for Linux (WSL).
Follow either of the following pages to install WSL on a system or non-system drive of your Windows 10 machine.
Please also install Hadoop 3.2.0 in your WSL by following the second page.
Now let’s start to install Apache Hive 3.1.1 in WSL.
Select a package from the download page:
For me, the recommended location is: http://www.strategylions.com.au/mirror/hive/hive-3.1.1/apache-hive-3.1.1-bin.tar.gz
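In the WSL bash terminal, run the following command from your home folder to download the package:
wget http://www.strategylions.com.au/mirror/hive/hive-3.1.1/apache-hive-3.1.1-bin.tar.gz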
If you have configured Hadoop 3.2.0 successfully, there should already be a hadoop folder in your home folder:
$ ls -lt
drwxrwxrwx 1 tangr tangr 4096 May 16 00:32 dfs
drwxrwxrwx 1 tangr tangr 4096 May 15 23:48 hadoop
-rw-rw-rw- 1 tangr tangr 345625475 Jan 22 02:15 hadoop-3.2.0.tar.gz
-rw-rw-rw- 1 tangr tangr 280944629 Nov 1 2018 apache-hive-3.1.1-bin.tar.gz
Now extract the Hive package into the hadoop folder using the following command:
tar -xvzf apache-hive-3.1.1-bin.tar.gz -C ~/hadoop
In the hadoop folder there are now two subfolders:
$ ls ~/hadoop
apache-hive-3.1.1-bin  hadoop-3.2.0
In the prerequisite pages, we’ve already configured some environment variables (such as JAVA_HOME and HADOOP_HOME) in the .bashrc file.
*Note: the user name in your paths can be different.
Now open the .bashrc file in your editor (for example, vi ~/.bashrc) to add the environment variables Hive requires:
Add the following lines to the end of the file:
Change the highlighted user name to your own.
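If you’d rather not edit .bashrc by hand, here is a minimal shell sketch that appends the Hive variables only once. It assumes Hive was extracted to ~/hadoop/apache-hive-3.1.1-bin (as in the tar command above) and that HIVE_HOME plus a PATH entry are the variables Hive needs; using $HOME means you don’t have to change any user name.

```shell
#!/usr/bin/env bash
# Sketch: append Hive environment variables to ~/.bashrc (only once).
# Assumption: Hive lives in ~/hadoop/apache-hive-3.1.1-bin.
BASHRC="$HOME/.bashrc"

if ! grep -q 'export HIVE_HOME=' "$BASHRC" 2>/dev/null; then
  # The quoted 'EOF' stops the shell from expanding $HOME, $HIVE_HOME and
  # $PATH now; they are expanded each time .bashrc is sourced instead.
  cat >> "$BASHRC" <<'EOF'

# Apache Hive
export HIVE_HOME=$HOME/hadoop/apache-hive-3.1.1-bin
export PATH=$HIVE_HOME/bin:$PATH
EOF
  echo "Hive variables added to $BASHRC"
else
  echo "HIVE_HOME is already set in $BASHRC; nothing changed"
fi
```

Running it twice is harmless: the grep guard skips the append when HIVE_HOME is already exported.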
Run the following command to source the variables:
source ~/.bashrc
Verify the environment variables:
echo $HIVE_HOME
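As a sketch of a scripted check, the hypothetical function check_hive_env below reports whether HADOOP_HOME and HIVE_HOME are set and point at folders that contain a bin/ subfolder. It assumes HADOOP_HOME was exported by the Hadoop prerequisite page; the function name is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that HADOOP_HOME and HIVE_HOME are set and
# that each points at a directory containing a bin/ subfolder.
# Returns 0 if both look fine, 1 otherwise.
check_hive_env() {
  local status=0 var dir
  for var in HADOOP_HOME HIVE_HOME; do
    dir="${!var}"   # indirect expansion: value of the variable named by $var
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir/bin" ]; then
      echo "$var=$dir has no bin/ folder" >&2
      status=1
    else
      echo "$var=$dir"
    fi
  done
  return "$status"
}

# Usage, in a terminal where ~/.bashrc has been sourced:
# check_hive_env
```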
Start your Hadoop services (if you haven’t already) by running the start scripts in $HADOOP_HOME/sbin:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
In WSL, you may need to restart your ssh service if ssh doesn’t work, for example if you see the following error:
localhost: ssh: connect to host localhost port 22: Connection refused
To restart the service, run the following command:
sudo service ssh restart
Run jps to make sure all the services are running:
jps
In my WSL, all the services were running successfully.
Now let’s set up the HDFS folders for Hive by running the following commands:
hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
Now we need to run schematool to set up the metastore for Hive:
$HIVE_HOME/bin/schematool -dbType <db type> -initSchema
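The dbType argument can be one of the following values: derby, mysql, postgres, oracle or mssql.
By default, Apache Derby is used. However, embedded Derby only supports one connection at a time.
So you have two options: initialize the default Derby metastore, or use one of the other database types if you need concurrent connections. To initialize Derby, run: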
$HIVE_HOME/bin/schematool -dbType derby -initSchema
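Add the following section inside the <configuration> element of the $HIVE_HOME/conf/hive-site.xml file:
<property>
  <name>hive.metastore.event.db.notification.api.auth</name>
  <value>false</value>
  <description>
    Should metastore do authorization against database notification related APIs such as get_next_notification.
    If set to true, then only the superusers in proxy settings have the permission
  </description>
</property>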
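The description text above is Hive’s own wording for the hive.metastore.event.db.notification.api.auth property. A fresh Hive 3.1.1 extract ships templates in its conf folder rather than a hive-site.xml, so the file may not exist yet. The sketch below creates a minimal one with that property set to false (an assumption: false is the value commonly used to avoid notification-API authorization errors when HiveServer2 starts); if the file already exists, it leaves it untouched so you can merge the property by hand.

```shell
#!/usr/bin/env bash
# Sketch: create a minimal hive-site.xml with the notification API
# authorization property set to false, unless the file already exists.
# Falls back to ~/hadoop/apache-hive-3.1.1-bin if HIVE_HOME is unset.
HIVE_CONF="${HIVE_HOME:-$HOME/hadoop/apache-hive-3.1.1-bin}/conf"
SITE="$HIVE_CONF/hive-site.xml"
mkdir -p "$HIVE_CONF"

if [ -e "$SITE" ]; then
  echo "$SITE already exists; add the <property> element to it by hand." >&2
else
  cat > "$SITE" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
</configuration>
EOF
  echo "Created $SITE"
fi
```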
Then update Hadoop’s core-site.xml configuration file ($HADOOP_HOME/etc/hadoop/core-site.xml) to add the following configurations:
Replace the highlighted user name with your own user name.
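Assuming the configurations are the usual HiveServer2 proxy-user entries (hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups), the sketch below prints them with your WSL user name already filled in, ready to paste inside the <configuration> element of core-site.xml. It only prints text; it does not modify the file.

```shell
#!/usr/bin/env bash
# Sketch: print Hadoop proxy-user properties for the current user.
# Assumption: these are the core-site.xml settings HiveServer2 needs here.
user="${USER:-$(id -un)}"

# Unquoted EOF so ${user} is substituted; '*' is not globbed in a heredoc.
snippet=$(cat <<EOF
  <property>
    <name>hadoop.proxyuser.${user}.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.${user}.groups</name>
    <value>*</value>
  </property>
EOF
)

printf '%s\n' "$snippet"
```

Restart the Hadoop services after editing core-site.xml so the new settings are loaded.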
Now all the configurations are done.
Run the commands below to start the Hive metastore and HiveServer2 services in the background:
$HIVE_HOME/bin/hive --service metastore &
$HIVE_HOME/bin/hive --service hiveserver2 &
Wait until you can open the HiveServer2 Web UI at http://localhost:10002/.
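Because HiveServer2 can take a while to come up, a small polling loop saves refreshing the browser. This is a sketch: wait_for_url is a hypothetical helper, and it assumes curl is installed in your WSL distro. Any HTTP response counts as "up", since curl is called without --fail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL every 5 seconds until it answers or the
# timeout (in seconds, default 120) expires.
# Returns 0 once the URL responds, 1 on timeout.
wait_for_url() {
  local url="$1" timeout="${2:-120}" waited=0
  until curl --silent --output /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Gave up waiting for $url after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "$url is up"
}

# Usage, after starting the metastore and HiveServer2:
# wait_for_url http://localhost:10002/ 300
```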
You can follow the ‘DDL practices’ section in my previous post to test your Hive data warehouse.
I’ll continue to publish posts about installing the latest Hadoop ecosystem tools and frameworks in WSL. You can follow this website by subscribing to the RSS feed.