Apache Hive 3.1.2 Installation on Linux Guide

On this page I'm going to show you how to install the latest version Apache Hive 3.1.2 on Windows 10 using Windows Subsystem for Linux (WSL) Ubuntu distro. You can follow these instructions to install Apache Hive 3.1.2 on any UNIX-alike systems incl. Debian, Ubuntu, openSUSE, Red Hat, etc. We will also include step of configuring MySQL as remote metastore for Hive to support concurrent requests.

warning Alert - Apache Hive is impacted by Log4j vulnerabilities; refer to page Apache Log4j Security Vulnerabilities to find out the fixes.

Prerequisites

Install WSL in a system or non-system drive on your Windows 10 and then install Hadoop 3.3.0 on it:

Now let’s start to install Apache Hive 3.1.2 on WSL.

Download binary package

Select a package from the download page:

https://hive.apache.org/downloads.html

For me, the recommended location is: https://www.strategylions.com.au/mirror/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz.

In WSL bash terminal, run the following command to download the package:

wget https://www.strategylions.com.au/mirror/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

Unzip binary package

If you have configured Hadoop 3.3.0 successfully by following Kontext guide (in prerequisites section), there should be one folder named hadoopexisting in your home folder already:

$ ls -l
total 3716664
-rw-r--r-- 1 tangr tangr 278813748 Jul  3 14:35 apache-hive-3.1.2-bin.tar.gz
drwxrwxrwx 1 tangr tangr      4096 May 16  2019 dfs
drwxrwxrwx 1 tangr tangr      4096 Dec 27 23:02 hadoop

Now unzip Hive package using the following command:

tar -xvzf apache-hive-3.1.2-bin.tar.gz -C ~/hadoop

In the hadoopfolder there are now two subfolders at least (one for Hadoop and another for Hive):

$ ls ~/hadoop
apache-hive-3.1.2-bin  hadoop-3.3.0

Setup environment variables

In the prerequisites sections, we’ve already configured some environment variables like the following:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/home/tangr/hadoop/hadoop-3.3.0
export PATH=$PATH:$HADOOP_HOME/bin

*Note: your user name can be different.

Let’s run the following command to add Hive required environment variables into .bashrc file too:

vi ~/.bashrc

Add the following lines to the end of the file:

export HIVE_HOME=/home/tangr/hadoop/apache-hive-3.1.2-binexport PATH=$HIVE_HOME/bin:$PATH

* Change the highlighted user name to your own one.

Run the following command to source the variables:

source ~/.bashrc

Verify the environment variables:

echo $HIVE_HOME
/home/tangr/hadoop/apache-hive-3.1.2-bin

Setup Hive HDFS folders

Start your Hadoop services (if you have not done that) by running the following command:

$HADOOP_HOME/sbin/start-all.sh

In WSL, you may need to restart you ssh services if ssh doesn’t work:

localhost: ssh: connect to host localhost port 22: Connection refused

To restart the services, run the following command:

sudo service ssh restart

Run the following command (jps) to make sure all the services are running successfully.

$ jps
2306 NameNode
2786 SecondaryNameNode
3235 NodeManager
3577 Jps
2491 DataNode
3039 ResourceManager

As you can see, all the services are running successfully in my WSL.

Now let’s setup the HDFS folders for Hive.

Run the following commands:

hadoop fs -mkdir /tmp hadoop fs -mkdir -p /user/hive/warehouse hadoop fs -chmod g+w /tmp hadoop fs -chmod g+w /user/hive/warehouse

Or alternatively, directly run the following command:

$HIVE_HOME/bin/init-hive-dfs.sh

guava lib issue

To avoid issues like the one described on page: Hive: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V, let's ensure guava library version is consistent between Hive and Hadoop.

Hadoop 3.3.0 -

$ ls $HADOOP_HOME/share/hadoop/common/lib | grep guava
guava-27.0-jre.jar

Hive 3.1.2 -

$ ls $HIVE_HOME/lib | grep guava
guava-19.0.jar
jersey-guava-2.25.1.jar

Copy newer version from Hadoop to Hive:

rm $HIVE_HOME/lib/guava-19.0.jar
cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/

Install MySQL server

Install MySQL server

Microsoft has published very detailed tutorials about how to install MySQL on WSL: Get started using MySQL MongoDB, PostgreSQL, SQLite, Microsoft SQL Server, or Redis to set up a database on Windows Subsystem for Linux | Microsoft Docs. Please follow the steps to configure a MySQL instance on WSL or your UNIX-alike systems.

If you encounter errors like this: Cannot open /proc/net/unix: No such file or directory, please close your WSL and then start service using the following command:

sudo /etc/init.d/mysql start
 * Starting MySQL database server mysqld                                                                                No directory, logging in with HOME=/
                                                                                                                 [ OK ]

*Try also to restart your Windows system too if you are installing on a WSL Linux sub-system.

Make sure you can enter mysql CLI (sudo mysql):

$ sudo mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.32-0ubuntu0.18.04.1 (Ubuntu)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Create Hive metastore database and user

Run the following commands in mysql CLI:

sudo mysql
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive' password expire never;
mysql> GRANT ALL ON *.* TO 'hive'@'localhost';
mysql> exit;
mysql -u hive -phive
mysql> create database hive_metastore;
Query OK, 1 row affected (0.00 sec)
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive_metastore     |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

After these steps, the following database objects are created:

user: hive
password: hive
database: hive_metastore

These details are required in the following steps to create Hive metastore.

warning MySQL database server needs to be running before every time you can start your Hive metastore and HiveServer2 services.

Configure Hive metastore

Follow these steps to configure a remote Hive metastore. These steps are similar as the SQL Server configuration guide: Configure a SQL Server Database as Remote Hive Metastore.

Download MySQL JDBC driver

First let's find out MySQL version to download the compatible JDBC driver:

mysql -V
mysql  Ver 14.14 Distrib 5.7.32, for Linux (x86_64) using  EditLine wrapper

Download MySQL JDBC driver from official website MySQL :: Download Connector/J and choose the right version for your Linux/WSL system.

Alternatively, we can directly download it from Maven central.

For my WSL, I am using the following version (8.0.22):

wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.22/mysql-connector-java-8.0.22.jar

Move the downloaded JAR file to Hive lib folder:

mv mysql-connector-java-8.0.22.jar $HIVE_HOME/lib/

Configure Hive file hive-site.xml

Edit hive-default.xml file (or create one if not existing) in folder $HIVE_HOME/conf.

* You can create the configuration file by using template file hive-default.xml.template.

cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml

The above command needs to run in Cygwin terminal.

We need configure the following properties accordingly:

javax.jdo.option.ConnectionDriverName: com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver depends on the version of your MySQL JDBC driver.
javax.jdo.option.ConnectionURL: jdbc:mysql://localhost/hive_metastore
javax.jdo.option.ConnectionUserName: hive
javax.jdo.option.ConnectionPassword: hive
hive.metastore.uris: thrift://127.0.0.1:9083
hive.metastore.db.type: mysql

infoFor MySQL JDBC driver version 8.0.22, the driver class name is com.mysql.cj.jdbc.Driver.

The highlighted values are from our configurations in MySQL Server installation step.

For my environment, these items are configured as the following:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
    <description>
      Should metastore do authorization against database notification related APIs such as get_next_notification.
      If set to true, then only the superusers in proxy settings have the permission
    </description>
  </property>
  <property>
    <name>hive.metastore.db.type</name>
    <value>mysql</value>
    <description>
      Expects one of [derby, oracle, mysql, mssql, postgres].
      Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
    </description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://127.0.0.1:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.ds.connection.url.hook</name>
    <value/>
    <description>Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/hive_metastore</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
</configuration>

warning lf your MySQL is not running on the default port, please update the JDBC connection string accordingly. For example, 'jdbc:mysql://localhost:10101/hive_metastore?serverTimezone=UTC' specifies the port as 10101 on the same server.

Initialize database structure

Now we need to run schematool to setup metastore for Hive. The command syntax looks like the following:

$HIVE_HOME/bin/schematool -dbType <db type> -initSchema

For argument dbType, it can be any of the following values:

derby|mysql|postgres|oracle|mssql

By default, Apache Derby will be used. However it is a standalone database and can only be used for one connection concurrently.

Let's now run the following command in Bash to initiate database schema:

$HIVE_HOME/bin/schematool -dbType mysql -initSchema

When I was initializing the schema on WSL system, I got the following error:

Metastore connection URL:        jdbc:mysql://localhost/hive_metastore
Metastore Connection Driver :    com.mysql.cj.jdbc.Driver
Metastore connection User:       hive
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : The server time zone value 'AEDT' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the 'serverTimezone' configuration property) to use a more specific time zone value if you want to utilize time zone support.
SQL Error code: 0
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

This is a new error I didn't encounter before when configuring SQL Server database as Hive metastore.

To fix this issue, refer to MySQL: The server time zone value 'AEDT' is unrecognized or represents more than one time zone. We just need to update JDBC connection string accordingly:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/hive_metastore?serverTimezone=UTC</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

Once the command line completes, you should be able to see the successful messages at the end:

...

Initialization script completed
schemaTool completed

Verify the schema in mysql CLI

Double confirm the schema using mysql CLI:

mysql> use hive_metastore;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-------------------------------+
| Tables_in_hive_metastore      |
+-------------------------------+
| AUX_TABLE                     |
| BUCKETING_COLS                |
| CDS                           |
| COLUMNS_V2                    |
| COMPACTION_QUEUE              |
| COMPLETED_COMPACTIONS         |
| COMPLETED_TXN_COMPONENTS      |
| CTLGS                         |
| DATABASE_PARAMS               |
| DBS                           |
| DB_PRIVS                      |
| DELEGATION_TOKENS             |
| FUNCS                         |
| FUNC_RU                       |
| GLOBAL_PRIVS                  |
| HIVE_LOCKS                    |
| IDXS                          |
| INDEX_PARAMS                  |
| I_SCHEMA                      |
| KEY_CONSTRAINTS               |
| MASTER_KEYS                   |
| MATERIALIZATION_REBUILD_LOCKS |
| METASTORE_DB_PROPERTIES       |
| MIN_HISTORY_LEVEL             |
| MV_CREATION_METADATA          |
| MV_TABLES_USED                |
| NEXT_COMPACTION_QUEUE_ID      |
| NEXT_LOCK_ID                  |
| NEXT_TXN_ID                   |
| NEXT_WRITE_ID                 |
| NOTIFICATION_LOG              |
| NOTIFICATION_SEQUENCE         |
| NUCLEUS_TABLES                |
| PARTITIONS                    |
| PARTITION_EVENTS              |
| PARTITION_KEYS                |
| PARTITION_KEY_VALS            |
| PARTITION_PARAMS              |
| PART_COL_PRIVS                |
| PART_COL_STATS                |
| PART_PRIVS                    |
| REPL_TXN_MAP                  |
| ROLES                         |
| ROLE_MAP                      |
| RUNTIME_STATS                 |
| SCHEMA_VERSION                |
| SDS                           |
| SD_PARAMS                     |
| SEQUENCE_TABLE                |
| SERDES                        |
| SERDE_PARAMS                  |
| SKEWED_COL_NAMES              |
| SKEWED_COL_VALUE_LOC_MAP      |
| SKEWED_STRING_LIST            |
| SKEWED_STRING_LIST_VALUES     |
| SKEWED_VALUES                 |
| SORT_COLS                     |
| TABLE_PARAMS                  |
| TAB_COL_STATS                 |
| TBLS                          |
| TBL_COL_PRIVS                 |
| TBL_PRIVS                     |
| TXNS                          |
| TXN_COMPONENTS                |
| TXN_TO_WRITE_ID               |
| TYPES                         |
| TYPE_FIELDS                   |
| VERSION                       |
| WM_MAPPING                    |
| WM_POOL                       |
| WM_POOL_TO_TRIGGER            |
| WM_RESOURCEPLAN               |
| WM_TRIGGER                    |
| WRITE_SET                     |
+-------------------------------+
74 rows in set (0.00 sec)

mysql>

All these tables in Hive metastore schema are created successfully.

Configure Hive API authentication

Add the following section to $HIVE_HOME/conf/hive-site.xml file:

<property>    <name>hive.metastore.event.db.notification.api.auth</name>     <value>false</value>     <description>       Should metastore do authorization against database notification related APIs such as get_next_notification.       If set to true, then only the superusers in proxy settings have the permission     </description>   </property>

And then update Hadoop core-site.xml configuration file to add the following configurations:

<property>      <name>hadoop.proxyuser.tangr.hosts</name>      <value>*</value>
</property><property>      <name>hadoop.proxyuser.tangr.groups</name>      <value>*</value>
</property>

* Replace the highlighted user name to your own user name.

Now all the configurations are done and we are ready to start Hive services. Make sure Hadoop services are running before kicking off Hive services.

Start HiveServer2 service

Run the command below to start the HiveServer2 service:

$HIVE_HOME/bin/hive --service metastore &$HIVE_HOME/bin/hive --service hiveserver2 &

Wait until you can open HiveServer2 Web UI: http://localhost:10002/.

Practices

You can follow section ‘DDL practices’ in my previous post to test your Hive data warehouse.

Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide

check Congratulations, you've completed the installation guide for Hive on WSL or other UNIX-alike systems!