Apache Hive 3.1.2 Installation on Linux Guide
- Prerequisites
- Download binary package
- Unzip binary package
- Setup environment variables
- Setup Hive HDFS folders
- guava lib issue
- Install MySQL server
- Configure Hive metastore
- Download MySQL JDBC driver
- Configure Hive file hive-site.xml
- Initialize database structure
- Verify the schema in mysql CLI
- Configure Hive API authentication
- Start HiveServer2 service
- Practices
Prerequisites
Install WSL in a system or non-system drive on your Windows 10 and then install Hadoop 3.3.0 on it:
Now let’s start to install Apache Hive 3.1.2 on WSL.
Download binary package
Select a package from the download page:
https://hive.apache.org/downloads.html
For me, the recommended location is: https://www.strategylions.com.au/mirror/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz.
In WSL bash terminal, run the following command to download the package:
wget https://www.strategylions.com.au/mirror/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
Unzip binary package
If you have configured Hadoop 3.3.0 successfully by following Kontext guide (in prerequisites section), there should be one folder named hadoop existing in your home folder already:
$ ls -l total 3716664 -rw-r--r-- 1 tangr tangr 278813748 Jul 3 14:35 apache-hive-3.1.2-bin.tar.gz drwxrwxrwx 1 tangr tangr 4096 May 16 2019 dfs drwxrwxrwx 1 tangr tangr 4096 Dec 27 23:02 hadoop
Now unzip Hive package using the following command:
tar -xvzf apache-hive-3.1.2-bin.tar.gz -C ~/hadoop
In the hadoop folder there are now two subfolders at least (one for Hadoop and another for Hive):
$ ls ~/hadoop apache-hive-3.1.2-bin hadoop-3.3.0
Setup environment variables
In the prerequisites sections, we’ve already configured some environment variables like the following:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64 export HADOOP_HOME=/home/tangr/hadoop/hadoop-3.3.0 export PATH=$PATH:$HADOOP_HOME/bin
*Note: your user name can be different.
Let’s run the following command to add Hive required environment variables into .bashrc file too:
vi ~/.bashrc
Add the following lines to the end of the file:
export HIVE_HOME=/home/tangr/hadoop/apache-hive-3.1.2-bin
export PATH=$HIVE_HOME/bin:$PATH
* Change the highlighted user name to your own one.
Run the following command to source the variables:
source ~/.bashrc
Verify the environment variables:
echo $HIVE_HOME /home/tangr/hadoop/apache-hive-3.1.2-bin
Setup Hive HDFS folders
Start your Hadoop services (if you have not done that) by running the following command:
$HADOOP_HOME/sbin/start-all.sh
In WSL, you may need to restart you ssh services if ssh doesn’t work:
localhost: ssh: connect to host localhost port 22: Connection refused
To restart the services, run the following command:
sudo service ssh restart
Run the following command (jps) to make sure all the services are running successfully.
$ jps 2306 NameNode 2786 SecondaryNameNode 3235 NodeManager 3577 Jps 2491 DataNode 3039 ResourceManager
As you can see, all the services are running successfully in my WSL.
Now let’s setup the HDFS folders for Hive.
Run the following commands:
hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
Or alternatively, directly run the following command:
$HIVE_HOME/bin/init-hive-dfs.sh
guava lib issue
To avoid issues like the one described on page: Hive: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V, let's ensure guava library version is consistent between Hive and Hadoop.
- Hadoop 3.3.0 -
$ ls $HADOOP_HOME/share/hadoop/common/lib | grep guava guava-27.0-jre.jar
- Hive 3.1.2 -
$ ls $HIVE_HOME/lib | grep guava guava-19.0.jar jersey-guava-2.25.1.jar
- Copy newer version from Hadoop to Hive:
rm $HIVE_HOME/lib/guava-19.0.jar cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
Install MySQL server
1) Install MySQL server
Microsoft has published very detailed tutorials about how to install MySQL on WSL: Get started using MySQL MongoDB, PostgreSQL, SQLite, Microsoft SQL Server, or Redis to set up a database on Windows Subsystem for Linux | Microsoft Docs. Please follow the steps to configure a MySQL instance on WSL or your UNIX-alike systems.
If you encounter errors like this: Cannot open /proc/net/unix: No such file or directory, please close your WSL and then start service using the following command:
sudo /etc/init.d/mysql start * Starting MySQL database server mysqld No directory, logging in with HOME=/ [ OK ]
*Try also to restart your Windows system too if you are installing on a WSL Linux sub-system.
Make sure you can enter mysql CLI (sudo mysql):
$ sudo mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 4 Server version: 5.7.32-0ubuntu0.18.04.1 (Ubuntu) Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>
2) Create Hive metastore database and user
Run the following commands in mysql CLI:
sudo mysql mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive' password expire never; mysql> GRANT ALL ON *.* TO 'hive'@'localhost'; mysql> exit; mysql -u hive -phive mysql> create database hive_metastore; Query OK, 1 row affected (0.00 sec) mysql> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | hive_metastore | | mysql | | performance_schema | | sys | +--------------------+ 5 rows in set (0.00 sec)
After these steps, the following database objects are created:
- user: hive
- password: hive
- database: hive_metastore
These details are required in the following steps to create Hive metastore.
Configure Hive metastore
Follow these steps to configure a remote Hive metastore. These steps are similar as the SQL Server configuration guide: Configure a SQL Server Database as Remote Hive Metastore.
Download MySQL JDBC driver
First let's find out MySQL version to download the compatible JDBC driver:
mysql -V mysql Ver 14.14 Distrib 5.7.32, for Linux (x86_64) using EditLine wrapper
Download MySQL JDBC driver from official website MySQL :: Download Connector/J and choose the right version for your Linux/WSL system.
Alternatively, we can directly download it from Maven central.
For my WSL, I am using the following version (8.0.22):
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.22/mysql-connector-java-8.0.22.jar
Move the downloaded JAR file to Hive lib folder:
mv mysql-connector-java-8.0.22.jar $HIVE_HOME/lib/
Configure Hive file hive-site.xml
Edit hive-default.xml file (or create one if not existing) in folder $HIVE_HOME/conf.
* You can create the configuration file by using template file hive-default.xml.template.
cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
The above command needs to run in Cygwin terminal.
We need configure the following properties accordingly:
- javax.jdo.option.ConnectionDriverName: com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver depends on the version of your MySQL JDBC driver.
- javax.jdo.option.ConnectionURL: jdbc:mysql://localhost/hive_metastore
- javax.jdo.option.ConnectionUserName: hive
- javax.jdo.option.ConnectionPassword: hive
- hive.metastore.uris: thrift://127.0.0.1:9083
- hive.metastore.db.type: mysql
The highlighted values are from our configurations in MySQL Server installation step.
For my environment, these items are configured as the following:
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hive.metastore.event.db.notification.api.auth</name> <value>false</value> <description> Should metastore do authorization against database notification related APIs such as get_next_notification. If set to true, then only the superusers in proxy settings have the permission </description> </property> <property> <name>hive.metastore.db.type</name> <value>mysql</value> <description> Expects one of [derby, oracle, mysql, mssql, postgres]. Type of database used by the metastore. Information schema & JDBCStorageHandler depend on it. </description> </property> <property> <name>hive.metastore.uris</name> <value>thrift://127.0.0.1:9083</value> <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>hive</value> <description>password to use against metastore database</description> </property> <property> <name>hive.metastore.ds.connection.url.hook</name> <value/> <description>Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used</description> </property> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/hive_metastore</value> <description> JDBC connect string for a JDBC metastore. To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. For example, jdbc:postgresql://myhost/db?ssl=true for postgres database. </description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hive</value> <description>Username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.cj.jdbc.Driver</value> <description>Driver class name for a JDBC metastore</description> </property> </configuration>
Initialize database structure
Now we need to run schematool to setup metastore for Hive. The command syntax looks like the following:
$HIVE_HOME/bin/schematool -dbType <db type> -initSchema
For argument dbType, it can be any of the following values:
derby|mysql|postgres|oracle|mssql
By default, Apache Derby will be used. However it is a standalone database and can only be used for one connection concurrently.
Let's now run the following command in Bash to initiate database schema:
$HIVE_HOME/bin/schematool -dbType mysql -initSchema
When I was initializing the schema on WSL system, I got the following error:
Metastore connection URL: jdbc:mysql://localhost/hive_metastore Metastore Connection Driver : com.mysql.cj.jdbc.Driver Metastore connection User: hive org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version. Underlying cause: java.sql.SQLException : The server time zone value 'AEDT' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the 'serverTimezone' configuration property) to use a more specific time zone value if you want to utilize time zone support. SQL Error code: 0 Use --verbose for detailed stacktrace. *** schemaTool failed ***
This is a new error I didn't encounter before when configuring SQL Server database as Hive metastore.
To fix this issue, refer to MySQL: The server time zone value 'AEDT' is unrecognized or represents more than one time zone. We just need to update JDBC connection string accordingly:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/hive_metastore?serverTimezone=UTC</value> <description> JDBC connect string for a JDBC metastore. To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. For example, jdbc:postgresql://myhost/db?ssl=true for postgres database. </description> </property>
... Initialization script completed schemaTool completed
Verify the schema in mysql CLI
Double confirm the schema using mysql CLI:
mysql> use hive_metastore; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> show tables; +-------------------------------+ | Tables_in_hive_metastore | +-------------------------------+ | AUX_TABLE | | BUCKETING_COLS | | CDS | | COLUMNS_V2 | | COMPACTION_QUEUE | | COMPLETED_COMPACTIONS | | COMPLETED_TXN_COMPONENTS | | CTLGS | | DATABASE_PARAMS | | DBS | | DB_PRIVS | | DELEGATION_TOKENS | | FUNCS | | FUNC_RU | | GLOBAL_PRIVS | | HIVE_LOCKS | | IDXS | | INDEX_PARAMS | | I_SCHEMA | | KEY_CONSTRAINTS | | MASTER_KEYS | | MATERIALIZATION_REBUILD_LOCKS | | METASTORE_DB_PROPERTIES | | MIN_HISTORY_LEVEL | | MV_CREATION_METADATA | | MV_TABLES_USED | | NEXT_COMPACTION_QUEUE_ID | | NEXT_LOCK_ID | | NEXT_TXN_ID | | NEXT_WRITE_ID | | NOTIFICATION_LOG | | NOTIFICATION_SEQUENCE | | NUCLEUS_TABLES | | PARTITIONS | | PARTITION_EVENTS | | PARTITION_KEYS | | PARTITION_KEY_VALS | | PARTITION_PARAMS | | PART_COL_PRIVS | | PART_COL_STATS | | PART_PRIVS | | REPL_TXN_MAP | | ROLES | | ROLE_MAP | | RUNTIME_STATS | | SCHEMA_VERSION | | SDS | | SD_PARAMS | | SEQUENCE_TABLE | | SERDES | | SERDE_PARAMS | | SKEWED_COL_NAMES | | SKEWED_COL_VALUE_LOC_MAP | | SKEWED_STRING_LIST | | SKEWED_STRING_LIST_VALUES | | SKEWED_VALUES | | SORT_COLS | | TABLE_PARAMS | | TAB_COL_STATS | | TBLS | | TBL_COL_PRIVS | | TBL_PRIVS | | TXNS | | TXN_COMPONENTS | | TXN_TO_WRITE_ID | | TYPES | | TYPE_FIELDS | | VERSION | | WM_MAPPING | | WM_POOL | | WM_POOL_TO_TRIGGER | | WM_RESOURCEPLAN | | WM_TRIGGER | | WRITE_SET | +-------------------------------+ 74 rows in set (0.00 sec) mysql>
All these tables in Hive metastore schema are created successfully.
Configure Hive API authentication
Add the following section to $HIVE_HOME/conf/hive-site.xml file:
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
<description>
Should metastore do authorization against database notification related APIs such as get_next_notification.
If set to true, then only the superusers in proxy settings have the permission
</description>
</property>
And then update Hadoop core-site.xml configuration file to add the following configurations:
<property>
<name>hadoop.proxyuser.tangr.hosts</name>
<value>*</value> </property>
<property>
<name>hadoop.proxyuser.tangr.groups</name>
<value>*</value> </property>
* Replace the highlighted user name to your own user name.
Now all the configurations are done and we are ready to start Hive services. Make sure Hadoop services are running before kicking off Hive services.
Start HiveServer2 service
Run the command below to start the HiveServer2 service:
$HIVE_HOME/bin/hive --service metastore &
$HIVE_HOME/bin/hive --service hiveserver2 &
Wait until you can open HiveServer2 Web UI: http://localhost:10002/.
Practices
You can follow section ‘DDL practices’ in my previous post to test your Hive data warehouse.
Apache Hive 3.0.0 Installation on Windows 10 Step by Step Guide
I Cant Insert rows into my table. When I do the insert statement I get the following error:
Error while compiling statement: Failed:SemanticException 0:0 Expected 2 columns for insclause-0
Can you provide your table creation statement and also the insert statement?
Help!. I am new to this and I followed the directions .
I can't get the Hive site to come up.
Hi David,
Sometimes it will take a few minutes for the services to start. Also please rerun the Hive services start commands.
You can also check HS2 logs to see if the daemons start successfully and the reasons if not. One common problem is that Hadoop services didn't start correctly or HS2 cannot connect to Hadoop.
By default, the logs are stored in system temporary folder under your user name. For example, /tmp/$USER
property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
property.hive.log.file = hive.log
Hello
I followed this tutorial and I cannot access my hivestore2 UI
I also get multiple hive sessions on the WSL environment
Have you configured MySQL as metastore?
And also can you please share the output from the following command:
jps
Hello, thank you for such a good article.
But when I executed the command $HIVE_HOME/bin/schematool -dbType mysql -initSchema, I've got the following error:
SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/home/yaroslav/Hadoop/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/home/yaroslav/Hadoop/hadoop-3.3.2/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql
Error: Syntax error: Encountered "<EOF>" at line 1, column 64. (state=42X01,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
I did everything according to the instructions, but hive tries to create derby metastore instead of MySQL. I will appreciate your help so much.