Apache Hive 3.1.2 Installation on Windows 10

By Raymond, 2020-08-10

Hive 3.1.2 was released on 26th Aug 2019. At the time of writing, it is the latest 3.x release and it works with Hadoop 3.x.y releases. In this article, I’m going to provide step-by-step instructions for installing Hive 3.1.2 on Windows 10.

Alert: Apache Hive is impacted by Log4j vulnerabilities; refer to the page Apache Log4j Security Vulnerabilities to find out the fixes.

Prerequisites

Before installation of Apache Hive, please ensure you have Hadoop available on your Windows environment. We cannot run Hive without Hadoop. 

Install Hadoop (mandatory)

I recommend installing Hadoop 3.3.0 to work with Hive 3.1.2, though any Hadoop 3.x version should work.

I've published several articles so far, and you can follow one of them to install Hadoop on your Windows 10 machine:

Tools and Environment

  • Windows 10
  • Cygwin
  • Command Prompt

Install Cygwin

Please install Cygwin so that we can run Linux shell scripts on Windows. Since Hive 2.3.0, the binary package no longer includes any CMD files, so you have to use Cygwin or another bash/sh-compatible tool to run the scripts.

You can install Cygwin from this site: https://www.cygwin.com/.

Download binary package

Download the latest binary from the official website:

https://hive.apache.org/downloads.html

For my location, the closest download is available at http://apache.mirror.serversaustralia.com.au/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz.

Save the downloaded package to a local drive. I am saving to ‘F:\big-data’. This path will be referenced in the instructions below. Please remember to replace it accordingly if you are saving to a different path.

If you cannot find the package, you can download from the archive site too: https://archive.apache.org/dist/hive/hive-3.1.2/.

Unpack the binary package

Open a Cygwin terminal, change directory (cd) to the folder where you saved the binary package, and then extract it:

cd /cygdrive/f/big-data
tar -xvzf apache-hive-3.1.2-bin.tar.gz

The binaries are extracted to the path: F:\big-data\apache-hive-3.1.2-bin.

Setup environment variables

Run the following commands in Cygwin to set up the environment variables:

export HADOOP_HOME='/cygdrive/f/big-data/hadoop-3.3.0'
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME='/cygdrive/f/big-data/apache-hive-3.1.2-bin'
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$(hadoop classpath)
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*.jar

You can add these exports to the .bashrc file so that you don’t need to run these commands manually each time you launch Cygwin:

vi ~/.bashrc

And then add the above lines into the file. If your Hadoop or Hive paths are different, please change them accordingly. 

Run the following command to source the environment variables.

source ~/.bashrc
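
After sourcing the file, it can help to confirm that the two home variables really point at usable installs. The following is a minimal sketch, not part of Hadoop or Hive: check_home is a hypothetical helper name, and it only assumes that each install has a launcher script under its bin folder.

```shell
#!/usr/bin/env bash
# Sketch: verify that a *_HOME variable points at an install with a launcher.
# check_home is a hypothetical helper, not something shipped with Hadoop/Hive.
check_home() {
  local var_name=$1 launcher=$2
  local home=${!var_name}   # indirect expansion: value of the named variable
  if [ -z "$home" ]; then
    echo "$var_name is not set"
    return 1
  fi
  if [ ! -f "$home/bin/$launcher" ]; then
    echo "$var_name=$home has no bin/$launcher"
    return 1
  fi
  echo "$var_name looks fine: $home"
}
```

For example, running check_home HADOOP_HOME hadoop and check_home HIVE_HOME hive in Cygwin should print one "looks fine" line each if the exports above match your folders.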

Start Hadoop daemon services

If you have not started Hadoop services yet, run the following commands in Command Prompt (Run as Administrator) window:

%HADOOP_HOME%\sbin\start-dfs.cmd
%HADOOP_HOME%\sbin\start-yarn.cmd

You should be able to see the following services via running jps command in Command Prompt:

jps
13024 NodeManager
18176 NameNode
10908 DataNode
1324 Jps
6284 ResourceManager
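
If you prefer a scripted check over reading the jps listing by eye, the sketch below reports which of the four daemons shown above are missing. It assumes jps is also reachable from Cygwin; missing_daemons is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the expected Hadoop daemons that are absent from `jps` output.
# Reads the jps listing on stdin; prints nothing when all four are present.
missing_daemons() {
  local listing daemon
  # A Windows jps.exe may emit CRLF line endings, so strip carriage returns.
  listing=$(tr -d '\r')
  for daemon in NameNode DataNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>", so compare the whole second field.
    if ! printf '%s\n' "$listing" |
        awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
      echo "$daemon"
    fi
  done
}
```

Usage would be jps | missing_daemons; an empty result means all four services are running.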

Setup Hive HDFS folders

Open Command Prompt and then run the following commands:

hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w   /tmp
hadoop fs -chmod g+w   /user/hive/warehouse

These commands will set up the HDFS folders for Hive data warehousing. 
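
The same folder setup can be wrapped in a small function if you want to rerun it safely from Cygwin. This is only a sketch: ensure_hdfs_dir and the HADOOP_CMD override are hypothetical names introduced here so the function can be dry-run with echo, and it uses mkdir -p so that an existing folder is not treated as an error.

```shell
#!/usr/bin/env bash
# Sketch: create an HDFS directory (if needed) and make it group-writable.
# HADOOP_CMD is a hypothetical override; it defaults to the `hadoop` launcher.
ensure_hdfs_dir() {
  local dir=$1
  local hadoop_cmd=${HADOOP_CMD:-hadoop}
  "$hadoop_cmd" fs -mkdir -p "$dir" && "$hadoop_cmd" fs -chmod g+w "$dir"
}
```

With HADOOP_CMD=echo the function only prints the arguments it would pass; without it, ensure_hdfs_dir /tmp and ensure_hdfs_dir /user/hive/warehouse run the same operations as the commands above.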

Java doesn’t understand Cygwin paths properly. To avoid errors like the following, we need to add some symbolic links (directory junctions on Windows):

JAR does not exist or is not a normal file: F:\cygdrive\f\big-data\apache-hive-3.1.2-bin\lib\hive-beeline-3.1.2.jar

On my system, Hive is installed in the F:\big-data\ folder. To make it work, follow these steps:

  • Create a folder named cygdrive on the F: drive
  • Open Command Prompt (Run as Administrator) and then run the following command:
C:\WINDOWS\system32>mklink /J  F:\cygdrive\f\ F:\
Junction created for F:\cygdrive\f\ <<===>> F:\

In this way, ‘F:\cygdrive\f’ will be equivalent to ‘F:\’. You need to change the drive letter to the drive where you are installing Hive. For example, if you are installing Hive on the C drive, the command will be:

C:\WINDOWS\system32>mklink /J  C:\cygdrive\c\ C:\
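
To see why the junction helps, the sketch below mimics how a Windows JVM appears to resolve a Cygwin path, judging from the error message above: the leading slash is taken as the root of the current drive. windows_view_of is a hypothetical helper written for illustration only; the real conversion tool inside Cygwin is cygpath.

```shell
#!/usr/bin/env bash
# Sketch: show the Windows path a JVM on drive $1 would likely derive from a
# Cygwin-style path $2 (leading "/" = root of the current drive).
windows_view_of() {
  local drive=$1 cygwin_path=$2
  # Replace every "/" with "\" and prefix the drive letter.
  printf '%s:%s\n' "$drive" "${cygwin_path//\//\\}"
}
```

Calling windows_view_of F /cygdrive/f/big-data gives F:\cygdrive\f\big-data, which only exists once the F:\cygdrive\f junction points back to F:\.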

Initialize metastore

Now we need to initialize the schemas for metastore. The command syntax looks like the following:

$HIVE_HOME/bin/schematool -dbType <db type> -initSchema

Type the following command to view all the options:

$HIVE_HOME/bin/schematool -help

For argument dbType, the value can be one of the following databases:

derby|mysql|postgres|oracle|mssql
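
If you script the initialization, a small guard can reject a mistyped database type before schematool is started. A minimal sketch, with is_supported_db_type as a hypothetical helper that simply mirrors the list above:

```shell
#!/usr/bin/env bash
# Sketch: succeed only for the dbType values listed in this article.
is_supported_db_type() {
  case $1 in
    derby|mysql|postgres|oracle|mssql) return 0 ;;
    *) return 1 ;;
  esac
}
```

A wrapper could then call is_supported_db_type "$db_type" and stop with a message if it fails.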

For this article, I am going to use Derby as it is purely Java based and is bundled with the Hive release:

$HIVE_HOME/bin/schematool -dbType derby -initSchema

Info: Some online guides suggest downloading Derby JAR files. This is not required as they are already included in the Hive release binary package. 

The output looks similar to the following:

[Screenshot: 2020081083700-image.png]

Ensure you can see the log 'schemaTool completed'.

A folder named metastore_db will be created on your current path (pwd). For my environment, it is F:\big-data\metastore_db.
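
Because the Derby database is created relative to the current directory, it is easy to end up initializing it in the wrong place or twice. The sketch below checks whether a non-empty metastore_db folder already exists in a given directory; has_derby_metastore is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: return 0 when a non-empty metastore_db folder exists in the given
# directory (defaults to the current directory).
has_derby_metastore() {
  local dir=${1:-$PWD}
  [ -d "$dir/metastore_db" ] && [ -n "$(ls -A "$dir/metastore_db" 2>/dev/null)" ]
}
```

One possible use is to change into the folder you always start Hive from, and run schematool only when has_derby_metastore fails.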

Configure a remote database as metastore

This step is optional for this article. You can configure it to support multiple sessions. 

Please refer to this post about configuring SQL Server database as metastore.

Configure a SQL Server Database as Remote Hive Metastore

For this article, I am going to just use the Derby file system database. 

Configure API authentication

Let's now add some configurations. All configuration files for Hive are stored in the conf folder under HIVE_HOME.

1) Create a configuration file named hive-site.xml using the following command in Cygwin:

cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml

2) Open the hive-site.xml file and remove all property elements under the root element 'configuration'.

3) Add the following configuration into the hive-site.xml file.

<property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
    <description>
      Should metastore do authorization against database notification related APIs such as get_next_notification.
      If set to true, then only the superusers in proxy settings have the permission
    </description>
</property>

The content of the file looks like the following:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <!-- Hive Execution Parameters -->
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
    <description>
      Should metastore do authorization against database notification related APIs such as get_next_notification.
      If set to true, then only the superusers in proxy settings have the permission
    </description>
  </property>
</configuration>

Alternatively, you can configure a proxy user in the Hadoop core-site.xml configuration file. Refer to the following post for more details:

HiveServer2 Cannot Connect to Hive Metastore Resolutions/Workarounds
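
As an alternative to copying the template and deleting its properties by hand (steps 1 to 3 above), the file can be written directly. This is a sketch under the assumption that the license comment and stylesheet line are optional; write_minimal_hive_site is a hypothetical helper name, and it overwrites the target file.

```shell
#!/usr/bin/env bash
# Sketch: write a hive-site.xml that contains only the single property used
# in this article. Overwrites the file given as the first argument.
write_minimal_hive_site() {
  local target=$1
  cat > "$target" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
</configuration>
EOF
}
```

In Cygwin this could be called as write_minimal_hive_site "$HIVE_HOME/conf/hive-site.xml".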

Start HiveServer2 service

Run the following commands in Cygwin to start the metastore and HiveServer2 services:

$HIVE_HOME/bin/hive --service metastore & 
$HIVE_HOME/bin/hive --service hiveserver2 start &

Leave the Cygwin terminal open so that the services keep running; you can open another Cygwin terminal to run beeline or hive commands. If you prefer to run the CLI directly as described in the next section, press Ctrl + C to cancel the current one first.
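
HiveServer2 can take a while before it accepts connections, so a script may want to wait for its port before launching beeline. The sketch below relies on bash's /dev/tcp redirection, which I assume is enabled in your Cygwin bash build; port_open and wait_for_port are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: poll a TCP port once per second until it accepts a connection.
port_open() {
  local host=$1 port=$2
  # bash-specific: opening /dev/tcp/<host>/<port> attempts a TCP connect.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

wait_for_port() {
  local host=$1 port=$2 tries=${3:-30}
  while [ "$tries" -gt 0 ]; do
    port_open "$host" "$port" && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1   # the port never opened within the allowed attempts
}
```

For the default setup in this article, wait_for_port localhost 10000 60 would wait up to roughly a minute.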

Run CLI directly

You can also run the CLI via either the hive or the beeline command.

$HIVE_HOME/bin/beeline -u jdbc:hive2://$HS2_HOST:$HS2_PORT
$HIVE_HOME/bin/hive

Replace $HS2_HOST with HiveServer2 address and $HS2_PORT with HiveServer2 port.

By default the URL is: jdbc:hive2://localhost:10000.
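
The URL can be assembled with the defaults filled in, as in this small sketch (hs2_url is a hypothetical helper name; the host and port defaults are the ones stated above):

```shell
#!/usr/bin/env bash
# Sketch: build the HiveServer2 JDBC URL, defaulting to localhost:10000.
hs2_url() {
  local host=${1:-localhost} port=${2:-10000}
  printf 'jdbc:hive2://%s:%s\n' "$host" "$port"
}
```

Beeline could then be started with $HIVE_HOME/bin/beeline -u "$(hs2_url)".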

Verify Hive installation

Now that Hive is installed, we can run some SQL commands to verify the installation.

For more details about the commands, refer to official website:

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Create a new Hive database

Run the following command in Beeline to create a database named test_db:

 create database if not exists test_db;

As I didn’t specify the database location, it will be created under the default HDFS location: /user/hive/warehouse.

In the HDFS NameNode web UI, we can see that a new folder has been created, as the following screenshot shows:

[Screenshot: 20200810100009-image.png]

Create a new Hive table

Run the following commands to create a table named test_table:

use test_db;
create table test_table (id bigint not null, value varchar(100));
show tables;

Insert data into Hive table

Run the following command to insert some sample data:

insert into test_table (id,value) values (1,'ABC'),(2,'DEF');

Two records will be created by the above command.

The command will submit a MapReduce job to YARN. You can also configure Hive to use Spark as the execution engine instead of MapReduce.

You can track the job status through the Tracking URL printed in the console output.

You can also view the job status in the YARN web UI:

[Screenshot: 20200810100212-image.png]

Wait until the job is completed. 

Select data from Hive table

Now, you can display the data by running the following command in Beeline:

select * from test_table;

The output looks similar to the following:

+----------------+-------------------+
| test_table.id  | test_table.value  |
+----------------+-------------------+
| 1              | ABC               |
| 2              | DEF               |
+----------------+-------------------+

In the Hadoop NameNode web UI, you can also see that the new files have been created:

[Screenshot: 20200810100700-image.png]

Congratulations! You have successfully installed Hive 3.1.2 on your Windows 10 system.

Comments

PRAVEEN KUMAR B, 2 years ago

Yes. I have followed your steps completely and the folder has the permission too. But it’s not executing. 

Raymond, 2 years ago

This is a strange error. Can you try using the one with .sh?

$HIVE_HOME/bin/schematool.sh

PRAVEEN KUMAR B, 2 years ago

After running it, I am getting the text below:

$ cat "/cygdrive/c/hadoop-3.2.1/hive-3.1.2/bin/schematool.sh"
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at

# http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

. "$bin"/hive --service schemaTool "$@"

Running schematool again gives the SAME ERROR:

$ HIVE_HOME/bin/schematool.sh -dbType derby -initSchema
-bash: HIVE_HOME/bin/schematool.sh: No such file or directory


Raymond, 2 years ago

Can I confirm that you are using the following one:

$HIVE_HOME/bin/schematool.sh -dbType derby -initSchema

instead of:

$ HIVE_HOME/bin/schematool.sh -dbType derby -initSchema

There is a space between $ and HIVE_HOME in your error log.

PRAVEEN KUMAR B, 2 years ago

$HIVE_HOME/bin/schematool -dbType derby -initSchema

This step is not working for me. I am getting an error like:

-bash: schematool: command not found

Kindly help to resolve it.

Raymond, 2 years ago

Did you run this in Git Bash or a Cygwin terminal? You need to ensure the HIVE_HOME variable is also set up correctly.


Can you run the following?

echo "$HIVE_HOME/bin/schematool"
cat "$HIVE_HOME/bin/schematool"

PRAVEEN KUMAR B, 2 years ago

cd $HIVE_HOME

/cygdrive/c/hadoop-3.2.1/hive-3.1.2


$ echo "/cygdrive/c/hadoop-3.2.1/hive-3.1.2/bin/schematool"

/cygdrive/c/hadoop-3.2.1/hive-3.1.2/bin/schematool


$ cat "/cygdrive/c/hadoop-3.2.1/hive-3.1.2/bin/schematool"

cat: /cygdrive/c/hadoop-3.2.1/hive-3.1.2/bin/schematool: No such file or directory


Hive home is set correctly

Echo is working fine

Cat is displaying like : No such file or directory


20230329125604-image.png


In this attached image I have all formats of schematool (like textfile, command and .sh)


Kindly suggest to resolve it. While running hive scripts metastore_db is also created. 

Raymond, 2 years ago

Can you check whether your running user has permissions to the folder?  

Installing on Windows directly is not an easy task and you need to make sure every step is correct. I suggest following this guide to install it in WSL instead:

Apache Hive 3.1.2 Installation on Linux Guide (kontext.tech)

Orland Espiritu, 4 years ago

Hi again Raymond, is there any way I can get in touch with you through chat? I’m in need of mentoring in Hive and Spark for some exercises. I find this topic quite challenging. 


Raymond, 4 years ago

Hi Orland, I may be able to find some time this week (after my work hours or on the weekend).

Can you please send an email about the timezone and preferred time to the following email box?

Contact us 

I will try to organize one session with you. 


Orland Espiritu, 4 years ago

Hi, when I run vi ~/.bashrc, Cygwin gets stuck, and when I scroll down, letters appear.

Raymond, 4 years ago

For Cygwin installation itself, please refer to its official documentation.

When editing the user profile via the Cygwin terminal, you should see something like the following screenshot; you then need to press Insert to start inserting new lines. If you are not familiar with the vi editor and its commands, refer to An introduction to the vi editor

Installing Hive natively on Windows is not easy and you need to follow exactly all the steps I mentioned in the article. I would suggest you just use our WSL guide to install it: Apache Hive 3.1.1 Installation on Windows 10 using Windows Subsystem for Linux

Orland Espiritu, 4 years ago

Hi, do you have a guide for Cygwin installation for Hive?

R4F43L HU3, 4 years ago

Hi, thanks a lot for this how-to!
I get an error when I start HiveServer2. Can someone help me, please?

 WARN  [main] server.HiveServer2 (HiveServer2.java:startHiveServer2(1064)) - Error starting HiveServer2 on attempt 1, will retry in 60000ms
java.lang.RuntimeException: Error applying authorization policy on hive configuration: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hive.service.cli.CLIService.init(CLIService.java:118)
        at org.apache.hive.service.CompositeService.init(CompositeService.java:59)
        at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:230)
        at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1036)
        at org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:140)
        at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1305)
        at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1149)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:932)
        at org.apache.hadoop.hive.ql.session.SessionState.applyAuthorizationPolicy(SessionState.java:1893)
        at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:131)
        at org.apache.hive.service.cli.CLIService.init(CLIService.java:115)
        ... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:964)
        at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:924)
        ... 15 more

My conf :




Raymond, 4 years ago

Hi,

From the screenshot, I could not identify any issues.

Are you using Derby as the metastore, or an external database like SQL Server or MySQL?

Derby allows at most one concurrent session. If you are using Derby, can I suggest that you change to a remote metastore?

For example, Configure a SQL Server Database as Remote Hive Metastore - Kontext.

Also, I assume you have already created the symbolic link as mentioned in the article?


R4F43L HU3, 4 years ago

Hi Raymond,

In your tutorial I use the embedded Derby, so I tried to use an external Derby following this tutorial: "Installing Apache Hive 3.1.2 on Windows 10 - https://towardsdatascience.com/".

It works fine, but when I stop the Hive server by killing the process, I sometimes get the same error, so I remove metastore_db and run this in Cygwin:

$HIVE_HOME/bin/schematool -dbType derby -initSchema

But all Hive queries are very slow... maybe it is caused by the limitation you mentioned. So I will try to change the metastore database; maybe I will use SQL Server running in Docker, following this tutorial: how-to-run-sql-server-in-a-docker-container.

Do you think Hive queries can be faster if we change the engine from MR to Tez? Do you have a step-by-step guide to adapt it to Tez?

I am sharing some issues I found during my tests. In my case, I want to write Parquet files to HDFS, then create Hive external tables to query the data and insert it into an internal Hive table where I also need to delete data (this table must be transactional).

Error: User: MYUSERNAME is not allowed to impersonate MYUSERNAME

I fixed it by following: stackoverflow - 43180305

java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration

I fixed it by following: stackoverflow.com - 29602670

Here are the properties I added to fix my issues:




Raymond, 4 years ago

I recommend using SQL Server as the metastore. Each time you initialize the metastore, you lose all the Hive metadata like databases, tables, etc.  

Naseemuddin Khan, 4 years ago

Hi Raymond, I was able to follow your instructions until the creation of the table. The insert step, however, does not seem to be working. It gets stuck with this output at the end:

Starting Job = job_1615234468216_0003, Tracking URL = http://ADV075:8088/proxy/application_1615234468216_0003/
2021-03-08 21:46:27,779 INFO [34a2b056-4193-4c1f-9363-5117d5aa0607 main] exec.Task (SessionState.java:printInfo(1227)) - Starting Job = job_1615234468216_0003, Tracking URL = http://ADV075:8088/proxy/application_1615234468216_0003/
Kill Command = C:\hadoop-3.3.0\bin\mapred job -kill job_1615234468216_0003
2021-03-08 21:46:27,780 INFO [34a2b056-4193-4c1f-9363-5117d5aa0607 main] exec.Task (SessionState.java:printInfo(1227)) - Kill Command = C:\hadoop-3.3.0\bin\mapred job -kill job_1615234468216_0003

Raymond, 4 years ago

Did you configure a remote metastore for Hive? From the logs, it seems you are using derby as metastore? I recommend using SQL Server, MySQL or PostgreSQL as metadata store.

Naseemuddin Khan, 4 years ago

Also, I have been trying to use Spark on Hive tables. Here I get this output:

>>> spark.sql("show databases")
21/03/08 21:51:09 WARN NativeIO: NativeIO.getStat error (3): The system cannot find the path specified.
-- file path: tmp/hive
21/03/08 21:51:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/03/08 21:51:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/03/08 21:51:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/03/08 21:51:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore UNKNOWN@192.168.178.60
21/03/08 21:51:13 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
DataFrame[namespace: string]

Can you help me with that?

Renganathan Mutthiah, 4 years ago

I have just tried with Hadoop 3.3.0 and I am getting the same error while I use the schematool. All the hadoop processes are running fine (they were running fine even in 2.9.1). Somewhere the HIVE setup is picking up my partial user name (the second word is picked; first & second words are separated by space) and class not found error is thrown.

Raymond, 4 years ago

Hi Renganathan,

Can you please try the following actions?

  1. Change the Hadoop 3.3.0 environment variables to the Cygwin versions as documented in this article. In the output you pasted earlier, it is still using Windows paths:
    export HADOOP_HOME='/cygdrive/f/big-data/hadoop-3.3.0'
    export PATH=$PATH:$HADOOP_HOME/bin
    export HIVE_HOME='/cygdrive/f/big-data/apache-hive-3.1.2-bin'

    *Remember to change the paths to your own ones.

  2. Make sure you can run these commands successfully with the expected output:
    ls /cygdrive/f/big-data/hadoop-3.3.0
    ls /cygdrive/f/big-data/apache-hive-3.1.2-bin

    *Remember to change the paths to your own ones.

  3. Examine all your Hadoop paths and configurations to make sure there is no space in any path, including the DFS paths in the Hadoop configurations.

If you follow exactly all my steps in the Hadoop 3.3.0 and Hive 3.1.2 setup, there should be no issues - I've tested it. 

BTW, to answer one of your previous questions, the script is located at: apache-hive-3.1.2-bin\bin\ext\schemaTool.sh.

-Raymond
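
The third check in the list above (no spaces in any path) can be partly automated for environment variables. A minimal sketch, with vars_with_spaces as a hypothetical helper name; it only inspects the variables you pass in, not the Hadoop XML configuration files.

```shell
#!/usr/bin/env bash
# Sketch: print the names of the given variables whose values contain a space.
vars_with_spaces() {
  local name
  for name in "$@"; do
    case ${!name} in          # indirect expansion: value of the named variable
      *' '*) echo "$name" ;;
    esac
  done
}
```

For example, vars_with_spaces HADOOP_HOME HIVE_HOME JAVA_HOME prints nothing when all three values are free of spaces.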

Renganathan Mutthiah, 4 years ago

Hi, thanks for the article.

When I run the command "$HIVE_HOME/bin/schematool -dbType derby -initSchema", I am getting the error .. Error: Could not find or load main class ???. (??? - is my user name).

Can you please help how to resolve it.

Raymond, 4 years ago

Hi, did you follow all the steps exactly? We have to use Cygwin to run the commands because the Command Prompt script version is not available for the latest Hive, so all the steps I included in the guide are critical. It looks like the script cannot find your Derby JAR files on the Java classpath (the Hadoop/Hive environment variables are not set up correctly). The folder where you run the schema init script is also important when using Derby since it is a file-based database.

Considering Derby is not good for concurrent Hive connections, I suggest using a remote metastore like SQL Server: Configure a SQL Server Database as Remote Hive Metastore

Alternatively, to save all the trouble, I highly recommend you follow my newly published article Apache Hive 3.1.2 Installation on Linux Guide. That article configures Hadoop 3.3.0 and Hive 3.1.2 in a WSL environment on Windows 10. It also includes steps to install MySQL as the remote metastore for your Hive data warehouse. 

Let me know if you encounter errors in WSL.

-Raymond

Renganathan Mutthiah, 4 years ago

Hi Raymond,

Thanks for looking into this. I see that I have followed all the steps. I do have derby JAR files under apache-hive-3.1.2-bin\lib. I use Cygwin to setup the environment variables and run the schematool in it.

I started debugging the shell scripts. The class not found occurs in hive.ksh (located in the bin folder) at the last line --> $TORUN "$@"

It seems that TORUN resolves to schemaTool (capital 'T'). 

I am not sure how to proceed further to identify the issue root cause.

Thanks!

Raymond, 4 years ago

Hi Renganathan,

For all these environment variables setup, did you add all of them into ~/.bashrc and run command 'source ~/.bashrc'?

export HADOOP_HOME='/cygdrive/f/big-data/hadoop-3.3.0'
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME='/cygdrive/f/big-data/apache-hive-3.1.2-bin'
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$(hadoop classpath)
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*.jar

Can you also run the following command line in cygwin and paste the output here?

echo $HADOOP_CLASSPATH

And also can you paste all your detailed error here if it is ok?

The schema tool requires the following JAVA class to be present and its JAR file needs to be in the Java classpath. 

schemaTool() {
  HIVE_OPTS=''
  CLASS=org.apache.hive.beeline.HiveSchemaTool
  execHiveCmd $CLASS "$@"
}

schemaTool_help () {
  HIVE_OPTS=''
  CLASS=org.apache.hive.beeline.HiveSchemaTool
  execHiveCmd $CLASS "--help"
}

Renganathan Mutthiah, 4 years ago

Sure, thanks for your help Raymond!

Below is the output of the classpath you requested.

$ echo $HADOOP_CLASSPATH

E:\lion\Hadoop\hadoop-2.9.1\contrib\capacity-scheduler\*.jar;E:\Lion\Hadoop\hadoop-2.9.1\etc\hadoop;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\common\lib\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\common\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\hdfs;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\hdfs\lib\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\hdfs\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\yarn;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\yarn\lib\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\yarn\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\mapreduce\lib\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\mapreduce\*:/cygdrive/e/lion/Hadoop/apache-hive-3.1.2-bin/lib/*.jar

And you mentioned about the function schemaTool(). I am unable to find the function in my hive.sh script. Not sure where it is located.

The error I am getting after I submit the command is:

$HIVE_HOME/bin/schematool -dbType derby -initSchema

Error: Could not find or load main class Lion

And I am setting up all the environment variables in my Cygwin before executing the schematool command.

Please let me know if you need any further details.

Thanks!

Administrator, 4 years ago

Your Hadoop version is 2.9.1 while in the tutorial it is tested with Hadoop 3.3.0. Can you please use the same Hadoop version? For different versions, some libraries may conflict with each other and Hive 3.1.2 works with Hadoop 3.x.y but not Hadoop 2.x.

Ankit Tiwari, 5 years ago

Hi Raymond,

Thanks for sharing the links for installing additional Hadoop software.

I started with installing Hive.

When I run the command $HIVE_HOME/bin/hive --service metastore &

the command keeps running and I do not get the command prompt back.

So I opened another Cygwin terminal to run the below command.

$HIVE_HOME/bin/hive --service hiveserver2 start &

Then I opened a 3rd Cygwin terminal to run the commands 

$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000

$HIVE_HOME/bin/hive

and got the below message.

Could you please guide me if I need to change it ?


Raymond, 5 years ago

Hi Ankit,

If you are using the default Derby as metastore, you cannot open multiple sessions. When you run the command, you will still see logs being printed, but you can press Enter to get the prompt back and input commands without starting a new session. 

If you prefer to allow multiple sessions, please follow the link to use a remote metastore like SQL Server. 

Ankit Tiwari, 5 years ago

Hi Raymond,

I faced issues during Hive installation, so did the complete process again and found that Schematool step has some issues.

Please see the below screen-shot. It says that "Found configuration file null" conf.Hiveconf.

I guess that's why my metastore step has issues.

There are a lot of blank lines in the schematool log.

If you know what can be done to fix this, could you please help me?


Raymond, 5 years ago

For some of my previous configurations, the logs were not printed out either.

Can you please check whether there is a folder named metastore_db created? It should be ok if the folder exists and has content.

To make everything easy, I would suggest installing SQL Server Express or Developer edition and then configuring your Hive metastore to use SQL Server:

https://kontext.tech/column/hadoop/302/configure-a-sql-server-database-as-remote-hive-metastore 

If it still doesn't work, please send an email to enquiry[at]kontext.tech and I can arrange a Teams meeting with you to debug. 

Ankit Tiwari, 5 years ago

Hi Raymond,

Thank you so much for your help.

I will first try to find out the issue on my own. If that doesn't work, I will send an email to the enquiry mailbox.

I installed Hadoop 3.3.0 on my system and then directly tried to install Hive 3.1.2.

Could you please confirm if that sequence is fine or I need to install something in between Hadoop and hive?

Thanks again :)

Regards,

Ankit

Raymond, 5 years ago

The sequence is fine. Hadoop needs to be installed before you install Hive, as Hive uses HDFS as its data store in an on-premises cluster.
