9 months ago | Language: English

Apache Hive 3.1.2 Installation on Windows 10

2,886 views | 21 comments
Hive 3.1.2 was released on 26 August 2019. It is still the latest 3.x release and works with Hadoop 3.x.y releases. In this article, I’m going to provide step-by-step instructions for installing Hive 3.1.2 on Windows 10.
* Logos are registered trademarks of Apache Hive and Microsoft Windows.
Last modified by Raymond 3 months ago

Comments

2 months ago

I recommend using SQL Server as the metastore. Each time you initialize the metastore, you lose all the Hive metadata, such as databases and tables.
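With embedded Derby, re-initializing usually means deleting the metastore_db directory first, so copying it aside beforehand keeps the old metadata recoverable. A minimal bash sketch, assuming metastore_db sits in the directory where Hive was started (the default location for embedded Derby) and that you pass that directory as the argument; the helper name backup_metastore_db is hypothetical:

```bash
#!/usr/bin/env bash
# Copy an embedded Derby metastore directory aside before it is deleted and
# re-initialized. backup_metastore_db is a hypothetical helper name.
# Usage: backup_metastore_db [DIRECTORY_CONTAINING_metastore_db]
backup_metastore_db() {
  local base_dir="${1:-.}"
  local src="$base_dir/metastore_db"
  local dest
  if [ ! -d "$src" ]; then
    echo "No metastore_db found in $base_dir" >&2
    return 1
  fi
  # Timestamped copy, e.g. metastore_db.bak.20210308214627
  dest="$base_dir/metastore_db.bak.$(date +%Y%m%d%H%M%S)"
  cp -r "$src" "$dest" || return 1
  echo "$dest"
}

# Example (assumed location; use the directory you start Hive from):
# backup_metastore_db /cygdrive/f/big-data/apache-hive-3.1.2-bin
```

This does not replace moving to a server-based metastore; it only avoids losing the Derby files when you have to start over.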

Quoting R4F43L (2 months ago), Re: Apache Hive 3.1.2 Installation on Windows 10:

Hi Raymond,

In your tutorial I used embedded Derby, so I tried to use an external Derby instance by following this tutorial: "Installing Apache Hive 3.1.2 on Windows 10 - https://towardsdatascience.com/".

It works fine, but when I stop the Hive server by killing the process, I sometimes get the same error, so I remove metastore_db and run the following in Cygwin:

$HIVE_HOME/bin/schematool -dbType derby -initSchema
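If the schema may already exist, a guard can ask schematool first and only initialize when that check fails. A hedged bash sketch: the helper name is hypothetical, and it assumes that `schematool -info` exits with a non-zero status when no schema version can be read, which you should confirm on your own installation:

```bash
#!/usr/bin/env bash
# Run `schematool -initSchema` only when `schematool -info` fails.
# Assumption: -info exits non-zero when no schema version can be read.
# init_metastore_if_missing is a hypothetical helper name.
# Usage: init_metastore_if_missing HIVE_HOME [DB_TYPE]
init_metastore_if_missing() {
  local hive_home="$1"
  local db_type="${2:-derby}"
  if [ -z "$hive_home" ]; then
    echo "usage: init_metastore_if_missing HIVE_HOME [DB_TYPE]" >&2
    return 2
  fi
  local tool="$hive_home/bin/schematool"
  if "$tool" -dbType "$db_type" -info >/dev/null 2>&1; then
    echo "schema already present; skipping -initSchema"
  else
    "$tool" -dbType "$db_type" -initSchema
  fi
}

# Example:
# init_metastore_if_missing "$HIVE_HOME" derby
```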

But all Hive queries are very slow; maybe this is caused by the limitation you mentioned. So I will try to change the metastore database, perhaps to SQL Server running in Docker, following this tutorial: how-to-run-sql-server-in-a-docker-container.

Do you think Hive queries could be faster if we change the engine from MR to Tez? Do you have a step-by-step guide for adapting the setup to Tez?

I'd like to share some issues I found during my tests. In my case, I want to write Parquet files to HDFS, then create Hive external tables to query the data and insert it into an internal Hive table, from which I also need to delete data (so this table must be transactional).

Error: User: MYUSERNAME is not allowed to impersonate MYUSERNAME

I fixed it by following: stackoverflow - 43180305

java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration

I fixed it by following: stackoverflow.com - 29602670

Here are the properties I added to fix my issues:




Quoting Raymond (2 months ago), Re: Apache Hive 3.1.2 Installation on Windows 10:

Hi,

From the screenshot, I could not identify any issues.

Are you using Derby as the metastore, or an external database such as SQL Server or MySQL?

Embedded Derby allows at most one concurrent session. If you are using Derby, I suggest changing to a remote metastore.
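Because embedded Derby is opened exclusively by one process, a killed Hive process can leave lock files in the database directory, and the next session may then fail to boot the database. The bash sketch below only reports such files; the names db.lck and dbex.lck are the lock files Derby typically creates, the helper name is hypothetical, and removing the files is only safe once no Hive or Derby process is running:

```bash
#!/usr/bin/env bash
# Report Derby lock files left in a database directory.
# Prints each lock file found and returns 0 if at least one exists.
# derby_lock_files is a hypothetical helper name; it never deletes anything.
# Usage: derby_lock_files [PATH_TO_metastore_db]
derby_lock_files() {
  local db_dir="${1:-metastore_db}"
  local f found=1
  for f in db.lck dbex.lck; do
    if [ -e "$db_dir/$f" ]; then
      echo "$db_dir/$f"
      found=0
    fi
  done
  return "$found"
}

# Example: run from the directory where Hive was started.
# derby_lock_files metastore_db && echo "lock files present; is another Hive session running?"
```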

For example, Configure a SQL Server Database as Remote Hive Metastore - Kontext.

Also, have you already created the symbolic link mentioned in the article?


Quoting R4F43L (2 months ago), Re: Apache Hive 3.1.2 Installation on Windows 10:

Hi, thanks a lot for this how-to!
I get an error when I start HiveServer2. Can someone help me, please?

 WARN  [main] server.HiveServer2 (HiveServer2.java:startHiveServer2(1064)) - Error starting HiveServer2 on attempt 1, will retry in 60000ms
java.lang.RuntimeException: Error applying authorization policy on hive configuration: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hive.service.cli.CLIService.init(CLIService.java:118)
        at org.apache.hive.service.CompositeService.init(CompositeService.java:59)
        at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:230)
        at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1036)
        at org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:140)
        at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1305)
        at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1149)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:932)
        at org.apache.hadoop.hive.ql.session.SessionState.applyAuthorizationPolicy(SessionState.java:1893)
        at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:131)
        at org.apache.hive.service.cli.CLIService.init(CLIService.java:115)
        ... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:964)
        at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:924)
        ... 15 more

My configuration:




2 months ago

Did you configure a remote metastore for Hive? From the logs, it seems you are using Derby as the metastore. I recommend using SQL Server, MySQL or PostgreSQL as the metastore database.

Quoting Naseemuddin (2 months ago), Re: Apache Hive 3.1.2 Installation on Windows 10:

Hi Raymond, I was able to follow your instructions up to creating the table. The insert step, however, does not seem to work. It gets stuck with this output at the end:

Starting Job = job_1615234468216_0003, Tracking URL = http://ADV075:8088/proxy/application_1615234468216_0003/
2021-03-08 21:46:27,779 INFO [34a2b056-4193-4c1f-9363-5117d5aa0607 main] exec.Task (SessionState.java:printInfo(1227)) - Starting Job = job_1615234468216_0003, Tracking URL = http://ADV075:8088/proxy/application_1615234468216_0003/
Kill Command = C:\hadoop-3.3.0\bin\mapred job -kill job_1615234468216_0003
2021-03-08 21:46:27,780 INFO [34a2b056-4193-4c1f-9363-5117d5aa0607 main] exec.Task (SessionState.java:printInfo(1227)) - Kill Command = C:\hadoop-3.3.0\bin\mapred job -kill job_1615234468216_0003
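One common reason for a job stopping after "Starting Job" is that YARN has accepted the application but not yet allocated containers (state ACCEPTED); this is a general cause, not a diagnosis of this particular log. The job ID and the application ID above share the same suffix, so the application can be looked up with `yarn application -status`. A small bash sketch with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Derive the YARN application ID from a MapReduce job ID by swapping the
# "job_" prefix for "application_". job_to_app_id is a hypothetical name.
job_to_app_id() {
  local job_id="$1"
  case "$job_id" in
    job_*) echo "application_${job_id#job_}" ;;
    *)
      echo "not a MapReduce job ID: $job_id" >&2
      return 1
      ;;
  esac
}

# Example: show the application's state (ACCEPTED, RUNNING, ...) in YARN.
# yarn application -status "$(job_to_app_id job_1615234468216_0003)"
```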
2 months ago

Also, I have been trying to use Spark with Hive tables. This is the output I get:

>>> spark.sql("show databases")
21/03/08 21:51:09 WARN NativeIO: NativeIO.getStat error (3): Das System kann den angegebenen Pfad nicht finden. [The system cannot find the path specified.]
-- file path: tmp/hive
21/03/08 21:51:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/03/08 21:51:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/03/08 21:51:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/03/08 21:51:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore UNKNOWN@192.168.178.60
21/03/08 21:51:13 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
DataFrame[namespace: string]
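One possible explanation, offered as an assumption rather than a confirmed diagnosis: according to the Spark SQL documentation, Spark reads Hive settings from hive-site.xml in its conf directory, and when none is configured it creates its own metastore_db in the current directory. The lines recording schema version 2.3.0 and failing to find the default database would be consistent with that. A bash check with a hypothetical helper name and explicitly passed paths:

```bash
#!/usr/bin/env bash
# Check whether Spark's conf directory contains hive-site.xml.
# Returns 0 if found, 1 if only Hive's copy exists, 2 if neither exists.
# check_spark_hive_conf is a hypothetical helper name.
# Usage: check_spark_hive_conf SPARK_HOME HIVE_HOME
check_spark_hive_conf() {
  local spark_home="$1" hive_home="$2"
  if [ -f "$spark_home/conf/hive-site.xml" ]; then
    echo "found: $spark_home/conf/hive-site.xml"
  elif [ -f "$hive_home/conf/hive-site.xml" ]; then
    echo "missing; candidate to copy: $hive_home/conf/hive-site.xml"
    return 1
  else
    echo "no hive-site.xml found under $spark_home/conf or $hive_home/conf" >&2
    return 2
  fi
}

# Example:
# check_spark_hive_conf "$SPARK_HOME" "$HIVE_HOME"
```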

Can you help me with that?

4 months ago

Hi Renganathan,

Can you please try the following actions?

  1. Change the Hadoop 3.3.0 environment variables to the Cygwin versions, as documented in this article. In the output you pasted earlier, it is still using Windows paths:
    export HADOOP_HOME='/cygdrive/f/big-data/hadoop-3.3.0'
    export PATH=$PATH:$HADOOP_HOME/bin
    export HIVE_HOME='/cygdrive/f/big-data/apache-hive-3.1.2-bin'

    *Remember to change the paths to your own.

  2. Make sure these commands run successfully and give the expected output:
    ls /cygdrive/f/big-data/hadoop-3.3.0
    ls /cygdrive/f/big-data/apache-hive-3.1.2-bin

    *Remember to change the paths to your own.

  3. Check all your Hadoop paths and configurations to make sure that no path contains a space, including the DFS paths in the Hadoop configuration files.
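Step 3 can be partly automated for environment variables. This bash sketch (hypothetical helper name) prints every named variable whose value contains a space; it does not look inside the Hadoop XML configuration files, so the DFS paths there still need a manual check:

```bash
#!/usr/bin/env bash
# Print every named environment variable whose value contains a space.
# Returns 1 if any such variable is found, 0 otherwise.
# find_paths_with_spaces is a hypothetical helper name; it does not inspect
# paths inside Hadoop's XML configuration files.
find_paths_with_spaces() {
  local name value status=0
  for name in "$@"; do
    value="${!name}"   # bash indirect expansion: value of the variable named $name
    case "$value" in
      *" "*)
        echo "$name contains a space: $value"
        status=1
        ;;
    esac
  done
  return "$status"
}

# Example (add any other variables your setup uses):
# find_paths_with_spaces HADOOP_HOME HIVE_HOME JAVA_HOME USERNAME HADOOP_CLASSPATH
```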

If you follow all my steps for the Hadoop 3.3.0 and Hive 3.1.2 setup exactly, there should be no issues; I've tested it.

BTW, to answer one of your previous questions, the script is located at: apache-hive-3.1.2-bin\bin\ext\schemaTool.sh.

-Raymond

Quoting Renganathan (4 months ago), Re: Apache Hive 3.1.2 Installation on Windows 10:

I have just tried with Hadoop 3.3.0 and I get the same error when I run schematool. All the Hadoop processes are running fine (they were running fine with 2.9.1 too). Somewhere, the Hive setup is picking up part of my user name (the second word; the first and second words are separated by a space), and a class-not-found error is thrown.
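A plausible mechanism for this symptom, not confirmed in this thread: when a value containing a space is expanded without quotes in a shell script, it is split into two arguments, and the second word can end up where Java expects the main class name. A bash illustration with a hypothetical path:

```bash
#!/usr/bin/env bash
# Show how an unquoted value containing a space becomes two arguments.
# The path below is hypothetical.
count_args() { echo "$#"; }

user_dir="/cygdrive/c/Users/Ren Lion"

unquoted=$(count_args $user_dir)   # word splitting: "/cygdrive/c/Users/Ren" and "Lion"
quoted=$(count_args "$user_dir")   # quoting keeps it as one argument

echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 2, quoted: 1
```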


Administrator, 4 months ago

Your Hadoop version is 2.9.1, while the tutorial was tested with Hadoop 3.3.0. Can you please use the same Hadoop version? Different versions may ship conflicting libraries, and Hive 3.1.2 works with Hadoop 3.x.y but not with Hadoop 2.x.
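To confirm which Hadoop the shell actually picks up, the first line printed by `hadoop version` (for example "Hadoop 3.3.0") can be checked. A minimal bash sketch with hypothetical helper names:

```bash
#!/usr/bin/env bash
# Check that the Hadoop on PATH is 3.x, which Hive 3.1.2 requires.
# hadoop_major_version and require_hadoop3 are hypothetical helper names.

# Reads a version line such as "Hadoop 3.3.0" on stdin and prints the major version.
hadoop_major_version() {
  local first rest major
  read -r first rest
  [ "$first" = "Hadoop" ] || return 1
  major="${rest%%.*}"
  case "$major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  echo "$major"
}

# Fails (non-zero) unless the version line on stdin reports Hadoop 3 or later.
require_hadoop3() {
  local major
  if ! major=$(hadoop_major_version); then
    echo "unrecognised version line" >&2
    return 1
  fi
  if [ "$major" -lt 3 ]; then
    echo "Hadoop $major.x found; Hive 3.1.2 needs Hadoop 3.x" >&2
    return 1
  fi
}

# Example:
# hadoop version | head -n 1 | require_hadoop3
```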

Quoting Renganathan (4 months ago), Re: Apache Hive 3.1.2 Installation on Windows 10:

Sure, thanks for your help Raymond!

Below is the output of the classpath you requested.

$ echo $HADOOP_CLASSPATH

E:\lion\Hadoop\hadoop-2.9.1\contrib\capacity-scheduler\*.jar;E:\Lion\Hadoop\hadoop-2.9.1\etc\hadoop;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\common\lib\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\common\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\hdfs;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\hdfs\lib\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\hdfs\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\yarn;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\yarn\lib\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\yarn\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\mapreduce\lib\*;E:\Lion\Hadoop\hadoop-2.9.1\share\hadoop\mapreduce\*:/cygdrive/e/lion/Hadoop/apache-hive-3.1.2-bin/lib/*.jar
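This classpath mixes Windows-style entries (E:\..., separated by ;) with a Cygwin-style entry (/cygdrive/e/..., appended after :). On Cygwin, `cygpath -u` converts Windows paths properly; the bash function below only illustrates the mapping, has a hypothetical name, and ignores UNC and relative paths:

```bash
#!/usr/bin/env bash
# Illustrative Windows-to-Cygwin path mapping: E:\Lion\Hadoop -> /cygdrive/e/Lion/Hadoop.
# win_to_cygdrive is a hypothetical name; prefer `cygpath -u` on Cygwin.
# Paths that do not start with a drive letter are returned unchanged.
win_to_cygdrive() {
  local p="$1"
  case "$p" in
    [A-Za-z]:\\*|[A-Za-z]:/*) ;;
    *) echo "$p"; return 0 ;;
  esac
  local drive rest
  drive=$(printf '%s' "${p:0:1}" | tr '[:upper:]' '[:lower:]')
  rest="${p:2}"
  rest="${rest//\\//}"   # replace every backslash with a forward slash
  echo "/cygdrive/${drive}${rest}"
}

# Example:
# win_to_cygdrive 'E:\Lion\Hadoop\hadoop-2.9.1'   # /cygdrive/e/Lion/Hadoop/hadoop-2.9.1
```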

You mentioned the function schemaTool(). I am unable to find that function in my hive.sh script and am not sure where it is located.

The error I get after I submit the command is:

$HIVE_HOME/bin/schematool -dbType derby -initSchema

Error: Could not find or load main class Lion

I set all the environment variables in my Cygwin session before executing the schematool command.

Please let me know if you need any further details.

Thanks!
