Configure HiveServer2 to Enable Transactions (ACID Support)

event 2022-08-20 visibility 619 comment 0 insights
more_vert
insights Stats
Raymond Raymond Hadoop, Hive & HBase

Articles about Apache Hadoop, Hive and HBase installation, performance tuning and general tutorials.

*The yellow elephant logo is a registered trademark of Apache Hadoop.

Hive by default doesn't enable transactions/ACID support. Thus to enable it, we need to add configurations into hive-site.xml. This article provides details to enable these configurations. Without this, the Hive compactor or other features may not work properly.

Minimum configurations

The following are the minimum configuration items required:

Client side

  • hive.support.concurrency – true
  • hive.enforce.bucketing – true (Not required as of Hive 2.0 or above)
  • hive.exec.dynamic.partition.mode – nonstrict
  • hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

Server side (Metastore)

  • hive.compactor.initiator.on – true (See table below for more details)
  • hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service.

Configure in hive-site.xml

Now let's start to configure these attributes.

Open hive-site.xml file in folder config under Hive installation home folder. Typically the path is $HIVE_HOME/config/hive-site.xml

Currently, my configuration file has the following attributes:

cat $HIVE_HOME/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
    <description>
      Should metastore do authorization against database notification related APIs such as get_next_notification.
      If set to true, then only the superusers in proxy settings have the permission
    </description>
  </property>
  <property>
    <name>hive.metastore.db.type</name>
    <value>mysql</value>
    <description>
      Expects one of [derby, oracle, mysql, mssql, postgres].
      Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
    </description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://127.0.0.1:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>***</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.ds.connection.url.hook</name>
    <value/>
    <description>Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/hive_metastore</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
</configuration>

Add the following configuration properties into the file:

<property>
     <name>hive.support.concurrency</name>
     <value>true</value>
</property>

<property>
     <name>hive.txn.manager</name>
     <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>

<property>
     <name>name>hive.enforce.bucketing</name>
     <value>true</value>
</property>

<property>
     <name>hive.exec.dynamic.partition.mode</name>
     <value>nostrict</value>
</property>

<property>
     <name>hive.compactor.initiator.on</name>
     <value>true</value>
</property>

<property>
     <name>hive.compactor.worker.threads</name>
     <value>1</value>
</property>

I am adding just one thread for compactor worker threads since it is a local Hive instance configured in my Windows Subsystem for Linux (Apache Hive 3.1.1 Installation on Windows 10 using Windows Subsystem for Linux).

Restart the services

Once added, you can restart your Hive metastore and hiveserver2 services to make them effective using the following commands:

echo 'Starting hive metastore service...'
$HIVE_HOME/bin/hive --service metastore &
echo 'Starting hive hiveserver2 service...'
$HIVE_HOME/bin/hive --service hiveserver2 &

References

Hive Transactions - Apache Hive - Apache Software Foundation

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts