Configure HiveServer2 to Enable Transactions (ACID Support)
Hive by default doesn't enable transactions/ACID support. Thus to enable it, we need to add configurations into hive-site.xml
. This article provides details to enable these configurations. Without this, the Hive compactor or other features may not work properly.
Minimum configurations
The following are the minimum configuration items required:
Client side
- hive.support.concurrency – true
- hive.enforce.bucketing – true (Not required as of Hive 2.0 or above)
- hive.exec.dynamic.partition.mode – nonstrict
- hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Server side (Metastore)
- hive.compactor.initiator.on – true (See table below for more details)
- hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service.
Configure in hive-site.xml
Now let's start to configure these attributes.
Open hive-site.xml
file in folder config under Hive installation home folder. Typically the path is $HIVE_HOME/config/hive-site.xml
.
Currently, my configuration file has the following attributes:
cat $HIVE_HOME/conf/hive-site.xml <?xml version="1.0" encoding="UTF-8" standalone="no"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hive.metastore.event.db.notification.api.auth</name> <value>false</value> <description> Should metastore do authorization against database notification related APIs such as get_next_notification. If set to true, then only the superusers in proxy settings have the permission </description> </property> <property> <name>hive.metastore.db.type</name> <value>mysql</value> <description> Expects one of [derby, oracle, mysql, mssql, postgres]. Type of database used by the metastore. Information schema & JDBCStorageHandler depend on it. </description> </property> <property> <name>hive.metastore.uris</name> <value>thrift://127.0.0.1:9083</value> <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>***</value> <description>password to use against metastore database</description> </property> <property> <name>hive.metastore.ds.connection.url.hook</name> <value/> <description>Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used</description> </property> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/hive_metastore</value> <description> JDBC connect string for a JDBC metastore. To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. For example, jdbc:postgresql://myhost/db?ssl=true for postgres database. </description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hive</value> <description>Username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.cj.jdbc.Driver</value> <description>Driver class name for a JDBC metastore</description> </property> </configuration>
Add the following configuration properties into the file:
<property> <name>hive.support.concurrency</name> <value>true</value> </property> <property> <name>hive.txn.manager</name> <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value> </property> <property> <name>name>hive.enforce.bucketing</name> <value>true</value> </property> <property> <name>hive.exec.dynamic.partition.mode</name> <value>nostrict</value> </property> <property> <name>hive.compactor.initiator.on</name> <value>true</value> </property> <property> <name>hive.compactor.worker.threads</name> <value>1</value> </property>
I am adding just one thread for compactor worker threads since it is a local Hive instance configured in my Windows Subsystem for Linux (Apache Hive 3.1.1 Installation on Windows 10 using Windows Subsystem for Linux).
Restart the services
Once added, you can restart your Hive metastore and hiveserver2 services to make them effective using the following commands:
echo 'Starting hive metastore service...' $HIVE_HOME/bin/hive --service metastore & echo 'Starting hive hiveserver2 service...' $HIVE_HOME/bin/hive --service hiveserver2 &
References
Hive Transactions - Apache Hive - Apache Software Foundation