Create Bucketed Sorted Table - Hive SQL
insights Stats
Articles about Apache Hadoop, Hive and HBase installation, performance tuning and general tutorials.
*The yellow elephant logo is a registered trademark of Apache Hadoop.
This page shows how to create bucketed sorted Hive tables via Hive SQL (HQL).
Create bucketed table
CLUSTERD BY is used to create bucketed table
Example:
CREATE TABLE IF NOT EXISTS hql.transactions_bucketed(txn_id BIGINT, cust_id INT, amount DECIMAL(20,2),txn_type STRING, created_date DATE) COMMENT 'A table to store transactions' PARTITIONED BY (txn_date DATE) CLUSTERED BY(cust_id) INTO 100 BUCKETS STORED AS PARQUET;
The above command creates a bucketed Hive table by cust_id column. 100 buckets will be created when reading from this table.
Create bucketed sorted table
SORTED BY is used to sort values.
Example:
CREATE TABLE IF NOT EXISTS hql.transactions_bucketed_sorted(txn_id BIGINT, cust_id INT, amount DECIMAL(20,2),txn_type STRING, created_date DATE) COMMENT 'A table to store transactions' PARTITIONED BY (txn_date DATE) CLUSTERED BY(cust_id) SORTED BY(created_date DESC) INTO 100 BUCKETS STORED AS PARQUET;
Install Hive database
Follow the article below to install Hive on Windows 10 via WSL if you don't have available available Hive database to practice Hive SQL:
Examples on this page are based on Hive 3.* syntax.
Run query
All these SQL statements can be run using beeline CLI:
$HIVE_HOME/bin/beeline --silent=true
The above command line connects to the default HiveServer2 service via beeline. Once beeline is loaded, type the following command to connect:
0: jdbc:hive2://localhost:10000> !connect jdbc:hive2://localhost:10000 Enter username for jdbc:hive2://localhost:10000: hive Enter password for jdbc:hive2://localhost:10000: 1: jdbc:hive2://localhost:10000>
The terminal looks like the following screenshot: