Create Bucketed Sorted Table - Hive SQL

Raymond Raymond event 2020-08-25 visibility 1,476
more_vert

This page shows how to create bucketed sorted Hive tables via Hive SQL (HQL).

Create bucketed table

CLUSTERD BY is used to create bucketed table

Example:

CREATE TABLE IF NOT EXISTS hql.transactions_bucketed(txn_id BIGINT, cust_id INT, amount DECIMAL(20,2),txn_type STRING, created_date DATE)
COMMENT 'A table to store transactions'
PARTITIONED BY (txn_date DATE)
CLUSTERED BY(cust_id) INTO 100 BUCKETS
STORED AS PARQUET;

The above command creates a bucketed Hive table by cust_id column.  100 buckets will be created when reading from this table.

Create bucketed sorted table

SORTED BY is used to sort values.

Example:

CREATE TABLE IF NOT EXISTS hql.transactions_bucketed_sorted(txn_id BIGINT, cust_id INT, amount DECIMAL(20,2),txn_type STRING, created_date DATE)
COMMENT 'A table to store transactions'
PARTITIONED BY (txn_date DATE)
CLUSTERED BY(cust_id) SORTED BY(created_date DESC) INTO 100 BUCKETS
STORED AS PARQUET;

Install Hive database

Follow the article below to install Hive on Windows 10 via WSL if you don't have available available Hive database to practice Hive SQL:

Examples on this page are based on Hive 3.* syntax.

Run query

All these SQL statements can be run using beeline CLI:

$HIVE_HOME/bin/beeline --silent=true

The above command line connects to the default HiveServer2 service via beeline. Once beeline is loaded, type the following command to connect:

0: jdbc:hive2://localhost:10000> !connect jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hive
Enter password for jdbc:hive2://localhost:10000:
1: jdbc:hive2://localhost:10000>

The terminal looks like the following screenshot:

20200824111616-image.png

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts