Create Bucketed Sorted Table - Hive SQL

access_time 1 month ago visibility8 comment 0

This page shows how to create bucketed sorted Hive tables via Hive SQL (HQL).

Create bucketed table

CLUSTERD BY is used to create bucketed table

Example:

CREATE TABLE IF NOT EXISTS hql.transactions_bucketed(txn_id BIGINT, cust_id INT, amount DECIMAL(20,2),txn_type STRING, created_date DATE)
COMMENT 'A table to store transactions'
PARTITIONED BY (txn_date DATE)
CLUSTERED BY(cust_id) INTO 100 BUCKETS
STORED AS PARQUET;

The above command creates a bucketed Hive table by cust_id column.  100 buckets will be created when reading from this table.

Create bucketed sorted table

SORTED BY is used to sort values.

Example:

CREATE TABLE IF NOT EXISTS hql.transactions_bucketed_sorted(txn_id BIGINT, cust_id INT, amount DECIMAL(20,2),txn_type STRING, created_date DATE)
COMMENT 'A table to store transactions'
PARTITIONED BY (txn_date DATE)
CLUSTERED BY(cust_id) SORTED BY(created_date DESC) INTO 100 BUCKETS
STORED AS PARQUET;

Install Hive database

Follow the article below to install Hive on Windows 10 via WSL if you don't have available available Hive database to practice Hive SQL:

Examples on this page are based on Hive 3.* syntax.

Run query

All these SQL statements can be run using beeline CLI:

$HIVE_HOME/bin/beeline --silent=true

The above command line connects to the default HiveServer2 service via beeline. Once beeline is loaded, type the following command to connect:

0: jdbc:hive2://localhost:10000> !connect jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hive
Enter password for jdbc:hive2://localhost:10000:
1: jdbc:hive2://localhost:10000>

The terminal looks like the following screenshot:


info Last modified by Administrator at 1 month ago copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

Kontext Column

Created for everyone to publish data, programming and cloud related articles.
Follow three steps to create your columns.


Learn more arrow_forward

More from Kontext

Apache Hive 3.1.2 Installation on Windows 10

local_offer hive local_offer hadoop local_offer windows10 local_offer big-data-on-windows-10

visibility 365
thumb_up 0
access_time 2 months ago

Hive 3.1.2 was released on 26th Aug 2019. It is still the latest 3.x release and works with Hadoop 3.x.y releases. In this article, I’m going to provide step by step instructions about installing Hive 3.1.2 on Windows 10. * Logos are registered trademarks of Apache Hive and Microsoft Windows.

local_offer python local_offer spark local_offer pyspark local_offer hive local_offer spark-database-connect

visibility 22190
thumb_up 4
access_time 2 years ago

From Spark 2.0, you can easily read data from Hive data warehouse and also write/append new data to Hive tables. This page shows how to operate with Hive in Spark including: Create DataFrame from existing Hive table Save DataFrame to a new Hive table Append data to the existing Hive table via ...

local_offer hive local_offer SQL local_offer hive-sql-ddl

visibility 13
thumb_up 0
access_time 1 month ago

This page shows how to create partitioned Hive tables via Hive SQL (HQL). Example: CREATE TABLE IF NOT EXISTS hql.transactions(txn_id BIGINT, cust_id INT, amount DECIMAL(20,2),txn_type STRING, created_date DATE) COMMENT 'A table to store transactions' PARTITIONED BY (txn_date DATE) STORED AS ...

About column

Articles about Apache Hadoop installation, performance tuning and general tutorials.

*The yellow elephant logo is a registered trademark of Apache Hadoop.

rss_feed Subscribe RSS