Hive SQL - Differences between Order By and Sort By

Kontext Kontext event 2022-07-10 visibility 1,011
more_vert
Hive SQL - Differences between Order By and Sort By

Hive provides two clause ORDER BY and SORT BY to sort results a query. However they are slightly different from each other.

Syntax of ORDER BY and SORT BY

For ORDER BY, the syntax is defined as below:

colOrder: ( ASC | DESC )
colNullOrder: (NULLS FIRST | NULLS LAST)
orderBy: ORDER BY colName colOrder? colNullOrder? (',' colName colOrder? colNullOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy 

For SORT BY, the syntax is very similar:

colOrder: ( ASC | DESC )
sortBy: SORT BY colName colOrder? (',' colName colOrder?)*
query: SELECT expression (',' expression)* FROM src sortBy

From syntax perspective, SORT BY doesn't support NULL ORDER sub clause.  As other database engines, NULL cannot be compared with other non-null values, hence HQL also provides the NULL order clause (NULLS FIRST or NULLS LAST, available from Hive 2.1.0) to decide the sequence of NULL values. 

Differences

Hive sorts the rows using columns specified in SORT BY before sending the data to reducer. Thus SORT BY only guarantees the order per reducer while ORDER BY will decide the total order of records in the final output.

Examples

The following are some examples of using these two clauses. The table used was created in Hive SQL - Aggregate Functions Overview with Examples.

Use ORDER BY

select * from hivesql.transactions ORDER BY acct ASC, txn_date;

Results:

+--------------------+----------------------+------------------------+
| transactions.acct  | transactions.amount  | transactions.txn_date  |
+--------------------+----------------------+------------------------+
| 101                | 102.01               | 2021-01-01             |
| 101                | 10.01                | 2021-01-01             |
| 101                | 900.56               | 2021-01-03             |
| 102                | 93.00                | 2021-01-01             |
| 102                | 913.10               | 2021-01-02             |
| 103                | 913.10               | 2021-01-02             |
+--------------------+----------------------+------------------------+

Use SORT BY

select * from hivesql.transactions SORT BY acct ASC, txn_date;

Results:

+--------------------+----------------------+------------------------+
| transactions.acct  | transactions.amount  | transactions.txn_date  |
+--------------------+----------------------+------------------------+
| 101                | 102.01               | 2021-01-01             |
| 101                | 10.01                | 2021-01-01             |
| 101                | 900.56               | 2021-01-03             |
| 102                | 93.00                | 2021-01-01             |
| 102                | 913.10               | 2021-01-02             |
| 103                | 913.10               | 2021-01-02             |
+--------------------+----------------------+------------------------+

The above result is exactly the same as ORDER BY as there is only on reducer.

Let's increase reducer to 2 and then run the script again:

set mapreduce.job.reduces=2;
select * from hivesql.transactions SORT BY acct ASC, txn_date;

Results:

+--------------------+----------------------+------------------------+
| transactions.acct  | transactions.amount  | transactions.txn_date  |
+--------------------+----------------------+------------------------+
| 101                | 10.01                | 2021-01-01             |
| 102                | 93.00                | 2021-01-01             |
| 102                | 913.10               | 2021-01-02             |
| 103                | 913.10               | 2021-01-02             |
| 101                | 102.01               | 2021-01-01             |
| 101                | 900.56               | 2021-01-03             |
+--------------------+----------------------+------------------------+

The results are now in different sequence as SORT BY will only guarantee orders of records in each reducer. However, the result will stay as the same if using ORDER BY.

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts