Spark SQL - DENSE_RANK Window Function

About DENSE_RANK function

DENSE_RANK is similar to the Spark SQL - RANK Window Function. It calculates the rank of a value in a group of values: the result is one plus the previously assigned rank value, so tied values share the same rank and the next distinct value receives the next rank. The returned values are therefore sequential within each window and no gaps are generated.
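
To see the difference from RANK, the two functions can be compared side by side on the same sample data. This is a minimal sketch; the column aliases TXN_RANK and TXN_DENSE_RANK are arbitrary names used here for illustration:

SELECT TXN.*, 
RANK() OVER (ORDER BY TXN_DT) AS TXN_RANK, 
DENSE_RANK() OVER (ORDER BY TXN_DT) AS TXN_DENSE_RANK 
FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

For the three distinct dates in the data, RANK produces 1, 4 and 6 (gaps after the ties) while DENSE_RANK produces 1, 2 and 3.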

DENSE_RANK without partition

The following sample SQL uses the DENSE_RANK function without a PARTITION BY clause:

SELECT TXN.*, DENSE_RANK() OVER (ORDER BY TXN_DT) AS ROW_RANK FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  ROW_RANK
101     10.01   2021-01-01      1
101     102.01  2021-01-01      1
102     93.00   2021-01-01      1
103     913.10  2021-01-02      2
102     913.10  2021-01-02      2
101     900.56  2021-01-03      3
Warning: the following message will be printed: WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.

DENSE_RANK with partition

The following sample SQL returns a rank number for each record in each window (defined by the PARTITION BY clause):

SELECT TXN.*, 
DENSE_RANK() OVER (PARTITION BY TXN_DT ORDER BY AMT DESC) AS ROW_RANK 
FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  ROW_RANK
101     102.01  2021-01-01      1
102     93.00   2021-01-01      2
101     10.01   2021-01-01      3
101     900.56  2021-01-03      1
103     913.10  2021-01-02      1
102     913.10  2021-01-02      1

Records are allocated to windows based on the TXN_DT column, and within each window the rank is computed on the AMT column.

By default, records are sorted in ascending order. Use ORDER BY ... DESC to sort records in descending order.
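
As a quick illustration of the default ascending order, the same partitioned query can be run without DESC; this is a minimal variation of the query above:

SELECT TXN.*, 
DENSE_RANK() OVER (PARTITION BY TXN_DT ORDER BY AMT) AS ROW_RANK 
FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

For the 2021-01-01 window, the ranks become 1 for 10.01, 2 for 93.00 and 3 for 102.01, i.e. the smallest amount now ranks first.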

Example table

The virtual table/data frame is cited from SQL - Construct Table using Literals.
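
For reference, this literal-based inline table can also be queried on its own without any window function; a minimal sketch of the VALUES construct used throughout this article:

SELECT * FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);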
