Spark SQL - RANK Window Function


About RANK function

RANK in Spark calculates the rank of a value within a group of values. It returns one plus the number of rows that precede the current row in the partition's ordering; tied rows receive the same rank. Because ties share a rank, the returned values are not necessarily sequential (gaps can appear).
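The definition above can be illustrated with a small pure-Python sketch (this is only an illustration of the semantics, not Spark code):

```python
def rank(values):
    # RANK: one plus the number of values that sort strictly before
    # the current one; tied values share a rank, leaving gaps after ties.
    return [1 + sum(1 for other in values if other < v) for v in values]

# Transaction dates from the examples below, already in order:
dates = ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03"]
print(rank(dates))  # [1, 1, 1, 4, 5]
```

The three tied dates all receive rank 1, and the next distinct date jumps to rank 4, which is exactly the gap behavior described above.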

RANK without partition

The following sample SQL uses the RANK function without a PARTITION BY clause:

SELECT TXN.*, RANK() OVER (ORDER BY TXN_DT) AS ROW_RANK FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  ROW_RANK
101     10.01   2021-01-01      1
101     102.01  2021-01-01      1
102     93.00   2021-01-01      1
103     913.10  2021-01-02      4
101     900.56  2021-01-03      5
Because no PARTITION BY clause is defined, Spark also logs the following warning: WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
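In the result above, ranks 2 and 3 are skipped because three rows tie at rank 1; that gap is inherent to RANK, whereas DENSE_RANK yields sequential values. A minimal pure-Python sketch of the difference (an illustration of the semantics, not Spark code):

```python
def rank(values):
    # RANK: ties share a rank and the following ranks are skipped.
    return [1 + sum(1 for other in values if other < v) for v in values]

def dense_rank(values):
    # DENSE_RANK: ties share a rank but no ranks are skipped.
    order = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]

dates = ["2021-01-01"] * 3 + ["2021-01-02", "2021-01-03"]
print(rank(dates))        # [1, 1, 1, 4, 5] -- gaps after the tie
print(dense_rank(dates))  # [1, 1, 1, 2, 3] -- sequential
```

Spark SQL also provides DENSE_RANK() as a built-in window function if sequential values are needed.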

RANK with partition

The following sample SQL returns a rank number for each record within each window (defined by the PARTITION BY clause):

SELECT TXN.*, 
RANK() OVER (PARTITION BY TXN_DT ORDER BY AMT DESC) AS ROWNUM 
FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  ROWNUM
101     102.01  2021-01-01      1
102     93.00   2021-01-01      2
101     10.01   2021-01-01      3
101     900.56  2021-01-03      1
103     913.10  2021-01-02      1

Records are allocated to windows based on the TXN_DT column, and within each window the rank is computed from the AMT column in descending order.

By default, ORDER BY sorts records in ascending order. Use ORDER BY ... DESC to sort records in descending order.
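The partitioned computation can be mimicked in plain Python to make the mechanics explicit. This is an illustrative sketch of the semantics using the same sample rows, not Spark itself:

```python
from itertools import groupby

def rank_desc(values):
    # RANK within one partition with ORDER BY ... DESC: one plus the
    # count of values strictly greater than the current one.
    return [1 + sum(1 for other in values if other > v) for v in values]

def ranked_by_date(rows):
    # Partition (ACCT, AMT, TXN_DT) rows by TXN_DT, rank by AMT descending.
    out = []
    by_date = sorted(rows, key=lambda r: r[2])
    for txn_dt, grp in groupby(by_date, key=lambda r: r[2]):
        grp = list(grp)
        ranks = rank_desc([amt for _, amt, _ in grp])
        out.extend((acct, amt, txn_dt, rk)
                   for (acct, amt, _), rk in zip(grp, ranks))
    return out

rows = [
    (101, 10.01,  "2021-01-01"),
    (101, 102.01, "2021-01-01"),
    (102, 93.00,  "2021-01-01"),
    (103, 913.10, "2021-01-02"),
    (101, 900.56, "2021-01-03"),
]
for rec in ranked_by_date(rows):
    print(rec)
```

Each partition's ranking restarts at 1, which is why the 2021-01-02 and 2021-01-03 rows both receive rank 1 in the result table above.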

Example table

The virtual table/data frame is cited from SQL - Construct Table using Literals.

Last modified by Raymond 9 days ago.