Spark SQL - DENSE_RANK Window Function
About DENSE_RANK function
DENSE_RANK is similar as Spark SQL - RANK Window Function. It calculates the rank of a value in a group of values. It returns one plus the number of rows proceeding or equals to the current row in the ordering of a partition. The returned values are sequential in each window thus no gaps will be generated.
DENSE_RANK without partition
The following sample SQL uses DENSE_RANK function without PARTITION BY clause:
SELECT TXN.*, DENSE_RANK() OVER (ORDER BY TXN_DT) AS ROW_RANK FROM VALUES (101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (102,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT, TXN_DT);
Result:
ACCT AMT TXN_DT ROW_RANK 101 10.01 2021-01-01 1 101 102.01 2021-01-01 1 102 93.00 2021-01-01 1 103 913.10 2021-01-02 2 102 913.10 2021-01-02 2 101 900.56 2021-01-03 3
DENSE_RANK with partition
The following sample SQL returns a rank number for each records in each window (defined by PARTITION BY):
SELECT TXN.*, DENSE_RANK() OVER (PARTITION BY TXN_DT ORDER BY AMT DESC) AS ROW_RANK FROM VALUES (101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (102,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT, TXN_DT);
Result:
ACCT AMT TXN_DT ROW_RANK 101 102.01 2021-01-01 1 102 93.00 2021-01-01 2 101 10.01 2021-01-01 3 101 900.56 2021-01-03 1 103 913.10 2021-01-02 1 102 913.10 2021-01-02 1
Records are allocated to windows based on TXN_DT column and the rank is computed based on column AMT in each window.
Example table
The virtual table/data frame is cited from SQL - Construct Table using Literals.