Spark SQL - RANK Window Function
About RANK function
RANK in Spark calculates the rank of a value in a group of values. It returns one plus the number of rows proceeding or equals to the current row in the ordering of a partition. The returned values are not sequential.
RANK without partition
The following sample SQL uses RANK function without PARTITION BY clause:
SELECT TXN.*, RANK() OVER (ORDER BY TXN_DT) AS ROW_RANK FROM VALUES (101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT, TXN_DT);
Result:
ACCT AMT TXN_DT ROW_RANK 101 10.01 2021-01-01 1 101 102.01 2021-01-01 1 102 93.00 2021-01-01 1 103 913.10 2021-01-02 4 101 900.56 2021-01-03 5
RANK with partition
The following sample SQL returns a rank number for each records in each window (defined by PARTITION BY):
SELECT TXN.*, RANK() OVER (PARTITION BY TXN_DT ORDER BY AMT DESC) AS ROWNUM FROM VALUES (101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT, TXN_DT);
Result:
ACCT AMT TXN_DT ROWNUM 101 102.01 2021-01-01 1 102 93.00 2021-01-01 2 101 10.01 2021-01-01 3 101 900.56 2021-01-03 1 103 913.10 2021-01-02 1
Records are allocated to windows based on TXN_DT column and the rank is computed based on column AMT.
Example table
The virtual table/data frame is cited from SQL - Construct Table using Literals.