Spark SQL - RANK Window Function
About RANK function
RANK in Spark calculates the rank of a value in a group of values. It returns one plus the number of rows preceding or equal to the current row in the ordering of the partition. Because tied rows receive the same rank, the returned values are not necessarily sequential: gaps appear after ties.
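As a minimal sketch of this gap behavior (using a hypothetical score column), the following query compares RANK with DENSE_RANK, which assigns consecutive ranks without gaps:

SELECT score,
       RANK()       OVER (ORDER BY score) AS rnk,
       DENSE_RANK() OVER (ORDER BY score) AS dense_rnk
FROM VALUES (10), (10), (20), (30) AS t(score);

-- rnk returns 1, 1, 3, 4 (the value 2 is skipped after the tie),
-- while dense_rnk returns 1, 1, 2, 3.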
RANK without partition
The following sample SQL uses the RANK function without a PARTITION BY clause:
SELECT TXN.*,
       RANK() OVER (ORDER BY TXN_DT) AS ROW_RANK
FROM VALUES
       (101, 10.01,  DATE'2021-01-01'),
       (101, 102.01, DATE'2021-01-01'),
       (102, 93.,    DATE'2021-01-01'),
       (103, 913.1,  DATE'2021-01-02'),
       (101, 900.56, DATE'2021-01-03')
     AS TXN(ACCT, AMT, TXN_DT);
Result:
ACCT  AMT     TXN_DT      ROW_RANK
101   10.01   2021-01-01  1
101   102.01  2021-01-01  1
102   93.00   2021-01-01  1
103   913.10  2021-01-02  4
101   900.56  2021-01-03  5
warning The following warning message will be shown: WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
RANK with partition
The following sample SQL returns a rank number for each record within each window (defined by PARTITION BY):
SELECT TXN.*,
       RANK() OVER (PARTITION BY TXN_DT ORDER BY AMT DESC) AS ROWNUM
FROM VALUES
       (101, 10.01,  DATE'2021-01-01'),
       (101, 102.01, DATE'2021-01-01'),
       (102, 93.,    DATE'2021-01-01'),
       (103, 913.1,  DATE'2021-01-02'),
       (101, 900.56, DATE'2021-01-03')
     AS TXN(ACCT, AMT, TXN_DT);
Result:
ACCT  AMT     TXN_DT      ROWNUM
101   102.01  2021-01-01  1
102   93.00   2021-01-01  2
101   10.01   2021-01-01  3
101   900.56  2021-01-03  1
103   913.10  2021-01-02  1
Records are allocated to windows based on the TXN_DT column, and within each window the rank is computed by ordering on column AMT.
info By default, records are sorted in ascending order within the window. Use ORDER BY ... DESC to sort records in descending order.
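As a sketch of the default ascending ordering, omitting DESC from the query above ranks the smallest AMT first within each TXN_DT window:

SELECT TXN.*,
       -- Without DESC, ORDER BY AMT sorts ascending by default
       RANK() OVER (PARTITION BY TXN_DT ORDER BY AMT) AS ROWNUM
FROM VALUES
       (101, 10.01,  DATE'2021-01-01'),
       (101, 102.01, DATE'2021-01-01'),
       (102, 93.,    DATE'2021-01-01'),
       (103, 913.1,  DATE'2021-01-02'),
       (101, 900.56, DATE'2021-01-03')
     AS TXN(ACCT, AMT, TXN_DT);

For the 2021-01-01 window, AMT 10.01 now receives rank 1 and 102.01 receives rank 3, the reverse of the descending example.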
Example table
The virtual table/data frame used above is taken from the article SQL - Construct Table using Literals.
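For reference, a minimal sketch of that construct: an inline table is built with the VALUES clause and given a table alias with column names:

SELECT * FROM VALUES
       (101, 10.01, DATE'2021-01-01'),
       (102, 93.,   DATE'2021-01-01')
     AS TXN(ACCT, AMT, TXN_DT);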