Spark SQL - ROW_NUMBER Window Functions
About ROW_NUMBER function
ROW_NUMBER in Spark assigns a unique sequential number (starting from 1) to each record based on the ordering of rows in each window partition. It is commonly used to deduplicate data.
ROW_NUMBER without partition
The following sample SQL uses ROW_NUMBER function without PARTITION BY clause:
SELECT TXN.*, ROW_NUMBER() OVER (ORDER BY TXN_DT) AS ROWNUM FROM VALUES (101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT, TXN_DT);
Result:
ACCT AMT TXN_DT ROWNUM 101 10.01 2021-01-01 1 101 102.01 2021-01-01 2 102 93.00 2021-01-01 3 103 913.10 2021-01-02 4 101 900.56 2021-01-03 5
Each record has a unique number starting from 1.
ROW_NUMBER with partition
The following sample SQL returns a unique number for only records in each window (defined by PARTITION BY):
SELECT TXN.*, ROW_NUMBER() OVER (PARTITION BY ACCT ORDER BY TXN_DT) AS ROWNUM FROM VALUES (101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT, TXN_DT);
Result:
ACCT AMT TXN_DT ROWNUM 101 10.01 2021-01-01 1 101 102.01 2021-01-01 2 101 900.56 2021-01-03 3 103 913.10 2021-01-02 1 102 93.00 2021-01-01 1
Records are allocated to windows based on account number.
infoBy default, records will be sorted in ascending order. Use ORDER BY .. DESC to sort records in descending order.
Example table
The virtual table/data frame is cited from SQL - Construct Table using Literals.
info Last modified by Raymond 5 years ago
copyright
This page is subject to Site terms.
comment Comments
No comments yet.