Raymond Raymond

Spark SQL - ROW_NUMBER Window Functions

event 2020-12-31 visibility 24,246 comment 0 insights toc
more_vert
insights Stats

About ROW_NUMBER function

ROW_NUMBER in Spark assigns a unique sequential number (starting from 1) to each record based on the ordering of rows in each window partition. It is commonly used to deduplicate data.

ROW_NUMBER without partition

The following sample SQL uses ROW_NUMBER function without PARTITION BY clause:

SELECT TXN.*, ROW_NUMBER() OVER (ORDER BY TXN_DT) AS ROWNUM FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  ROWNUM
101     10.01   2021-01-01      1
101     102.01  2021-01-01      2
102     93.00   2021-01-01      3
103     913.10  2021-01-02      4
101     900.56  2021-01-03      5

Each record has a unique number starting from 1.

ROW_NUMBER with partition

The following sample SQL returns a unique number for only records in each window (defined by PARTITION BY):

SELECT TXN.*, 
ROW_NUMBER() OVER (PARTITION BY ACCT ORDER BY TXN_DT) AS ROWNUM 
FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  ROWNUM
101     10.01   2021-01-01      1
101     102.01  2021-01-01      2
101     900.56  2021-01-03      3
103     913.10  2021-01-02      1
102     93.00   2021-01-01      1

Records are allocated to windows based on account number.  

infoBy default, records will be sorted in ascending order. Use ORDER BY .. DESC to sort records in descending order. 

Example table

The virtual table/data frame is cited from SQL - Construct Table using Literals.

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts