Spark SQL - NTILE Window Function

Raymond Tang Raymond Tang 0 3235 1.98 index 1/6/2021

About NTILE function

Spark NTILE function divides the rows in each window to 'n' buckets ranging from 1 to at most 'n' (n is the specified parameter).

Example

The following sample SQL uses NTILE function to divide records in each window to two buckets.

SELECT TXN.*, NTILE(2) OVER (PARTITION BY TXN_DT ORDER BY AMT) AS NTILE_BUCKET FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  NTILE_BUCKET
101     10.01   2021-01-01      1
102     93.00   2021-01-01      1
101     102.01  2021-01-01      2
101     900.56  2021-01-03      1
103     913.10  2021-01-02      1
102     913.10  2021-01-02      2

infoPARTITION BY is not mandatory; if it is not specified, all the records will be moved to one single partition which can cause performance issues. 

Example table

The virtual table/data frame is cited from SQL - Construct Table using Literals.

spark-sql spark-sql-function

Join the Discussion

View or add your thoughts below

Comments