About NTILE function
Spark NTILE function divides the rows in each window to 'n' buckets ranging from 1 to at most 'n' (n is the specified parameter).
Example
The following sample SQL uses NTILE function to divide records in each window to two buckets.
SELECT TXN.*, NTILE(2) OVER (PARTITION BY TXN_DT ORDER BY AMT) AS NTILE_BUCKET FROM VALUES
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);
Result:
ACCT AMT TXN_DT NTILE_BUCKET
101 10.01 2021-01-01 1
102 93.00 2021-01-01 1
101 102.01 2021-01-01 2
101 900.56 2021-01-03 1
103 913.10 2021-01-02 1
102 913.10 2021-01-02 2
infoPARTITION BY is not mandatory; if it is not specified, all the records will be moved to one single partition which can cause performance issues.
Example table
The virtual table/data frame is cited from SQL - Construct Table using Literals.