Spark SQL - NTILE Window Function


About NTILE function

The Spark NTILE function divides the rows in each window partition into 'n' buckets, numbered from 1 up to at most 'n' (n is the specified parameter). Buckets are kept as equal in size as possible; when the row count is not divisible by n, the earlier buckets each receive one extra row. For example, NTILE(4) over 10 rows produces buckets of sizes 3, 3, 2 and 2.

Example

The following sample SQL uses the NTILE function to divide the records in each window into two buckets.

SELECT TXN.*,
       NTILE(2) OVER (PARTITION BY TXN_DT ORDER BY AMT) AS NTILE_BUCKET
FROM VALUES
  (101, 10.01, DATE'2021-01-01'),
  (101, 102.01, DATE'2021-01-01'),
  (102, 93.0, DATE'2021-01-01'),
  (103, 913.1, DATE'2021-01-02'),
  (102, 913.1, DATE'2021-01-02'),
  (101, 900.56, DATE'2021-01-03')
AS TXN(ACCT, AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  NTILE_BUCKET
101     10.01   2021-01-01      1
102     93.00   2021-01-01      1
101     102.01  2021-01-01      2
101     900.56  2021-01-03      1
103     913.10  2021-01-02      1
102     913.10  2021-01-02      2
Note: PARTITION BY is not mandatory; if it is omitted, all records are moved into a single partition, which can cause performance issues for large data sets.
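For illustration, here is a minimal sketch (not part of the original example) that omits PARTITION BY, so all six records form a single window and NTILE(3) splits them into three buckets of two rows each, ordered by AMT:

SELECT TXN.*,
       -- Single window over the whole result set; rows are bucketed purely by AMT order.
       NTILE(3) OVER (ORDER BY AMT) AS NTILE_BUCKET
FROM VALUES
  (101, 10.01, DATE'2021-01-01'),
  (101, 102.01, DATE'2021-01-01'),
  (102, 93.0, DATE'2021-01-01'),
  (103, 913.1, DATE'2021-01-02'),
  (102, 913.1, DATE'2021-01-02'),
  (101, 900.56, DATE'2021-01-03')
AS TXN(ACCT, AMT, TXN_DT);

With six rows and three buckets, each bucket receives two rows: (10.01, 93.0) in bucket 1, (102.01, 900.56) in bucket 2 and (913.1, 913.1) in bucket 3.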

Example table

The virtual table/data frame above is constructed from literal values, as described in SQL - Construct Table using Literals.
