Spark SQL - LEAD Window Function

Raymond Raymond event 2021-01-06 visibility 7,874
more_vert

About LEAD function

Spark LEAD function provides access to a row at a given offset that follows the current row in a window. This analytic function can be used in a SELECT statement to compare values in the current row with values in a following row. This function is like Spark SQL - LAG Window Function.

Function signature

lead(input[, offset[, default]])	OVER ([PARYITION BY ..] ORDER BY ...)
  • input: column name to get values from.
  • offset: the default value of parameter `offset` is 1.
  • default: the default value of `default` is null.

If the value of `input` at the `offset`th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), `default` is returned.

Example

The following sample SQL uses LEAD function to find the subsequent transaction record's amount based on DATE for each account. No default value is specified. 

SELECT TXN.*, LEAD(AMT,1) OVER (PARTITION BY ACCT ORDER BY TXN_DT) AS AMT_SUB FROM VALUES 
(101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (102,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  AMT_SUB
101     10.01   2021-01-01      102.01
101     102.01  2021-01-01      900.56
101     900.56  2021-01-03      NULL
103     913.10  2021-01-02      NULL
102     93.00   2021-01-01      913.10
102     913.10  2021-01-02      NULL
infoPARTITION BY is not mandatory; if it is not specified, all the records will be moved to one single partition which can cause performance issues. 

Example table

The virtual table/data frame is cited from SQL - Construct Table using Literals.

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts