Spark SQL - LEAD Window Function
insights Stats
Apache Spark installation guides, performance tuning tips, general tutorials, etc.
*Spark logo is a registered trademark of Apache Spark.
About LEAD function
Spark LEAD function provides access to a row at a given offset that follows the current row in a window. This analytic function can be used in a SELECT statement to compare values in the current row with values in a following row. This function is like Spark SQL - LAG Window Function.
Function signature
lead(input[, offset[, default]]) OVER ([PARYITION BY ..] ORDER BY ...)
- input: column name to get values from.
- offset: the default value of parameter `offset` is 1.
- default: the default value of `default` is null.
If the value of `input` at the `offset`th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), `default` is returned.
Example
The following sample SQL uses LEAD function to find the subsequent transaction record's amount based on DATE for each account. No default value is specified.
SELECT TXN.*, LEAD(AMT,1) OVER (PARTITION BY ACCT ORDER BY TXN_DT) AS AMT_SUB FROM VALUES
(101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (102,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT, TXN_DT);
Result:
ACCT AMT TXN_DT AMT_SUB 101 10.01 2021-01-01 102.01 101 102.01 2021-01-01 900.56 101 900.56 2021-01-03 NULL 103 913.10 2021-01-02 NULL 102 93.00 2021-01-01 913.10 102 913.10 2021-01-02 NULL
Example table
The virtual table/data frame is cited from SQL - Construct Table using Literals.