Spark SQL - LAG Window Function

Raymond Tang Raymond Tang 0 16866 10.31 index 1/6/2021

About LAG function

Spark LAG function provides access to a row at a given offset that comes before the current row in the windows. This function can be used in a SELECT statement to compare values in the current row with values in a previous row.

Function signature

lag(input[, offset[, default]])	OVER ([PARYITION BY ..] ORDER BY ...)
  • offset: the default value of parameter offset is 1.
  • default: the default value of default is null.

If the value of input at the offsetth row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), default is returned.

Example

The following sample SQL uses LAG function to find the previous transaction record's amount based on DATE for each account. No default value is specified.

SELECT TXN.*, LAG(AMT,1) OVER (PARTITION BY ACCT ORDER BY TXN_DT) AS AMT_PREV FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Result:

ACCT    AMT     TXN_DT  AMT_PREV
101     10.01   2021-01-01      NULL
101     102.01  2021-01-01      10.01
101     900.56  2021-01-03      102.01
103     913.10  2021-01-02      NULL
102     93.00   2021-01-01      NULL
102     913.10  2021-01-02      93.00

infoPARTITION BY is not mandatory; if it is not specified, all the records will be moved to one single partition which can cause performance issues. 

Example table

The virtual table/data frame is cited from SQL - Construct Table using Literals.

spark-sql spark-sql-function

Join the Discussion

View or add your thoughts below

Comments