Spark SQL - FIRST_VALUE or LAST_VALUE
In Spark SQL, the functions FIRST_VALUE (FIRST) and LAST_VALUE (LAST) can be used to find the first or the last value of a given column or expression for a group of rows. If the parameter `isIgnoreNull` is specified as true, they return only non-null values (unless all values are null). Note that without an explicit ordering the result is non-deterministic, because it depends on the order in which the rows are processed.
Function signature
first(expr[, isIgnoreNull])
first_value(expr[, isIgnoreNull])
last(expr[, isIgnoreNull])
last_value(expr[, isIgnoreNull])
Code snippets
first_value (first)
SELECT ACCT, first_value(AMT, true)
FROM VALUES
  (101, 10.01,  DATE'2021-01-01'),
  (101, 102.01, DATE'2021-01-01'),
  (102, 93.0,   DATE'2021-01-01'),
  (103, 913.1,  DATE'2021-01-02'),
  (102, 913.1,  DATE'2021-01-02'),
  (101, 900.56, DATE'2021-01-03')
AS TXN(ACCT, AMT, TXN_DT)
GROUP BY ACCT;
Output:
ACCT    first_value(AMT)
101     10.01
103     913.10
102     93.00
last_value (last)
SELECT ACCT, last_value(AMT, true), last(AMT, true)
FROM VALUES
  (101, 10.01,  DATE'2021-01-01'),
  (101, 102.01, DATE'2021-01-01'),
  (102, 93.0,   DATE'2021-01-01'),
  (103, 913.1,  DATE'2021-01-02'),
  (102, 913.1,  DATE'2021-01-02'),
  (101, 900.56, DATE'2021-01-03')
AS TXN(ACCT, AMT, TXN_DT)
GROUP BY ACCT;
Output:
ACCT    last_value(AMT)    last(AMT)
101     900.56             900.56
103     913.10             913.10
102     913.10             913.10
Comments
Nikesh V:

Thanks for the article! Do you know the Scala equivalent of this? I used first, but it is not picking up the first non-null value; it is just picking the first value.

Raymond:

Hi Nikesh,

To make it simple, you can register your DataFrame as a temporary view and then use Spark SQL to aggregate with functions such as first_value or last_value.
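For reference, here is a minimal Scala sketch of that approach (the object name, sample data, and column names are illustrative; it assumes a local SparkSession). The DataFrame API's first function also accepts an ignoreNulls flag, which corresponds to first_value(AMT, true) in SQL:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

object FirstNonNullExample extends App {
  val spark = SparkSession.builder()
    .appName("FirstNonNullExample")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Sample data; the None row is there to show the effect of ignoreNulls.
  val txn = Seq(
    (101, None),
    (101, Some(10.01)),
    (102, Some(93.0))
  ).toDF("ACCT", "AMT")

  // DataFrame API: first($"AMT", ignoreNulls = true) skips nulls,
  // matching first_value(AMT, true) in SQL. Like the SQL version, the
  // result is non-deterministic unless the rows have a defined order.
  txn.groupBy($"ACCT")
    .agg(first($"AMT", ignoreNulls = true).as("first_amt"))
    .show()

  // The same query through a temporary view, as described above.
  txn.createOrReplaceTempView("TXN")
  spark.sql("SELECT ACCT, first_value(AMT, true) AS first_amt FROM TXN GROUP BY ACCT").show()

  spark.stop()
}

If first appears to pick up nulls, the likely cause is calling first($"AMT") without the ignoreNulls argument, which defaults to false.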