Spark SQL - FIRST_VALUE or LAST_VALUE

Raymond · 2021-01-10

In Spark SQL, the functions FIRST_VALUE (FIRST) and LAST_VALUE (LAST) can be used to find the first or the last value of a given column or expression for a group of rows. If the parameter `isIgnoreNull` is specified as true, they return only non-null values (unless all values are null).

Function signatures

first(expr[, isIgnoreNull])	

first_value(expr[, isIgnoreNull])

last(expr[, isIgnoreNull])

last_value(expr[, isIgnoreNull])

Code snippets

first_value (first)

SELECT ACCT, first_value(AMT, true) FROM VALUES
(101, 10.01, DATE'2021-01-01'),
(101, 102.01, DATE'2021-01-01'),
(102, 93.0, DATE'2021-01-01'),
(103, 913.1, DATE'2021-01-02'),
(102, 913.1, DATE'2021-01-02'),
(101, 900.56, DATE'2021-01-03')
AS TXN(ACCT, AMT, TXN_DT)
GROUP BY ACCT;

Output:

ACCT    first_value(AMT)
101     10.01
103     913.10
102     93.00
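
The sample rows above contain no NULLs, so isIgnoreNull has no visible effect here. Below is a minimal sketch (with a NULL amount added; the column aliases are made up for illustration) that contrasts the two behaviors:

SELECT ACCT,
  first_value(AMT, true) AS first_non_null,
  first_value(AMT) AS first_any
FROM VALUES
(101, NULL),
(101, 102.01),
(102, 93.0)
AS TXN(ACCT, AMT)
GROUP BY ACCT;

For account 101, the null-ignoring variant skips the NULL row and should return 102.01, while the plain variant can return the NULL itself (first without any ordering is sensitive to scan order).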

last_value (last)

SELECT ACCT, last_value(AMT, true), last(AMT, true) FROM VALUES
(101, 10.01, DATE'2021-01-01'),
(101, 102.01, DATE'2021-01-01'),
(102, 93.0, DATE'2021-01-01'),
(103, 913.1, DATE'2021-01-02'),
(102, 913.1, DATE'2021-01-02'),
(101, 900.56, DATE'2021-01-03')
AS TXN(ACCT, AMT, TXN_DT)
GROUP BY ACCT;

Output:

ACCT    last_value(AMT) last(AMT)
101     900.56  900.56
103     913.10  913.10
102     913.10  913.10
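
Note that first and last as aggregates depend on the order in which rows happen to be scanned; Spark documents them as non-deterministic unless the input is effectively ordered. For deterministic results, first_value and last_value can also be used as window functions with an explicit ORDER BY. Below is a minimal sketch against the same sample data (the aliases first_amt and last_amt are made up for illustration):

SELECT DISTINCT ACCT,
  first_value(AMT) OVER (PARTITION BY ACCT ORDER BY TXN_DT, AMT) AS first_amt,
  -- last_value needs an explicit frame: the default frame ends at the current row
  last_value(AMT) OVER (PARTITION BY ACCT ORDER BY TXN_DT, AMT
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_amt
FROM VALUES
(101, 10.01, DATE'2021-01-01'),
(101, 102.01, DATE'2021-01-01'),
(102, 93.0, DATE'2021-01-01'),
(103, 913.1, DATE'2021-01-02'),
(102, 913.1, DATE'2021-01-02'),
(101, 900.56, DATE'2021-01-03')
AS TXN(ACCT, AMT, TXN_DT)
ORDER BY ACCT;

With the transaction date as the ordering key (and AMT as a tie-breaker for account 101's two 2021-01-01 rows), account 101 yields 10.01 and 900.56 regardless of scan order.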
Comments

Nikesh V · 3 years ago

Thanks for the article!

Do you know the Scala equivalent of this? I used first, but it's not picking up the first non-null value; it's just picking the first value.


Raymond · 3 years ago

Hi Nikesh,

To keep it simple, you can register your DataFrame as a temporary view and then use Spark SQL to aggregate with these functions, such as first_value or last_value.
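
As a sketch (assuming your DataFrame is called df and that the view name txn is just for illustration), register the view in Scala with df.createOrReplaceTempView("txn") and then run:

SELECT ACCT, first_value(AMT, true) AS first_non_null
FROM txn
GROUP BY ACCT;

Alternatively, the Scala DataFrame API has overloads that take a null-ignoring flag, org.apache.spark.sql.functions.first(e, ignoreNulls) and last(e, ignoreNulls); passing true there is the equivalent of the SQL above, whereas the single-argument first keeps nulls, which would explain the behaviour you are seeing.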
