Teradata: FIRST_VALUE and LAST_VALUE Funtions
insights Stats
Like other databases, FIRST_VALUE and LAST_VALUE are two of commonly used windowing functions in Teradata. In one of my old posts about SQL Server 2012 (Chinese language), I showed how to use FIRST_VALUE and LAST_VALUE. For this article, I will provide examples of using them in Teradata but more importantly I want to highlight one part about ordered window.
Sample table
This article utilizes a table (TestDb.test_table) created in one of my previous articles: Extract JSON Data via SQL Functions in Teradata - Kontext. The content of the table looks like the following screenshot:
Use FIRST_VALUE function
The following statement creates windows via column category without ORDER BY clause:
SEL r.*, FIRST_VALUE(amount) OVER(PARTITION BY category) AS FV_amount FROM TestDb.test_table r;
The output:
id category amount FV_amount 1 1 A 10 10 2 2 A 11 10 3 3 A 12 10 4 4 B 100 100 5 6 B 102 100 6 5 B 101 100 7 4 C 1,000 1,000 8 6 C 1,002 1,000 9 5 C 1,001 1,000
Use LAST_VALUE function
Similarly, we can use LAST_VALUE to retrieve the last value of amount column in each window (by column category).
SEL r.*, LAST_VALUE(amount) OVER(PARTITION BY category) AS LV_amount FROM TestDb.test_table r;
The results:
id category amount LV_amount 1 1 A 10 12 2 2 A 11 12 3 3 A 12 12 4 4 B 100 101 5 6 B 102 101 6 5 B 101 101 7 4 C 1,000 1,001 8 6 C 1,002 1,001 9 5 C 1,001 1,001
If we add ORDER BY clause to the statement, the results will be different:
SEL r.*, LAST_VALUE(amount) OVER(PARTITION BY category ORDER BY id) AS LV_amount FROM TestDb.test_table r;
Results:
id category amount LV_amount 1 1 A 10 10 2 2 A 11 11 3 3 A 12 12 4 4 B 100 100 5 5 B 101 101 6 6 B 102 102 7 4 C 1,000 1,000 8 5 C 1,001 1,001 9 6 C 1,002 1,002
Column LV_amount just returns different values for each row in the same window. This is the special part I want to highlight: If ORDER BY is not specified, the entire partition is used for a window frame; if it is specified, the default rows bounding are applied: UNBOUNDED PRECEDING AND CURRENT ROW. With this rule, it is obviously that the above statement will return different values.
With ORDER BY and window frame bound
We can use window frame bound clause to specify the range for each calculation.
SEL r.*, LAST_VALUE(amount) OVER(PARTITION BY category ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LV_amount FROM TestDb.test_table r;
Now the results look like the following:
id category amount LV_amount 1 1 A 10 12 2 2 A 11 12 3 3 A 12 12 4 4 B 100 102 5 5 B 101 102 6 6 B 102 102 7 4 C 1,000 1,002 8 5 C 1,001 1,002 9 6 C 1,002 1,002
Retrieve previous and next row values
With ROWS BETWEEN clause, we can easily retrieve previous and next row values.
SEL r.*, FIRST_VALUE(amount IGNORE NULLS) OVER(PARTITION BY category ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS PREV_amount, LAST_VALUE(amount) OVER(PARTITION BY category ORDER BY id ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS NEXT_amount FROM TestDb.test_table r;
The results:
id category amount PREV_amount NEXT_amount 1 1 A 10 ? 11 2 2 A 11 10 12 3 3 A 12 11 ? 4 4 B 100 ? 101 5 5 B 101 100 102 6 6 B 102 101 ? 7 4 C 1,000 ? 1,001 8 5 C 1,001 1,000 1,002 9 6 C 1,002 1,001 ?
References
Spark SQL - FIRST_VALUE or LAST_VALUE - Kontext
Select First Row in Each GROUP BY Group in Teradata - Kontext