Spark SQL - window Function
Code description
Spark SQL has built-in function window
to bucketize rows into one or more time windows given a timestamp specifying column. The syntax of the function looks like the following:
window(timeColumn: ColumnOrName, windowDuration: str, slideDuration: Optional[str] = None, startTime: Optional[str] = None)
This function is available from Spark 2.0.0. slideDuration
must be less than or equal to windowDuration
.
*These SQL statements can be directly used in PySpark DataFrame APIs too via spark.sql
function.
This code snippet prints out the following outputs:
Query 1:
2022-08-01 12:01:00 {"start":2022-08-01 12:00:00,"end":2022-08-01 12:30:00} 2022-08-01 12:15:00 {"start":2022-08-01 12:00:00,"end":2022-08-01 12:30:00} 2022-08-01 12:31:01 {"start":2022-08-01 12:30:00,"end":2022-08-01 13:00:00}
The first two rows are in the same window [00:00, 00:30).
Query 2:
2022-08-01 12:01:00 {"start":2022-08-01 12:00:00,"end":2022-08-01 12:30:00}
2022-08-01 12:01:00 {"start":2022-08-01 11:45:00,"end":2022-08-01 12:15:00}
2022-08-01 12:15:00 {"start":2022-08-01 12:15:00,"end":2022-08-01 12:45:00}
2022-08-01 12:15:00 {"start":2022-08-01 12:00:00,"end":2022-08-01 12:30:00}
2022-08-01 12:31:01 {"start":2022-08-01 12:30:00,"end":2022-08-01 13:00:00}
2022-08-01 12:31:01 {"start":2022-08-01 12:15:00,"end":2022-08-01 12:45:00}
Code snippet
with t as ( select timestamp('2022-08-01 12:01:00') as ts UNION ALL select timestamp('2022-08-01 12:15:00') UNION ALL select timestamp('2022-08-01 12:31:01') ) select t.ts, window(t.ts, '30 minutes') from t; with t as ( select timestamp('2022-08-01 12:01:00') as ts UNION ALL select timestamp('2022-08-01 12:15:00') UNION ALL select timestamp('2022-08-01 12:31:01') ) select t.ts, window(t.ts, '30 minutes', '15 minutes') from t;
info Last modified by Kontext 3 years ago
copyright
This page is subject to Site terms.
comment Comments
No comments yet.