Code sql

Spark SQL - window Function

Kontext Kontext visibility 203 comment 0 access_time 2 years ago language English

descriptionCode description

Spark SQL has built-in function window to bucketize rows into one or more time windows given a timestamp specifying column. The syntax of the function looks like the following:

window(timeColumn: ColumnOrName, windowDuration: str, slideDuration: Optional[str] = None, startTime: Optional[str] = None) 

This function is available from Spark 2.0.0. slideDuration must be less than or equal to windowDuration.

*These SQL statements can be directly used in PySpark DataFrame APIs too via spark.sql function.

This code snippet prints out the following outputs:

Query 1:

2022-08-01 12:01:00     {"start":2022-08-01 12:00:00,"end":2022-08-01 12:30:00}
2022-08-01 12:15:00     {"start":2022-08-01 12:00:00,"end":2022-08-01 12:30:00}
2022-08-01 12:31:01     {"start":2022-08-01 12:30:00,"end":2022-08-01 13:00:00}

The first two rows are in the same window [00:00, 00:30).

Query 2:

2022-08-01 12:01:00     {"start":2022-08-01 12:00:00,"end":2022-08-01 12:30:00}
2022-08-01 12:01:00     {"start":2022-08-01 11:45:00,"end":2022-08-01 12:15:00}
2022-08-01 12:15:00     {"start":2022-08-01 12:15:00,"end":2022-08-01 12:45:00}
2022-08-01 12:15:00     {"start":2022-08-01 12:00:00,"end":2022-08-01 12:30:00}
2022-08-01 12:31:01     {"start":2022-08-01 12:30:00,"end":2022-08-01 13:00:00}
2022-08-01 12:31:01     {"start":2022-08-01 12:15:00,"end":2022-08-01 12:45:00}
info Last modified by Kontext 2 years ago copyright This page is subject to Site terms.
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts