
Spark SQL - session_window Function

Spark SQL has a built-in function, session_window, that creates a window column from a timestamp column and a gap duration. The function signature, shown here in its PySpark form, is:

session_window(timeColumn: ColumnOrName, gapDuration: Union[pyspark.sql.column.Column, str])

This function is available from Spark 3.2.0.

*The SQL statement on this page can also be run from PySpark by passing it to the spark.sql function.

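The code snippet below prints the following output. The query has no GROUP BY, so each row gets its own window, starting at ts and ending 30 minutes later: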

2022-08-01 12:01:00     {"start":2022-08-01 12:01:00,"end":2022-08-01 12:31:00}
2022-08-01 12:15:00     {"start":2022-08-01 12:15:00,"end":2022-08-01 12:45:00}
2022-08-01 12:31:01     {"start":2022-08-01 12:31:01,"end":2022-08-01 13:01:01}
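
The window bounds in the output above are plain timestamp arithmetic: start is the row's timestamp and end is that timestamp plus the gap. The following is a minimal pure-Python sketch of that arithmetic, not Spark code; the helper name initial_session_window is hypothetical and only mirrors what the output shows.

```python
from datetime import datetime, timedelta


def initial_session_window(ts: datetime, gap: timedelta) -> dict:
    """Hypothetical helper (not a Spark API): window for one row, [ts, ts + gap)."""
    return {"start": ts, "end": ts + gap}


# Same three timestamps and 30-minute gap as the SQL snippet on this page.
gap = timedelta(minutes=30)
rows = [
    datetime(2022, 8, 1, 12, 1, 0),
    datetime(2022, 8, 1, 12, 15, 0),
    datetime(2022, 8, 1, 12, 31, 1),
]

windows = [initial_session_window(ts, gap) for ts in rows]
for ts, window in zip(rows, windows):
    print(ts, window["start"], window["end"])
```

Each printed row should line up with the start and end values listed above.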

Code snippet

with t as (
select timestamp('2022-08-01 12:01:00') as ts 
UNION ALL select timestamp('2022-08-01 12:15:00') 
UNION ALL select timestamp('2022-08-01 12:31:01')
)
select t.ts, session_window(t.ts, '30 minutes') from t;
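
In the query above every row keeps its own window. When session_window is used as a grouping key instead, overlapping windows are combined into one session. The sketch below imitates that merging in pure Python so the effect can be checked without a Spark cluster. The helper merge_sessions is hypothetical, and its handling of a timestamp that falls exactly on the previous window's end is an assumption, not confirmed Spark behavior.

```python
from datetime import datetime, timedelta


def merge_sessions(timestamps, gap):
    """Hypothetical helper (not a Spark API): merge overlapping [ts, ts + gap) windows."""
    sessions = []
    for ts in sorted(timestamps):
        end = ts + gap
        # Assumption: a row joins the current session only if it arrives
        # strictly before that session's end.
        if sessions and ts < sessions[-1]["end"]:
            sessions[-1]["end"] = max(sessions[-1]["end"], end)
            sessions[-1]["count"] += 1
        else:
            sessions.append({"start": ts, "end": end, "count": 1})
    return sessions


# Same three timestamps and 30-minute gap as the SQL snippet on this page.
rows = [
    datetime(2022, 8, 1, 12, 1, 0),
    datetime(2022, 8, 1, 12, 15, 0),
    datetime(2022, 8, 1, 12, 31, 1),
]
sessions = merge_sessions(rows, timedelta(minutes=30))
for session in sessions:
    print(session["start"], session["end"], session["count"])
```

With the sample data, each timestamp arrives before the running session ends, so the three rows collapse into a single session from 12:01:00 to 13:01:01.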
Last modified by Kontext 2 months ago.
