Code description
Spark SQL has a built-in function, session_window, that creates a window column from a timestamp column and a gap duration. In PySpark, the function has the following signature:
session_window(timeColumn: ColumnOrName, gapDuration: Union[pyspark.sql.column.Column, str])
This function is available from Spark 3.2.0.
The SQL statements can also be run directly from PySpark by passing them to the spark.sql function.
This code snippet prints out the following output:
2022-08-01 12:01:00    {"start":2022-08-01 12:01:00,"end":2022-08-01 12:31:00}
2022-08-01 12:15:00    {"start":2022-08-01 12:15:00,"end":2022-08-01 12:45:00}
2022-08-01 12:31:01    {"start":2022-08-01 12:31:01,"end":2022-08-01 13:01:01}
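
The output shows that each row's window starts at the row's timestamp and ends one gap duration later. The gap appears to be 30 minutes; this is inferred from the start and end values, not stated in the text. The plain-Python sketch below models that arithmetic without Spark. The names row_window and merge_sessions are hypothetical helpers, not Spark APIs, and the merge step only approximates what a grouped aggregation on the window column would do. How an event landing exactly on a session's end is handled is an assumption of this sketch.

```python
from datetime import datetime, timedelta

# Assumed gap duration, inferred from the printed start/end values.
GAP = timedelta(minutes=30)


def row_window(ts, gap=GAP):
    """Window assigned to a single row: starts at ts, ends at ts + gap."""
    return (ts, ts + gap)


def merge_sessions(timestamps, gap=GAP):
    """Merge overlapping per-row windows into sessions (illustrative model only)."""
    sessions = []
    for ts in sorted(timestamps):
        start, end = row_window(ts, gap)
        if sessions and start < sessions[-1][1]:
            # The event arrives before the current session closes: extend it.
            sessions[-1] = (sessions[-1][0], max(sessions[-1][1], end))
        else:
            # The gap elapsed with no events: open a new session.
            sessions.append((start, end))
    return sessions


events = [
    datetime(2022, 8, 1, 12, 1, 0),
    datetime(2022, 8, 1, 12, 15, 0),
    datetime(2022, 8, 1, 12, 31, 1),
]

# Per-row windows, matching the three rows of output.
for ts in events:
    start, end = row_window(ts)
    print(ts, start, end)

# Sessions after merging overlapping windows.
print(merge_sessions(events))
```

Under this model the three per-row windows overlap pairwise, so merging them yields a single session running from 12:01:00 to 13:01:01.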