PySpark DataFrame - Add or Subtract Milliseconds from Timestamp Column

Kontext · 2022-09-01

Code description

This code snippet shows you how to add or subtract milliseconds (or microseconds) and seconds from a timestamp column in a Spark DataFrame.

It first creates a DataFrame in memory and then adds and subtracts milliseconds/seconds from the timestamp column ts using Spark SQL interval expressions.

Output:

+---+--------------------------+--------------------------+--------------------------+--------------------------+
|id |ts                        |ts1                       |ts2                       |ts3                       |
+---+--------------------------+--------------------------+--------------------------+--------------------------+
|1  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|2  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|3  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|4  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
+---+--------------------------+--------------------------+--------------------------+--------------------------+

*Note - the code assumes a SparkSession object already exists under the variable name spark.
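
If you are running this outside the pyspark shell or a notebook, you can create the session yourself. A minimal sketch (the application name here is arbitrary):

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession; the app name is just an example.
spark = SparkSession.builder.appName("timestamp-arithmetic").getOrCreate()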

Code snippet

from pyspark.sql.functions import lit, expr
import datetime

# Current timestamp, used as the value for every row.
now = datetime.datetime.now()

# DataFrame with a single column 'id' holding values 1..4.
df = spark.range(1, 5)

# Add the timestamp column, then derive three adjusted columns from it.
df = df.withColumn('ts', lit(now))
df = df.withColumn('ts1', expr("ts - interval '0.001' seconds"))  # minus 1 millisecond
df = df.withColumn('ts2', expr("ts + interval '0.001' seconds"))  # plus 1 millisecond
df = df.withColumn('ts3', expr("ts + interval '1' seconds"))      # plus 1 second
df.show(truncate=False)
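
Spark SQL also accepts interval literals with explicit millisecond/microsecond unit names, which can read more clearly than fractional seconds. A sketch, assuming a Spark version that supports these unit names (the column names ts_minus_1ms and ts_plus_5us are just illustrative):

# Equivalent arithmetic using explicit sub-second interval units.
df = df.withColumn('ts_minus_1ms', expr("ts - interval 1 milliseconds"))
df = df.withColumn('ts_plus_5us', expr("ts + interval 5 microseconds"))
df.show(truncate=False)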