Spark SQL - Date and Timestamp Truncate Functions

Kontext Kontext event 2022-06-15 visibility 3,989
more_vert

Spark SQL provides two very useful date and timestamp truncation functions: trunc and date_trunc. The former can be used to truncate date or valid date string while the later can work with datetime or valid timestamp string. Example use cases include getting the first date of the week, first date of the month, first date of the quarter,  first date of the year, etc. 

Truncate date via trunc function

Syntax

trunc(date, fmt) 

This function returns date with the time portion of the day truncated to the unit specified by the format model fmt.

The possible values for fmt are:

  • "YEAR", "YYYY", "YY" - truncate to the first date of the year that the date falls in
  • "QUARTER" - truncate to the first date of the quarter that the date falls in
  • "MONTH", "MM", "MON" - truncate to the first date of the month that the date falls in
  • "WEEK" - truncate to the Monday of the week that the date falls in

This argument is case-insensitive. 

Code snippets

spark-sql> select trunc(DATE'2022-06-15','YEAR');
2022-01-01
spark-sql> select trunc(DATE'2022-06-15','quarter');
2022-04-01
spark-sql> select trunc(DATE'2022-06-15','MON');
2022-06-01
spark-sql> select trunc(DATE'2022-06-15','week');
2022-06-13

Truncate datetime via date_trunc function

Syntax

date_trunc(fmt, ts)

This function returns timestamp ts that truncates to the unit specified by parameter fmt

The possible values for argument fmt are:

  • "YEAR", "YYYY", "YY" - truncate to the first date of the year that the ts falls in, the time part will be zero out
  • "QUARTER" - truncate to the first date of the quarter that the ts falls in, the time part will be zero out
  • "MONTH", "MM", "MON" - truncate to the first date of the month that the ts falls in, the time part will be zero out
  • "WEEK" - truncate to the Monday of the week that the ts falls in, the time part will be zero out
  • "DAY", "DD" - zero out the time part
  • "HOUR" - zero out the minute and second with fraction part
  • "MINUTE"- zero out the second with fraction part
  • "SECOND" - zero out the second fraction part
  • "MILLISECOND" - zero out the microseconds
  • "MICROSECOND" - no truncation

This argument is case-insensitive. 

Code snippets

spark-sql> select date_trunc('YYYY', timestamp'2022-06-15 23:07:49.178');
2022-01-01 00:00:00
spark-sql> select date_trunc('MM', timestamp'2022-06-15 23:07:49.178');
2022-06-01 00:00:00
spark-sql> select date_trunc('DAY', timestamp'2022-06-15 23:07:49.178');
2022-06-15 00:00:00
spark-sql> select date_trunc('WEEK', timestamp'2022-06-15 23:07:49.178');
2022-06-13 00:00:00
spark-sql> select date_trunc('hour', timestamp'2022-06-15 23:07:49.178');
2022-06-15 23:00:00
spark-sql> select date_trunc('minute', timestamp'2022-06-15 23:07:49.178');
2022-06-15 23:07:00
spark-sql> select date_trunc('quarter', timestamp'2022-06-15 23:07:49.178');
2022-04-01 00:00:00
More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts