Spark SQL - Date and Timestamp Truncate Functions
Spark SQL provides two very useful date and timestamp truncation functions: trunc
and date_trunc
. The former can be used to truncate date or valid date string while the later can work with datetime or valid timestamp string. Example use cases include getting the first date of the week, first date of the month, first date of the quarter, first date of the year, etc.
Truncate date via trunc function
Syntax
trunc(date, fmt)
This function returns date with the time portion of the day truncated to the unit specified by the format model fmt
.
The possible values for fmt
are:
- "YEAR", "YYYY", "YY" - truncate to the first date of the year that the date falls in
- "QUARTER" - truncate to the first date of the quarter that the date falls in
- "MONTH", "MM", "MON" - truncate to the first date of the month that the date falls in
- "WEEK" - truncate to the Monday of the week that the date falls in
This argument is case-insensitive.
Code snippets
spark-sql> select trunc(DATE'2022-06-15','YEAR'); 2022-01-01 spark-sql> select trunc(DATE'2022-06-15','quarter'); 2022-04-01 spark-sql> select trunc(DATE'2022-06-15','MON'); 2022-06-01 spark-sql> select trunc(DATE'2022-06-15','week'); 2022-06-13
Truncate datetime via date_trunc function
Syntax
date_trunc(fmt, ts)
This function returns timestamp ts
that truncates to the unit specified by parameter fmt
.
The possible values for argument fmt
are:
- "YEAR", "YYYY", "YY" - truncate to the first date of the year that the
ts
falls in, the time part will be zero out - "QUARTER" - truncate to the first date of the quarter that the
ts
falls in, the time part will be zero out - "MONTH", "MM", "MON" - truncate to the first date of the month that the
ts
falls in, the time part will be zero out - "WEEK" - truncate to the Monday of the week that the
ts
falls in, the time part will be zero out - "DAY", "DD" - zero out the time part
- "HOUR" - zero out the minute and second with fraction part
- "MINUTE"- zero out the second with fraction part
- "SECOND" - zero out the second fraction part
- "MILLISECOND" - zero out the microseconds
- "MICROSECOND" - no truncation
This argument is case-insensitive.
Code snippets
spark-sql> select date_trunc('YYYY', timestamp'2022-06-15 23:07:49.178'); 2022-01-01 00:00:00 spark-sql> select date_trunc('MM', timestamp'2022-06-15 23:07:49.178'); 2022-06-01 00:00:00 spark-sql> select date_trunc('DAY', timestamp'2022-06-15 23:07:49.178'); 2022-06-15 00:00:00 spark-sql> select date_trunc('WEEK', timestamp'2022-06-15 23:07:49.178'); 2022-06-13 00:00:00 spark-sql> select date_trunc('hour', timestamp'2022-06-15 23:07:49.178'); 2022-06-15 23:00:00 spark-sql> select date_trunc('minute', timestamp'2022-06-15 23:07:49.178'); 2022-06-15 23:07:00 spark-sql> select date_trunc('quarter', timestamp'2022-06-15 23:07:49.178'); 2022-04-01 00:00:00