Spark SQL - Literals (Constants)
This page provides some examples of how to use literals in Spark SQL. All the code snippets can be run using the Spark SQL CLI (spark-sql) or via the SparkSession.sql API.
String literals
You can define a string literal to specify a character string value.
SELECT 'A string literal' as Col; SELECT "A string literal" as Col;
You can use '\' to escape special characters:
SELECT "A string \" literal" as Col;
It can also be used to define Unicode characters (\u represents a 16-bit UTF-16 code unit and \U represents a 32-bit UTF-32 code point):
SELECT "\u0024" as Col;
Boolean literals
SELECT TRUE as Col1, FALSE as Col2;
Null literals
SELECT NULL as Col;
Binary literals
X { 'num [ ... ]' | "num [ ... ]" }
num - Any hexadecimal digit from 0 to F.
SELECT X '1234' as Col1, X"121F" as Col2;
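To sanity-check a binary value, you can convert it back to a hex string with the built-in hex function (a minimal sketch):
SELECT hex(X'1234') as Col;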
Integer literals
Numeric literals are commonly used in data calculations. The syntax to define an integer literal is:
[ + | - ] digit [ ... ] [ L | S | Y ]
By default (no postfix), it represents a signed 4-byte integer number (INT), for example 2022.
digit - Any numeral from 0 to 9.
L - Case insensitive, indicates BIGINT, which is an 8-byte signed integer number.
S - Case insensitive, indicates SMALLINT, which is a 2-byte signed integer number.
Y - Case insensitive, indicates TINYINT, which is a 1-byte signed integer number.
The following are some examples:
SELECT -1024l as Col; SELECT 12S as Col; SELECT 4Y as Col;
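For comparison, a literal without any postfix is read as the default 4-byte INT:
SELECT 2022 as Col;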
Fractional numeric literals
You can define decimal and exponential (scientific notation) numeric literals using the following syntax.
decimal literals:
decimal_digits { [ BD ] | [ exponent BD ] } | digit [ ... ] [ exponent ] BD
double literals:
decimal_digits { D | exponent [ D ] } | digit [ ... ] { exponent [ D ] | [ exponent ] D }
float literals:
decimal_digits { F | exponent [ F ] } | digit [ ... ] { exponent [ F ] | [ exponent ] F }
where decimal_digits is defined as
[ + | - ] { digit [ ... ] . [ digit [ ... ] ] | . digit [ ... ] }
and exponent is defined as
E [ + | - ] digit [ ... ]
digit - Any numeral from 0 to 9.
D or d - Indicates DOUBLE, which is an 8-byte double precision floating point number.
F or f - Indicates FLOAT, which is a 4-byte single precision floating point number.
BD or bd - Indicates DECIMAL.
The following are some examples for your reference:
SELECT 2033.BD as Col; SELECT 2022D as Col; SELECT -.2f as Col; SELECT 3.14e-2d as Col;
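A couple more sketches showing that DECIMAL literals can also carry an exponent or stand alone without a decimal point:
SELECT 5E2BD as Col; SELECT -5BD as Col;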
Date literals
Date literals can be specified using the following syntax:
DATE { 'yyyy' | 'yyyy-[m]m' | 'yyyy-[m]m-[d]d' | 'yyyy-[m]m-[d]d[T]' }
If the month or day component is not specified, it defaults to 01.
Examples:
SELECT DATE '2022' as Col; SELECT DATE '2022-01' as Col; SELECT DATE '2022-1-1' as Col;
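The [T] suffix from the syntax above is also accepted; a small sketch:
SELECT DATE '2022-6-30T' as Col;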
Timestamp literals
Timestamp literals can be created using the following syntax:
TIMESTAMP { 'yyyy' | 'yyyy-[m]m' | 'yyyy-[m]m-[d]d' | 'yyyy-[m]m-[d]d ' | 'yyyy-[m]m-[d]d[T][h]h[:]' | 'yyyy-[m]m-[d]d[T][h]h:[m]m[:]' | 'yyyy-[m]m-[d]d[T][h]h:[m]m:[s]s[.]' | 'yyyy-[m]m-[d]d[T][h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]'}
If the month or day component is not specified, it defaults to 01; if the hour, minute or second component is not specified, it defaults to 00.
Examples:
SELECT TIMESTAMP '2022-01-01 01:01:01.66666666UTC+10:00' AS Col; SELECT TIMESTAMP '2022-06' as Col;
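Missing time components are filled with the defaults described above; for instance, a sketch that specifies only the hour and minute with a T separator:
SELECT TIMESTAMP '2022-01-01T12:30' as Col;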
Interval literals
Interval literals specify a fixed period of time, which can be added to or subtracted from a date or timestamp. In Spark SQL, you can use either ANSI syntax or multi-units syntax.
For more details about the syntax, refer to the official documentation: Literals - Spark 3.2.1 Documentation (apache.org).
The following are some examples.
ANSI syntax examples
SELECT INTERVAL '-1-2' YEAR TO MONTH AS Col; SELECT INTERVAL '123:10' HOUR TO MINUTE as Col;
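Another ANSI syntax example covering days down to fractional seconds:
SELECT INTERVAL '20 15:40:32.99899999' DAY TO SECOND AS Col;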
Multi-units syntax examples
SELECT INTERVAL '1 YEAR 2 DAYS 3 HOURS'; SELECT INTERVAL '1 YEAR 2 MONTHS 3 WEEKS 4 DAYS 5 HOURS';
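As mentioned earlier, an interval can be added to or subtracted from a date or timestamp; a minimal sketch:
SELECT DATE '2022-01-01' + INTERVAL 3 DAYS as Col; SELECT TIMESTAMP '2022-01-01 00:00:00' - INTERVAL 2 HOURS as Col;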
Personally, I find the multi-units syntax easier to understand.