Spark SQL - Literals (Constants)

A Spark literal is a constant that represents a fixed data value. Literals are commonly used in SQL, for example, to define a default value, to create a column with a constant value, etc. Spark SQL supports 7 types of literals - string, binary, null, boolean, numeric, datetime and interval literals.

This page provides some examples of how to use them in Spark SQL. All the code snippets can run in the Spark SQL CLI (spark-sql) or in Spark code via the SparkSession.sql API.

String literals

You can define a string literal to specify a character string value. 

SELECT 'A string literal' as Col;
SELECT "A string literal" as Col;

You can use '\' to escape special characters:

SELECT "A string \" literal" as Col;

It can also be used to define Unicode characters (\u represents a 16-bit Unicode escape and \U represents a 32-bit Unicode escape, both followed by hexadecimal code points):

SELECT "\u0024" as Col;

Boolean literals

TRUE or FALSE can be used to specify boolean values:
SELECT TRUE as Col1, FALSE as Col2;
*These keywords are case-insensitive.
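For example, the following also works and returns the same result:
SELECT true as Col1, false as Col2;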

Null literals

Null values can be defined using NULL directly:
SELECT NULL as Col;
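
As mentioned in the introduction, literals are often used to supply default values. A small illustration, using the built-in coalesce function to substitute a default string for NULL:

SELECT coalesce(NULL, 'N/A') as Col;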

Binary literals

Binary literals can be defined using the following syntax:
X { 'num [ ... ]' | "num [ ... ]" }
For example:
SELECT X '1234' as Col1, X"121F" as Col2;

Integer literals

Integer literals are commonly used in data calculations. The syntax to define them is:

[ + | - ] digit [ ... ] [ L | S | Y ]

By default (no postfix), the literal represents a signed 4-byte integer number (INT), for example 2022.

  • digit - Any numeral from 0 to 9.

  • L - Case insensitive, indicates BIGINT, which is an 8-byte signed integer number.

  • S - Case insensitive, indicates SMALLINT, which is a 2-byte signed integer number.

  • Y - Case insensitive, indicates TINYINT, which is a 1-byte signed integer number.

The following are some examples:

SELECT -1024l as Col;
SELECT 12S as Col;
SELECT 4Y as Col;
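
To verify the type each postfix produces, you can wrap the literals with the built-in typeof function (available since Spark 3.0). The following query is expected to return int, bigint, smallint and tinyint:

SELECT typeof(2022) as Col1, typeof(-1024l) as Col2, typeof(12S) as Col3, typeof(4Y) as Col4;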

Fractional numeric literals

You can define decimal and floating-point (double and float) numeric literals, with optional exponents, using the following syntaxes.

decimal literals:

decimal_digits { [ BD ] | [ exponent BD ] } | digit [ ... ] [ exponent ] BD

double literals:

decimal_digits  { D | exponent [ D ] }  | digit [ ... ] { exponent [ D ] | [ exponent ] D }

float literals:

decimal_digits  { F | exponent [ F ] }  | digit [ ... ] { exponent [ F ] | [ exponent ] F }

where decimal_digits is defined as

[ + | - ] { digit [ ... ] . [ digit [ ... ] ] | . digit [ ... ] }

and exponent is defined as

E [ + | - ] digit [ ... ]

and digit represents any numeral from 0 to 9. The postfixes are case insensitive:

  • D - Indicates DOUBLE, which is an 8-byte double-precision floating point number.

  • F - Indicates FLOAT, which is a 4-byte single-precision floating point number.

  • BD - Indicates DECIMAL.

The following are some examples for your reference:

SELECT 2033.BD as Col;
SELECT 2022D as Col;
SELECT -.2f as Col;
SELECT 3.14e-2d as Col;
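
As with integers, typeof can confirm the types: 2022D should report double, -.2f float, and 2033.BD a DECIMAL type (whose precision and scale depend on the digits supplied):

SELECT typeof(2022D) as Col1, typeof(-.2f) as Col2, typeof(2033.BD) as Col3;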

Date literals

Date literals can be specified using the following syntax:

DATE { 'yyyy' |
       'yyyy-[m]m' |
       'yyyy-[m]m-[d]d' |
       'yyyy-[m]m-[d]d[T]' }

If the month or day component is not specified, it defaults to 01.

Examples:

SELECT DATE '2022' as Col;
SELECT DATE '2022-01' as Col;
SELECT DATE '2022-1-1' as Col;
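
Date literals behave like any other DATE value, so they can be used directly in calculations. For example, with the built-in datediff function:

SELECT datediff(DATE '2022-12-31', DATE '2022-01-01') as Col;

This should return 364.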

Timestamp literals

Timestamp literals can be created using the following syntax:

TIMESTAMP { 'yyyy' |
            'yyyy-[m]m' |
            'yyyy-[m]m-[d]d' |
            'yyyy-[m]m-[d]d ' |
            'yyyy-[m]m-[d]d[T][h]h[:]' |
            'yyyy-[m]m-[d]d[T][h]h:[m]m[:]' |
            'yyyy-[m]m-[d]d[T][h]h:[m]m:[s]s[.]' |
            'yyyy-[m]m-[d]d[T][h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]'}

If the month or day component is not specified, it defaults to 01; if the hour, minute or second component is not specified, it defaults to 00.

Examples:

SELECT TIMESTAMP '2022-01-01 01:01:01.66666666UTC+10:00' AS Col;
SELECT TIMESTAMP '2022-06' as Col;
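
As a quick check of the default rules, the comparison below should return true, because both literals resolve to 2022-06-01 00:00:00 in the session time zone:

SELECT TIMESTAMP '2022-06' = TIMESTAMP '2022-06-01 00:00:00' as Col;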

Interval literals

Interval literals specify a fixed period of time, which can be added to or subtracted from a date or timestamp. In Spark SQL, you can use either the ANSI syntax or the multi-units syntax.

For more details about the syntax, refer to the official documentation: Literals - Spark 3.2.1 Documentation (apache.org).

The following are some examples.

ANSI syntax examples

SELECT INTERVAL '-1-2' YEAR TO MONTH AS Col;
SELECT INTERVAL '123:10' HOUR TO MINUTE as Col;

Multi-units syntax examples

SELECT INTERVAL '1 YEAR 2 DAYS 3 HOURS';
SELECT INTERVAL '1 YEAR 2 MONTHS 3 WEEKS 4 DAYS 5 HOURS';

I personally find the multi-units syntax easier to understand.
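
As noted above, interval literals are mostly useful when added to or subtracted from dates and timestamps. A minimal sketch (the first query should return 2023-01-03 and the second 2021-12-31 21:00:00):

SELECT DATE '2022-01-01' + INTERVAL '1 YEAR 2 DAYS' as Col;
SELECT TIMESTAMP '2022-01-01 00:00:00' - INTERVAL '3 HOURS' as Col;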
