Spark SQL - Literals (Constants)

A Spark literal is a constant that represents a fixed data value. Literals are commonly used in SQL, for example, to define a default value or to create a column with a constant value. Spark SQL supports 7 types of literals: string, binary, null, boolean, numeric, datetime and interval literals.

This page provides some examples of how to use them in Spark SQL. All the code snippets can be run using the Spark SQL CLI (spark-sql) or via the SparkSession.sql API.
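
As a quick illustration, a literal can be used to create a constant column in a query result. The following sketch uses the built-in range table-valued function to generate sample rows (the column alias is just for illustration):

SELECT id, 'US' AS country FROM range(3);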

String literals

You can define a string literal to specify a character string value. 

SELECT 'A string literal' as Col;
SELECT "A string literal" as Col;

You can use '\' to escape special characters:

SELECT "A string \" literal" as Col;

It can also be used to define Unicode characters (\u represents a 16-bit UTF-16 code unit and \U represents a 32-bit UTF-32 character):

SELECT "\u0024" as Col;

Boolean literals

TRUE or FALSE can be used to specify boolean values:
SELECT TRUE as Col1, FALSE as Col2;
Note that the keywords are case-insensitive.

Null literals

Null values can be defined using NULL directly:
SELECT NULL as Col;
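
A bare NULL literal has no specific type on its own; if you need a NULL of a particular column type, you can wrap it in a CAST:

SELECT CAST(NULL AS INT) AS Col;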

Binary literals

Binary literals can be defined using the following syntax:
X { 'num [ ... ]' | "num [ ... ]" }
For example
SELECT X '1234' as Col1, X"121F" as Col2;
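
To inspect a binary value, the built-in hex function converts it back to a hexadecimal string, which is handy as a sanity check:

SELECT hex(X'121F') AS Col;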

Integer literals

Numeric literals are commonly used in data calculations. The syntax is:

[ + | - ] digit [ ... ] [ L | S | Y ]

By default (no postfix), it represents a signed 4-byte integer number (INT), for example 2022.

  • digit - Any numeral from 0 to 9.

  • L - Case insensitive, indicates BIGINT, which is an 8-byte signed integer number.

  • S - Case insensitive, indicates SMALLINT, which is a 2-byte signed integer number.

  • Y - Case insensitive, indicates TINYINT, which is a 1-byte signed integer number.

The following are some examples:

SELECT -1024l as Col;
SELECT 12S as Col;
SELECT 4Y as Col;
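
You can verify the resulting types with the built-in typeof function (available since Spark 3.0):

SELECT typeof(-1024l) AS Col1, typeof(12S) AS Col2, typeof(4Y) AS Col3;
-- Expected result: bigint, smallint, tinyint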

Fractional numeric literals

You can define decimal and floating-point numeric literals using the following syntaxes.

decimal literals:

decimal_digits { [ BD ] | [ exponent BD ] } | digit [ ... ] [ exponent ] BD

double literals:

decimal_digits  { D | exponent [ D ] }  | digit [ ... ] { exponent [ D ] | [ exponent ] D }

float literals:

decimal_digits  { F | exponent [ F ] }  | digit [ ... ] { exponent [ F ] | [ exponent ] F }

where decimal_digits is defined as

[ + | - ] { digit [ ... ] . [ digit [ ... ] ] | . digit [ ... ] }

and exponent is defined as

E [ + | - ] digit [ ... ]

and digit represents any numeral from 0 to 9.

  • D - Case insensitive, indicates DOUBLE, which is an 8-byte double precision floating point number.

  • F - Case insensitive, indicates FLOAT, which is a 4-byte single precision floating point number.

  • BD - Case insensitive, indicates DECIMAL.

The following are some examples for your reference:

SELECT 2033.BD as Col;
SELECT 2022D as Col;
SELECT -.2f as Col;
SELECT 3.14e-2d as Col;
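
Again, typeof can confirm the types produced by the postfixes:

SELECT typeof(2033.BD) AS Col1, typeof(2022D) AS Col2, typeof(-.2f) AS Col3;
-- Expected result: decimal(4,0), double, float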

Date literals

Date literals can be specified using the following syntax:

DATE { 'yyyy' |
       'yyyy-[m]m' |
       'yyyy-[m]m-[d]d' |
       'yyyy-[m]m-[d]d[T]' }

If the month or day component is not specified, it defaults to 01.

Examples:

SELECT DATE '2022' as Col;
SELECT DATE '2022-01' as Col;
SELECT DATE '2022-1-1' as Col;
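
Because the missing components default to 01, DATE '2022' and DATE '2022-01-01' denote the same date, which you can verify directly:

SELECT DATE '2022' = DATE '2022-01-01' AS Col;
-- Expected result: true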

Timestamp literals

Timestamp literals can be created using the following syntax:

TIMESTAMP { 'yyyy' |
            'yyyy-[m]m' |
            'yyyy-[m]m-[d]d' |
            'yyyy-[m]m-[d]d ' |
            'yyyy-[m]m-[d]d[T][h]h[:]' |
            'yyyy-[m]m-[d]d[T][h]h:[m]m[:]' |
            'yyyy-[m]m-[d]d[T][h]h:[m]m:[s]s[.]' |
            'yyyy-[m]m-[d]d[T][h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]'}

If the month or day component is not specified, it defaults to 01; if the hour, minute or second components are not specified, they default to 00.

Examples:

SELECT TIMESTAMP '2022-01-01 01:01:01.66666666UTC+10:00' AS Col;
SELECT TIMESTAMP '2022-06' as Col;
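
The same defaulting applies here, so the second example above resolves to midnight on 2022-06-01:

SELECT TIMESTAMP '2022-06' = TIMESTAMP '2022-06-01 00:00:00' AS Col;
-- Expected result: true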

Interval literals

Interval literals specify a fixed period of time and are commonly used to add or subtract an offset from a date or timestamp. In Spark SQL, you can use either the ANSI syntax or the multi-units syntax.

For more details about the syntax, refer to the official documentation: Literals - Spark 3.2.1 Documentation (apache.org).

The following are some examples.

ANSI syntax examples

SELECT INTERVAL '-1-2' YEAR TO MONTH AS Col;
SELECT INTERVAL '123:10' HOUR TO MINUTE as Col;
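
ANSI interval literals can be added to a date or timestamp directly. A small sketch (assuming Spark 3.2 or later, where ANSI interval types are supported):

SELECT DATE '2022-01-01' + INTERVAL '1-2' YEAR TO MONTH AS Col;
-- Expected result: 2023-03-01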

Multi-units syntax examples

SELECT INTERVAL '1 YEAR 2 DAYS 3 HOURS';
SELECT INTERVAL '1 YEAR 2 MONTHS 3 WEEKS 4 DAYS 5 HOURS';
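
Multi-units intervals can be combined with a timestamp in the same way:

SELECT TIMESTAMP '2022-01-01 00:00:00' + INTERVAL '1 YEAR 2 DAYS 3 HOURS' AS Col;
-- Expected result: 2023-01-03 03:00:00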

I personally find the multi-units syntax easier to understand.
