Spark SQL - Concatenate with and without Separator (concat and concat_ws)
Spark SQL provides two built-in functions: concat and concat_ws. The former concatenates columns in a table (or a Spark DataFrame) directly without a separator, while the latter concatenates with a separator.
Use concat function
The following code snippet shows examples of using the concat function.
spark-sql> select concat(array(1,2,3), array(4), array(5,6));
[1,2,3,4,5,6]
spark-sql> select concat('Hello ', 'Kontext', '!');
Hello Kontext!
You can replace the constants (literals) with column names in the above SQL statements.
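For instance, assuming a hypothetical table named employees with columns first_name and last_name (and a row holding 'John' and 'Smith'), the same function works on columns:
spark-sql> select concat(first_name, ' ', last_name) from employees;
John Smith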
Use concat_ws function
If you want to add a separator, the concat_ws function can be used. The syntax of this function is defined as:
concat_ws(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by sep.
The following examples use this function to concatenate strings.
spark-sql> select concat_ws('|', 'A', 'B', 'C');
A|B|C
spark-sql> select concat_ws('|', array('A','B','C'), 'D');
A|B|C|D
Concatenate with NULL values
When NULL values exist among the concatenated columns or literals, concat_ws simply skips them.
spark-sql> select concat_ws('|', array('A','B','C'), NULL, 'D');
A|B|C|D
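Note that concat behaves differently: if any input is NULL, the entire result is NULL.
spark-sql> select concat('A', NULL, 'B');
NULL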
The skipping behavior of concat_ws can cause unexpected results if you are concatenating columns to create unique values. To fix that problem, you can convert NULL values to a constant string before concatenating.
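For example, coalesce can substitute a constant for each nullable input before the concatenation; the placeholder string '(null)' below is an arbitrary choice:
spark-sql> select concat_ws('|', coalesce(NULL, '(null)'), 'A', 'B');
(null)|A|B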