Spark SQL - Concatenate w/o Separator (concat_ws and concat)

Kontext Kontext event 2022-07-09 visibility 12,107
more_vert

Spark SQL provides two built-in functions: concat and concat_ws. The former can be used to concatenate columns in a table (or a Spark DataFrame) directly without separator while the latter can be used to concatenate with a separator. 

Use concat function

The following code snippet shows examples of using concat functions.

spark-sql> select concat(array(1,2,3),array(4), array(5,6));
[1,2,3,4,5,6]

spark-sql> select concat('Hello ', 'Kontext', '!');
Hello Kontext!

You can replace the constants (literals) with column names in the above SQL statements.

Use concat_ws function

If you want to add a separator, concat_ws function can be used. The syntax of this function is defined as:

concat_ws(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by sep.

The following example use this function to concatenate strings.

spark-sql> select concat_ws('|','A','B','C');
A|B|C
spark-sql> select concat_ws('|',array('A','B','C'), 'D');
A|B|C|D

Concatenate with NULL values

When NULL values exist in concinnated columns or literals, Spark SQL will just ignore it. 

spark-sql> select concat_ws('|',array('A','B','C'), NULL, 'D');
A|B|C|D

This can cause unexpected behaviors if you want to concatenate columns to create unique values. To fix that problem, you can convert NULL values to a constant string before concatenating. 

More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts