Concatenate Columns in Spark DataFrame

2022-08-19
Kontext Code Snippets & Tips

Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise. 

Code description

This code snippet shows one way to concatenate columns with a separator in a Spark DataFrame, using the concat_ws function directly. For the Spark SQL version, refer to Spark SQL - Concatenate w/o Separator (concat_ws and concat).

Syntax of concat_ws

pyspark.sql.functions.concat_ws(sep: str, *cols: ColumnOrName)

Output of the code snippet below:

+-----+--------+--------------+
| col1|    col2|     col1_col2|
+-----+--------+--------------+
|Hello| Kontext| Hello,Kontext|
|Hello|Big Data|Hello,Big Data|
+-----+--------+--------------+

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.functions import concat_ws

app_name = "PySpark concat_ws Example"
master = "local"

spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

# Create a DataFrame
df = spark.createDataFrame(
    [['Hello', 'Kontext'], ['Hello', 'Big Data']], ['col1', 'col2'])

# Concatenate these two columns using separator ','
df = df.withColumn('col1_col2', concat_ws(',', df.col1, df.col2))

df.show()
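
One behavior the example above does not exercise is null handling: concat_ws skips null inputs instead of returning null for the whole row, and it returns null only when the separator itself is null. The following is a plain-Python sketch of that behavior, not Spark code; concat_ws_model is a hypothetical helper introduced here purely for illustration.

```python
from typing import Optional


def concat_ws_model(sep: Optional[str], *values: Optional[str]) -> Optional[str]:
    """Hypothetical plain-Python model of how concat_ws treats nulls."""
    # A null separator yields a null result.
    if sep is None:
        return None
    # Null inputs are skipped; the remaining values are joined with the separator.
    return sep.join(v for v in values if v is not None)


# Rows mirror the col1/col2 layout above, with None standing in for null.
rows = [("Hello", "Kontext"), ("Hello", None), (None, None)]
for col1, col2 in rows:
    print(repr(concat_ws_model(",", col1, col2)))
```

This prints 'Hello,Kontext', 'Hello' and '' in turn. In PySpark itself, the signature shown earlier (ColumnOrName) also means column names can be passed as strings, for example concat_ws(',', 'col1', 'col2').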