
Concatenate Columns in Spark DataFrame

Code description

This code snippet provides an example of concatenating columns with a separator in a Spark DataFrame, using the concat_ws function directly. For the Spark SQL version, refer to Spark SQL - Concatenate w/o Separator (concat_ws and concat).

Syntax of concat_ws

    pyspark.sql.functions.concat_ws(sep: str, *cols: ColumnOrName)

Output:

    +-----+--------+--------------+
    | col1|    col2|     col1_col2|
    +-----+--------+--------------+
    |Hello| Kontext| Hello,Kontext|
    |Hello|Big Data|Hello,Big Data|
    +-----+--------+--------------+  
    

Code snippet

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import concat_ws
    
    app_name = "PySpark concat_ws Example"
    master = "local"
    
    spark = SparkSession.builder \
        .appName(app_name) \
        .master(master) \
        .getOrCreate()
    
    spark.sparkContext.setLogLevel("WARN")
    
    # Create a DataFrame
    df = spark.createDataFrame(
        [['Hello', 'Kontext'], ['Hello', 'Big Data']], ['col1', 'col2'])
    
    # Concatenate these two columns using separator ','
    df = df.withColumn('col1_col2', concat_ws(',', df.col1, df.col2))
    
    df.show()