PySpark DataFrame - Add Column using withColumn

2022-07-18
Kontext Kontext Code Snippets & Tips

Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise. 

New columns can be added to a Spark DataFrame using the withColumn method. This includes constant columns and columns derived from existing columns.

Code snippet

The following script shows how to add a new column derived from existing columns.

from pyspark.sql import SparkSession

appName = "PySpark DataFrame - withColumn function"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel('WARN')

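# Sample data; note that the values are strings, not numbers
data = [{"a": "100", "b": "200"},
        {"a": "1000", "b": "2000"}]

df = spark.createDataFrame(data)
df.show()

# Add a new column derived from columns a and b.
# The string operands are cast to double for the addition,
# which is why the result below shows 300.0 rather than 300.
df = df.withColumn('a+b', df.a + df.b)

df.show()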

Output:

+----+----+------+
|   a|   b|   a+b|
+----+----+------+
| 100| 200| 300.0|
|1000|2000|3000.0|
+----+----+------+

Any supported Spark SQL function can be used in the expression that derives the new column.

Add column with constants

If you only need to add constant columns, refer to Add Constant Column to PySpark DataFrame. For the kinds of literals you can create, refer to Spark SQL - Literals (Constants).
