PySpark DataFrame - Add Column using withColumn
New columns can be added to a Spark DataFrame using the withColumn
method. This includes constant columns and columns derived from existing columns.
Code snippet
The following script shows how to add a new column by deriving it from existing columns.
from pyspark.sql import SparkSession

appName = "PySpark DataFrame - withColumn function"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel('WARN')

data = [{"a": "100", "b": "200"},
        {"a": "1000", "b": "2000"}]

df = spark.createDataFrame(data)
df.show()

df = df.withColumn('a+b', df.a + df.b)
df.show()
Output:
+----+----+------+
|   a|   b|   a+b|
+----+----+------+
| 100| 200| 300.0|
|1000|2000|3000.0|
+----+----+------+
You can use any supported Spark SQL function when deriving the new column.
Add column with constants
To add constant columns, refer to Add Constant Column to PySpark DataFrame. For the kinds of literals you can create, refer to Spark SQL - Literals (Constants).