Add Constant Column to PySpark DataFrame
This article shows how to add a constant (literal) column to a Spark DataFrame using Python.
Construct a DataFrame
Follow the article Convert Python Dictionary List to PySpark DataFrame to construct a DataFrame with the following content:
+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1| 12.40|
|Category B|  2| 30.10|
|Category C|  3|100.01|
+----------+---+------+
Add constant column via lit function
The lit function wraps a Python value in a literal Column, which withColumn then adds as a new column holding that same value in every row, as the following code snippet shows:
from datetime import date

from pyspark.sql.functions import lit

# df is the DataFrame constructed in the previous section.
df1 = df.withColumn('ConstantColumn1', lit(1)).withColumn(
    'ConstantColumn2', lit(date.today()))
df1.show()
Two new columns are added.
Output:
+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1| 12.40|              1|     2020-08-09|
|Category B|  2| 30.10|              1|     2020-08-09|
|Category C|  3|100.01|              1|     2020-08-09|
+----------+---+------+---------------+---------------+
Other approaches
A user-defined function (UDF) or a Spark SQL query can also add constant columns.
The following are some examples.
# Add new constant column via Spark SQL
# (spark is the active SparkSession)
df.createOrReplaceTempView("df")
spark.sql(
    "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

# Add new constant column via UDF
from pyspark.sql.functions import udf


@udf("int")
def const_col():
    # Zero-argument UDF that returns the same value for every row
    return 1


df1 = df.withColumn('ConstantColumn1', const_col())
df1.show()
Output (Spark SQL query first, then the UDF):
+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1| 12.40|              1|     2020-08-09|
|Category B|  2| 30.10|              1|     2020-08-09|
|Category C|  3|100.01|              1|     2020-08-09|
+----------+---+------+---------------+---------------+

+----------+---+------+---------------+
|  Category| ID| Value|ConstantColumn1|
+----------+---+------+---------------+
|Category A|  1| 12.40|              1|
|Category B|  2| 30.10|              1|
|Category C|  3|100.01|              1|
+----------+---+------+---------------+
Run Spark code
You can run Spark code on Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet:
Last modified by Raymond 4 years ago