Raymond

Add Constant Column to PySpark DataFrame

2020-08-09

This article shows how to add a constant (literal) column to a Spark DataFrame using Python (PySpark).

Construct a dataframe 

Follow the article Convert Python Dictionary List to PySpark DataFrame to construct a DataFrame like the following:

+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1| 12.40|
|Category B|  2| 30.10|
|Category C|  3|100.01|
+----------+---+------+

Add constant column via lit function

The lit function wraps a Python literal in a Column. Passing that Column to withColumn adds a column whose value is the same in every row, as the following code snippet shows:

from datetime import date
from pyspark.sql.functions import lit

# lit() turns each Python value into a Column; every row gets the same value.
df1 = df.withColumn('ConstantColumn1', lit(1)).withColumn(
    'ConstantColumn2', lit(date.today()))
df1.show()

Two new columns are added: ConstantColumn1 holds the integer 1 and ConstantColumn2 holds the date on which the code runs.

Output:

+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1| 12.40|              1|     2020-08-09|
|Category B|  2| 30.10|              1|     2020-08-09|
|Category C|  3|100.01|              1|     2020-08-09|
+----------+---+------+---------------+---------------+

Other approaches

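Spark SQL or a user-defined function (UDF) can also be used to add constant values. For a plain constant, lit or Spark SQL is usually the better choice: a Python UDF is evaluated in a separate Python worker process, which adds serialization overhead.

The following are some examples.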

# Add new constant column via Spark SQL
df.createOrReplaceTempView("df")
spark.sql(
    "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

# Add new constant column via UDF
from pyspark.sql.functions import udf

@udf("int")
def const_col():
    return 1

df1 = df.withColumn('ConstantColumn1', const_col())
df1.show()

Output (the Spark SQL query first, then the UDF version):

+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1| 12.40|              1|     2020-08-09|
|Category B|  2| 30.10|              1|     2020-08-09|
|Category C|  3|100.01|              1|     2020-08-09|
+----------+---+------+---------------+---------------+

+----------+---+------+---------------+
|  Category| ID| Value|ConstantColumn1|
+----------+---+------+---------------+
|Category A|  1| 12.40|              1|
|Category B|  2| 30.10|              1|
|Category C|  3|100.01|              1|
+----------+---+------+---------------+

Run Spark code

You can run Spark code on Windows or Unix-like systems (Linux, macOS). If you don't have a Spark environment yet, follow these articles to set one up:
