Add Constant Column to PySpark DataFrame


This article shows how to add a constant or literal column to a Spark DataFrame using Python.

Construct a DataFrame

Follow the article Convert Python Dictionary List to PySpark DataFrame to construct the sample DataFrame.
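If you would rather not follow that article, the following minimal sketch builds an equivalent DataFrame. It assumes an active SparkSession named spark; the exact schema in the referenced article may differ, and the decimal type here is an assumption made so that trailing zeros are preserved in the output.

from decimal import Decimal
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('AddConstantColumn').getOrCreate()

# Sample data matching the table below. Value is stored as decimal(10,2)
# (an assumption) so that trailing zeros such as 12.40 are kept.
df = spark.createDataFrame(
    [('Category A', 1, Decimal('12.40')),
     ('Category B', 2, Decimal('30.10')),
     ('Category C', 3, Decimal('100.01'))],
    schema='Category string, ID bigint, Value decimal(10,2)')
df.show()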

+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1| 12.40|
|Category B|  2| 30.10|
|Category C|  3|100.01|
+----------+---+------+

Add constant column via lit function

The lit function can be used to add columns with a constant value, as the following code snippet shows:

from datetime import date
from pyspark.sql.functions import lit

# Add an integer constant and a date constant in one chained call.
df1 = df.withColumn('ConstantColumn1', lit(1)) \
    .withColumn('ConstantColumn2', lit(date.today()))
df1.show()

Two new columns are added. 

Output:

+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1| 12.40|              1|     2020-08-09|
|Category B|  2| 30.10|              1|     2020-08-09|
|Category C|  3|100.01|              1|     2020-08-09|
+----------+---+------+---------------+---------------+
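lit is not limited to numbers and dates; it accepts other Python literals too. As a small sketch (the column names Source and Note are illustrative, not from the original example), a string constant and a typed null column can be added the same way:

from pyspark.sql.functions import lit

# lit(None) produces a column of NullType; casting gives it a usable type.
df2 = df.withColumn('Source', lit('manual')) \
    .withColumn('Note', lit(None).cast('string'))
df2.show()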

Other approaches

A UDF or Spark SQL can also be used to add constant values.

The following are some examples. 

# Add new constant column via Spark SQL
df.createOrReplaceTempView("df")
spark.sql(
    "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

# Add new constant column via UDF
from pyspark.sql.functions import udf

@udf("int")
def const_col():
    return 1

df1 = df.withColumn('ConstantColumn1', const_col())
df1.show()

Output of the two snippets:

+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1| 12.40|              1|     2020-08-09|
|Category B|  2| 30.10|              1|     2020-08-09|
|Category C|  3|100.01|              1|     2020-08-09|
+----------+---+------+---------------+---------------+

+----------+---+------+---------------+
|  Category| ID| Value|ConstantColumn1|
+----------+---+------+---------------+
|Category A|  1| 12.40|              1|
|Category B|  2| 30.10|              1|
|Category C|  3|100.01|              1|
+----------+---+------+---------------+
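Note that a Python UDF is usually the least efficient of these options: each row is serialized to the Python worker and back, and Catalyst cannot optimize through the UDF, whereas lit and the SQL literal are handled natively. Prefer lit for plain constant columns.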

Run Spark code

You can easily run Spark code on Windows or UNIX-like (Linux, macOS) systems. If you don't have a Spark environment yet, set one up first.
