Add Constant Column to PySpark DataFrame


This article shows how to add a constant (literal) column to a Spark DataFrame using Python.

Construct a dataframe 

Follow the article Convert Python Dictionary List to PySpark DataFrame to construct a DataFrame with the following content:

+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1| 12.40|
|Category B|  2| 30.10|
|Category C|  3|100.01|
+----------+---+------+

Add constant column via lit function

The lit function creates a column with a constant value, as the following code snippet shows:

from datetime import date
from pyspark.sql.functions import lit

df1 = df.withColumn('ConstantColumn1', lit(1)).withColumn(
    'ConstantColumn2', lit(date.today()))
df1.show()

Two new columns are added. 

Output:

+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1| 12.40|              1|     2020-08-09|
|Category B|  2| 30.10|              1|     2020-08-09|
|Category C|  3|100.01|              1|     2020-08-09|
+----------+---+------+---------------+---------------+

Other approaches

A UDF or Spark SQL can also be used to add constant columns, although lit is usually the simpler choice because a Python UDF is evaluated row by row.

The following are some examples.

# Add new constant column via Spark SQL
df.createOrReplaceTempView("df")
spark.sql(
    "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

# Add new constant column via UDF
from pyspark.sql.functions import udf

@udf("int")
def const_col():
    return 1

df1 = df.withColumn('ConstantColumn1', const_col())
df1.show()

Output:

+----------+---+------+---------------+---------------+
|  Category| ID| Value|ConstantColumn1|ConstantColumn2|
+----------+---+------+---------------+---------------+
|Category A|  1| 12.40|              1|     2020-08-09|
|Category B|  2| 30.10|              1|     2020-08-09|
|Category C|  3|100.01|              1|     2020-08-09|
+----------+---+------+---------------+---------------+

+----------+---+------+---------------+
|  Category| ID| Value|ConstantColumn1|
+----------+---+------+---------------+
|Category A|  1| 12.40|              1|
|Category B|  2| 30.10|              1|
|Category C|  3|100.01|              1|
+----------+---+------+---------------+

Run Spark code

You can run Spark code on Windows or UNIX-like (Linux, macOS) systems. If you don't have a Spark environment yet, set one up before running the examples.
