
Scala - Add Constant Column to Spark Data Frame


This article shows how to add a constant (literal) column to a Spark data frame using Scala.

Construct a dataframe 

Follow the article "Scala: Convert List to Spark Data Frame" to construct a Spark data frame. The examples below assume it is available as df and has the following content:

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
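
For readers who do not have the referenced article at hand, the following is a minimal sketch of one way to build an equivalent data frame. It is not necessarily the approach used in that article: the object name BuildSampleDataFrame, the application name and the local[*] master are assumptions chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object BuildSampleDataFrame {
  // Builds the three-row sample data frame shown above from a local Seq of tuples.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings .toDF into scope for local collections
    Seq(
      ("Category A", 100, "This is category A"),
      ("Category B", 120, "This is category B"),
      ("Category C", 150, "This is category C")
    ).toDF("Category", "Count", "Description")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for trying out the examples.
    val spark = SparkSession.builder()
      .appName("constant-column-demo")
      .master("local[*]")
      .getOrCreate()
    val df = build(spark)
    df.show()
    spark.stop()
  }
}
```

In spark-shell a session named spark already exists, so only the body of build is needed there.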

Add constant column via lit function

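The lit function from org.apache.spark.sql.functions turns a constant value into a column. Pass the result to withColumn to add it to the data frame, as the following code snippet shows:

import org.apache.spark.sql.functions.lit
df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()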

Two new columns are added: ConstantColumn1 holds the integer 1 in every row and ConstantColumn2 holds the current date.

Output:

scala> df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+
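
The same pattern works for other literal kinds. The sketch below is an illustration rather than part of the original example: the object name LiteralColumnTypes and the column names are made up, and it assumes a Spark version that provides typedLit, which is used here for a collection-valued constant. A null constant is given an explicit type with cast.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, typedLit}

object LiteralColumnTypes {
  // Adds one constant column per literal kind; the column names are illustrative.
  def addLiterals(df: DataFrame): DataFrame =
    df.withColumn("IntConst", lit(1))                     // integer literal
      .withColumn("StringConst", lit("fixed"))            // string literal
      .withColumn("NullConst", lit(null).cast("string"))  // null with an explicit type
      .withColumn("ArrayConst", typedLit(Seq(1, 2, 3)))   // array literal via typedLit

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for this demonstration.
    val spark = SparkSession.builder()
      .appName("literal-types")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    val df = Seq(("Category A", 100)).toDF("Category", "Count")
    addLiterals(df).printSchema() // shows the type Spark assigned to each constant
    spark.stop()
  }
}
```

Calling printSchema is a quick way to confirm which data type each constant column received.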

Other approaches

A user-defined function (UDF) or Spark SQL can also be used to add constant columns.

The following are some examples. 

// Add new constant columns via Spark SQL
df.createOrReplaceTempView("df")
spark.sql(
    "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

// Add a new constant column via a UDF that takes no arguments
import org.apache.spark.sql.functions.udf
val constantFunc = udf(() => 1)
df.withColumn("ConstantColumn1", constantFunc()).show()

Output:

scala> df.createOrReplaceTempView("df")

scala> spark.sql(
     |     "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+

scala> df.withColumn("ConstantColumn1", constantFunc()).show()
+----------+-----+------------------+---------------+
|  Category|Count|       Description|ConstantColumn1|
+----------+-----+------------------+---------------+
|Category A|  100|This is category A|              1|
|Category B|  120|This is category B|              1|
|Category C|  150|This is category C|              1|
+----------+-----+------------------+---------------+
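
Two further variants stay within the DataFrame API: select with a literal column appended, and selectExpr with SQL expression strings. The following is a hedged sketch, not taken from the original article; the object name SelectConstantColumn and its method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object SelectConstantColumn {
  // Keeps all existing columns and appends a literal column through select.
  def viaSelect(df: DataFrame): DataFrame =
    df.select(col("*"), lit(1).as("ConstantColumn1"))

  // Same idea using SQL expression strings, without registering a temp view.
  def viaSelectExpr(df: DataFrame): DataFrame =
    df.selectExpr("*", "1 as ConstantColumn1", "current_date() as ConstantColumn2")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for this demonstration.
    val spark = SparkSession.builder()
      .appName("select-constant")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    val df = Seq(("Category A", 100, "This is category A"))
      .toDF("Category", "Count", "Description")
    viaSelect(df).show()
    viaSelectExpr(df).show()
    spark.stop()
  }
}
```

For a plain constant, lit or a SQL literal is usually the simpler choice, since a UDF is opaque to Spark's query optimizer.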

Run Spark code

You can run Spark code on Windows or UNIX-like (Linux, macOS) systems. Set up a Spark environment first if you don't have one yet.

Last modified by Raymond 5 months ago.