
Scala - Add Constant Column to Spark Data Frame


This article shows how to add a constant or literal column to a Spark data frame using Scala.

Construct a data frame

Follow article Scala: Convert List to Spark Data Frame to construct a Spark data frame.

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
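If you just want a quick equivalent without following the linked article, the sample data frame above can be built directly from a local Seq with toDF. This is a minimal sketch; it assumes a SparkSession named spark with its implicits in scope, as is the case in spark-shell:

```scala
// Minimal sketch: build the sample data frame from a local collection.
// Assumes a SparkSession named `spark` (created automatically in spark-shell).
import spark.implicits._

val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")
```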

Add constant column via lit function

Function lit (from org.apache.spark.sql.functions) can be used to add columns with a constant value, as the following code snippet shows. Note that lit accepts java.time.LocalDate values from Spark 3.0 onwards:

import org.apache.spark.sql.functions.lit

df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()

Two new columns are added. 

Output:

scala> df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+
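For constant columns holding collection values (arrays or maps), typedLit from org.apache.spark.sql.functions is the documented option. The snippet below is a sketch assuming Spark 2.2 or later; the column names Tags and Attributes are hypothetical, chosen only for illustration:

```scala
// Sketch: constant collection-valued columns via typedLit (Spark 2.2+).
// Column names `Tags` and `Attributes` are illustrative only.
import org.apache.spark.sql.functions.typedLit

df.withColumn("Tags", typedLit(Seq("a", "b")))
  .withColumn("Attributes", typedLit(Map("source" -> "manual")))
  .show()
```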

Other approaches

A UDF or Spark SQL can also be used to add constant columns.

The following are some examples. 

// Add a new constant column via Spark SQL
df.createOrReplaceTempView("df")
spark.sql(
    "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

// Add a new constant column via a UDF
import org.apache.spark.sql.functions.udf

val constantFunc = udf(() => 1)
df.withColumn("ConstantColumn1", constantFunc()).show()

Output:

scala> df.createOrReplaceTempView("df")

scala> spark.sql(
     |     "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+

scala> df.withColumn("ConstantColumn1", constantFunc()).show()
+----------+-----+------------------+---------------+
|  Category|Count|       Description|ConstantColumn1|
+----------+-----+------------------+---------------+
|Category A|  100|This is category A|              1|
|Category B|  120|This is category B|              1|
|Category C|  150|This is category C|              1|
+----------+-----+------------------+---------------+
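The SQL approach can also be expressed without registering a temporary view, using selectExpr on the data frame directly. This is an equivalent sketch of the temp-view query above:

```scala
// Sketch: same result as the temp-view SQL query, via selectExpr.
df.selectExpr("*", "1 as ConstantColumn1", "current_date as ConstantColumn2").show()
```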

Run Spark code

You can easily run Spark code on Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet:

Last modified by Raymond 8 months ago.