Scala - Add Constant Column to Spark Data Frame

Published 2020-12-14 by Raymond in Spark & PySpark



This article shows how to add a constant (literal) column to a Spark data frame using Scala.

Construct a dataframe 

Follow the article Scala: Convert List to Spark Data Frame to construct a Spark data frame. The examples below use the following data frame, named df:

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
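For reference, a data frame with the same rows can be built from a local sequence of tuples. This is a minimal sketch, not the code from the linked article: the application name is a hypothetical placeholder, and the session is created in local mode so the snippet can run as a standalone script. In spark-shell, `spark` is already defined, so the session lines can be skipped.

```scala
import org.apache.spark.sql.SparkSession

// Create (or reuse) a local session; the app name is a made-up placeholder.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("constant-column-example")
  .getOrCreate()
import spark.implicits._ // enables toDF on local collections

// Sample rows matching the table above
val data = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
)

// Name the columns explicitly instead of relying on _1, _2, _3
val df = data.toDF("Category", "Count", "Description")
df.show()
```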

Add constant column via lit function

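The lit function (in org.apache.spark.sql.functions) creates a column from a literal value. Combined with withColumn, it adds a column that holds the same value in every row, as the following code snippet shows: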

import org.apache.spark.sql.functions.lit // already imported in spark-shell
df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()

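Two new columns are added: ConstantColumn1 holds the integer 1 and ConstantColumn2 holds the current date, evaluated once on the driver. Passing a java.time.LocalDate to lit requires Spark 3.0 or later.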

Output:

scala> df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+
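The same pattern works for other literal types. The sketch below is an illustration that goes beyond the original snippet: the column names and values are hypothetical, the typed null uses a cast because `lit(null)` alone has no useful type, and `typedLit` is used for the collection value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, typedLit}

// Local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("lit-types-example") // hypothetical application name
  .getOrCreate()
import spark.implicits._

val df = Seq(("Category A", 100), ("Category B", 120)).toDF("Category", "Count")

val withConstants = df
  .withColumn("Source", lit("manual"))             // string constant
  .withColumn("Ratio", lit(0.5))                   // double constant
  .withColumn("Comment", lit(null).cast("string")) // null constant with an explicit type
  .withColumn("Tags", typedLit(Seq("a", "b")))     // array constant

withConstants.printSchema()
withConstants.show()
```

Checking the schema with printSchema is a quick way to confirm that each constant column received the intended type.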

Other approaches

A constant column can also be added with Spark SQL or with a user-defined function (UDF).

The following are some examples. 

// Add new constant column via Spark SQL
df.createOrReplaceTempView("df")
spark.sql(
    "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

// Add new constant column via UDF
import org.apache.spark.sql.functions.udf // already imported in spark-shell
val constantFunc = udf(() => 1)
df.withColumn("ConstantColumn1", constantFunc()).show()

Output:

scala> df.createOrReplaceTempView("df")

scala> spark.sql(
     |     "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+

scala> df.withColumn("ConstantColumn1", constantFunc()).show()
+----------+-----+------------------+---------------+
|  Category|Count|       Description|ConstantColumn1|
+----------+-----+------------------+---------------+
|Category A|  100|This is category A|              1|
|Category B|  120|This is category B|              1|
|Category C|  150|This is category C|              1|
+----------+-----+------------------+---------------+
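If registering a temporary view feels heavy for a single constant, SQL expressions can also be passed directly to the data frame API. The following sketch is an assumption about how one might do this, not part of the original examples; it uses `selectExpr`, `expr`, and `current_date` with the same column names as above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{current_date, expr}

// Local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-expr-example") // hypothetical application name
  .getOrCreate()
import spark.implicits._

val df = Seq(("Category A", 100), ("Category B", 120)).toDF("Category", "Count")

// SQL expressions without a temp view
val viaSelectExpr = df.selectExpr(
  "*",
  "1 as ConstantColumn1",
  "current_date() as ConstantColumn2"
)

// The same columns built with column functions
val viaFunctions = df
  .withColumn("ConstantColumn1", expr("1"))
  .withColumn("ConstantColumn2", current_date())

viaSelectExpr.show()
viaFunctions.show()
```

Unlike lit(java.time.LocalDate.now), which captures the date on the driver when the column is defined, current_date is evaluated by Spark when the query runs.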

Run Spark code

You can run Spark code on Windows or Unix-like (Linux, macOS) systems. If you don't have a Spark environment yet, set one up before running the examples.
