This article shows how to add a constant or literal column to a Spark data frame using Scala.
Construct a data frame
Follow the article Scala: Convert List to Spark Data Frame to construct a Spark data frame.
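Alternatively, the following minimal sketch builds an equivalent data frame directly (it assumes an active SparkSession named spark, as in spark-shell):

import spark.implicits._

// Build a small data frame matching the sample data used in this article.
val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

df.show()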
+----------+-----+------------------+
| Category|Count| Description|
+----------+-----+------------------+
|Category A| 100|This is category A|
|Category B| 120|This is category B|
|Category C| 150|This is category C|
+----------+-----+------------------+
Add constant column via lit function
The lit function can be used to add columns with a constant value, as the following code snippet shows:
import org.apache.spark.sql.functions.lit

df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()
Two new columns are added. Note that passing a java.time.LocalDate value to lit requires Spark 3.0 or later.
Output:
scala> df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()
+----------+-----+------------------+---------------+---------------+
| Category|Count| Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A| 100|This is category A| 1| 2020-12-14|
|Category B| 120|This is category B| 1| 2020-12-14|
|Category C| 150|This is category C| 1| 2020-12-14|
+----------+-----+------------------+---------------+---------------+
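For constants of non-primitive Scala types such as Seq or Map, Spark also provides typedLit (available since Spark 2.2). The following is a minimal sketch; the column names TagsColumn and PropsColumn are hypothetical:

import org.apache.spark.sql.functions.typedLit

// typedLit captures the Scala type, so a Seq becomes an array column
// and a Map becomes a map column.
df.withColumn("TagsColumn", typedLit(Seq("a", "b")))
  .withColumn("PropsColumn", typedLit(Map("key" -> "value")))
  .show()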
Other approaches
A user-defined function (UDF) or Spark SQL can also be used to add constant columns.
The following are some examples.
// Add new constant column via Spark SQL
df.createOrReplaceTempView("df")
spark.sql(
  "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

// Add new constant column via UDF
import org.apache.spark.sql.functions.udf

val constantFunc = udf(() => 1)
df.withColumn("ConstantColumn1", constantFunc()).show()
Output:
scala> df.createOrReplaceTempView("df")

scala> spark.sql(
     | "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()
+----------+-----+------------------+---------------+---------------+
| Category|Count| Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A| 100|This is category A| 1| 2020-12-14|
|Category B| 120|This is category B| 1| 2020-12-14|
|Category C| 150|This is category C| 1| 2020-12-14|
+----------+-----+------------------+---------------+---------------+
scala> df.withColumn("ConstantColumn1", constantFunc()).show()
+----------+-----+------------------+---------------+
| Category|Count| Description|ConstantColumn1|
+----------+-----+------------------+---------------+
|Category A| 100|This is category A| 1|
|Category B| 120|This is category B| 1|
|Category C| 150|This is category C| 1|
+----------+-----+------------------+---------------+
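As a variation on the Spark SQL approach, selectExpr evaluates SQL expressions directly against the data frame, without registering a temporary view. A minimal sketch producing the same result as the temp-view query above:

// Same result as the temp-view query, expressed directly on the data frame.
df.selectExpr("*", "1 as ConstantColumn1", "current_date as ConstantColumn2").show()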
Run Spark code
You can easily run Spark code on Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet: