Scala - Add Constant Column to Spark Data Frame
This article shows how to add a constant or literal column to a Spark data frame using Scala.
Construct a data frame
Follow the article Scala: Convert List to Spark Data Frame to construct a Spark data frame.
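Alternatively, if you just want to follow along on this page, the sketch below builds an equivalent data frame directly from a list of tuples. It assumes the spark-shell, where the SparkSession named spark and its implicits are already in scope; the column names and values are taken from the output shown below.

// Build a sample data frame from a list of tuples (spark-shell assumed)
import spark.implicits._

val df = List(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

df.show()

The data frame content looks like the following: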
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
Add constant column via lit function
The function lit can be used to add columns with a constant value, as the following code snippet shows:

import org.apache.spark.sql.functions.lit

df.withColumn("ConstantColumn1", lit(1))
  .withColumn("ConstantColumn2", lit(java.time.LocalDate.now))
  .show()
Two new columns are added.
Output:
scala> df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+
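Note that lit only accepts simple literal values. For constants of collection types such as a Seq or Map, Spark provides typedLit (available since Spark 2.2). A minimal sketch, not from the original example:

import org.apache.spark.sql.functions.typedLit

// Add constant columns holding a Seq and a Map (typedLit preserves the Scala type)
df.withColumn("ConstantSeq", typedLit(Seq(1, 2, 3)))
  .withColumn("ConstantMap", typedLit(Map("key" -> "value")))
  .show()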
Other approaches
A UDF or Spark SQL can also be used to add constant columns.
The following are some examples.
// Add new constant columns via Spark SQL
df.createOrReplaceTempView("df")
spark.sql(
  "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

// Add a new constant column via a UDF
import org.apache.spark.sql.functions.udf
val constantFunc = udf(() => 1)
df.withColumn("ConstantColumn1", constantFunc()).show()
Output:
scala> df.createOrReplaceTempView("df")

scala> spark.sql(
     |   "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+

scala> df.withColumn("ConstantColumn1", constantFunc()).show()
+----------+-----+------------------+---------------+
|  Category|Count|       Description|ConstantColumn1|
+----------+-----+------------------+---------------+
|Category A|  100|This is category A|              1|
|Category B|  120|This is category B|              1|
|Category C|  150|This is category C|              1|
+----------+-----+------------------+---------------+
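For completeness, the same SQL expressions can be applied without registering a temporary view at all, using selectExpr. A brief sketch, not part of the original example:

// Same constants via selectExpr, no temporary view required
df.selectExpr("*", "1 as ConstantColumn1", "current_date as ConstantColumn2").show()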
Run Spark code
You can easily run Spark code on Windows or UNIX-like (Linux, macOS) systems. Set up your Spark environment first if you don't have one yet.