Scala - Add Constant Column to Spark Data Frame
Raymond
This article shows how to add a constant (literal) column to a Spark data frame using Scala.
Construct a dataframe
Follow the article Scala: Convert List to Spark Data Frame to construct a Spark data frame. The examples in this article assume a data frame named df with the following content:
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
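The linked article is not reproduced here, so the following is only a minimal sketch of one way such a data frame could be built from a Scala List. The local SparkSession settings and the application name are assumptions for experimentation; inside spark-shell a session named spark already exists.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session for experimentation; spark-shell already provides `spark`.
val spark: SparkSession = SparkSession.builder()
  .appName("constant-column-demo") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._ // enables toDF on local Scala collections

// Sample rows matching the table above.
val df: DataFrame = List(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

df.show()
```

The column names are passed to toDF in the same order as the tuple fields, so the resulting schema is (Category: string, Count: int, Description: string).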
Add constant column via lit function
The lit function can be used to add columns with a constant value, as the following code snippet shows:
import org.apache.spark.sql.functions.lit  // already in scope inside spark-shell

df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()
Two new columns are added.
Output:
scala> df.withColumn("ConstantColumn1", lit(1)).withColumn("ConstantColumn2", lit(java.time.LocalDate.now)).show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+
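The type of a column created with lit follows from the Scala value passed in. The sketch below illustrates this with a few literal kinds; it goes slightly beyond the original example by also using typedLit (a related function in org.apache.spark.sql.functions for collection literals) and an explicit cast for a null literal. The column names, the reduced two-column data frame and the local session are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, typedLit}

// Assumption: a local session; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .appName("lit-types-demo") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// A reduced version of the sample data, enough to show the schema.
val df = Seq(("Category A", 100), ("Category B", 120)).toDF("Category", "Count")

val withConstants = df
  .withColumn("IntConst", lit(1))                   // Scala Int becomes an integer column
  .withColumn("StrConst", lit("N/A"))               // Scala String becomes a string column
  .withColumn("NullStr", lit(null).cast("string"))  // a null literal has no type until it is cast
  .withColumn("Tags", typedLit(Seq("a", "b")))      // typedLit handles Scala collections

withConstants.printSchema()
```

Casting the null literal is a deliberate choice: without it the column has the null type, which many data sinks cannot store.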
Other approaches
A UDF (user-defined function) or Spark SQL can also be used to add constant columns.
The following are some examples.
// Add new constant column via Spark SQL
df.createOrReplaceTempView("df")
spark.sql("select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()

// Add new constant column via UDF
import org.apache.spark.sql.functions.udf  // already in scope inside spark-shell
val constantFunc = udf(() => 1)
df.withColumn("ConstantColumn1", constantFunc()).show()
Output:
scala> df.createOrReplaceTempView("df")

scala> spark.sql(
     |   "select *, 1 as ConstantColumn1, current_date as ConstantColumn2 from df").show()
+----------+-----+------------------+---------------+---------------+
|  Category|Count|       Description|ConstantColumn1|ConstantColumn2|
+----------+-----+------------------+---------------+---------------+
|Category A|  100|This is category A|              1|     2020-12-14|
|Category B|  120|This is category B|              1|     2020-12-14|
|Category C|  150|This is category C|              1|     2020-12-14|
+----------+-----+------------------+---------------+---------------+

scala> df.withColumn("ConstantColumn1", constantFunc()).show()
+----------+-----+------------------+---------------+
|  Category|Count|       Description|ConstantColumn1|
+----------+-----+------------------+---------------+
|Category A|  100|This is category A|              1|
|Category B|  120|This is category B|              1|
|Category C|  150|This is category C|              1|
+----------+-----+------------------+---------------+
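To see that the three approaches produce the same data, the following sketch builds the integer constant column with lit, Spark SQL and a UDF and compares the collected rows. The variable names (viaLit, viaSql, viaUdf), the reduced sample data and the local session are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, udf}

// Assumption: a local session; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .appName("constant-column-compare") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(("Category A", 100), ("Category B", 120)).toDF("Category", "Count")

// 1. lit function
val viaLit = df.withColumn("ConstantColumn1", lit(1))

// 2. Spark SQL against a temporary view
df.createOrReplaceTempView("df")
val viaSql = spark.sql("select *, 1 as ConstantColumn1 from df")

// 3. Zero-argument UDF that always returns 1
val constantFunc = udf(() => 1)
val viaUdf = df.withColumn("ConstantColumn1", constantFunc())

// Compare the collected rows; order is ignored by converting to sets.
val expected = viaLit.collect().toSet
assert(viaSql.collect().toSet == expected)
assert(viaUdf.collect().toSet == expected)
```

When all three give the same result, lit is usually the simplest choice: a UDF is opaque to Spark's query optimizer, whereas literals and SQL expressions are not.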
Run Spark code
You can easily run Spark code on Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet:
Last modified by Raymond 4 years ago