Scala: Change Data Frame Column Names in Spark

Renaming columns is a common task when working with data frames. In this article, I will show you how to rename columns in a Spark data frame using Scala.

This is the Scala version of the article: Change DataFrame Column Names in PySpark

Construct a dataframe 

The following code snippet creates a DataFrame from an array of Scala lists. Spark SQL types are used to define the schema, and then the SparkSession.createDataFrame function converts the resulting rows to a Spark DataFrame object.

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val data = 
Array(List("Category A", 100, "This is category A"),
List("Category B", 120, "This is category B"),
List("Category C", 150, "This is category C"))

// Create a schema for the dataframe
val schema =
  StructType(
    StructField("Category", StringType, true) ::
    StructField("Count", IntegerType, true) ::
    StructField("Description", StringType, true) :: Nil)

// Convert each list to a Row
val rows = data.map(t=>Row(t(0),t(1),t(2))).toList

// Create RDD
val rdd = spark.sparkContext.parallelize(rows)

// Create data frame
val df = spark.createDataFrame(rdd,schema)
print(df.schema)
df.show()

The data frame looks like the following:

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
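For small test data, a shorter construction is possible. The following is a minimal sketch, not the method used above: it assumes a local session (the `local[1]` master and the app name are illustrative; in spark-shell the existing `spark` is returned by `getOrCreate`) and builds the data frame from tuples instead of Row objects with an explicit schema.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; in spark-shell, getOrCreate returns the existing one.
val spark = SparkSession.builder().master("local[1]").appName("construct-sketch").getOrCreate()

// Each tuple is one row; column types are inferred from the tuple element types.
val tuples = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C"))

// toDF assigns the column names in order.
val dfFromTuples = spark.createDataFrame(tuples).toDF("Category", "Count", "Description")
```

One difference to keep in mind: with this approach the Count column is inferred from a Scala Int and is marked as non-nullable, whereas the explicit schema above declares it nullable.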

Print out column names

DataFrame.columns returns the column names of the data frame as an array, which can be printed as a list:

print(df.columns.toList)

Output:

List(Category, Count, Description)

Rename one column

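We can use the withColumnRenamed function to rename a single column. It returns a new data frame and leaves the original unchanged; if the existing column name is not found, the call has no effect.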

val df2 = df.withColumnRenamed("Category", "category_new")
df2.show()

Output:

scala> df2.show()
+------------+-----+------------------+
|category_new|Count|       Description|
+------------+-----+------------------+
|  Category A|  100|This is category A|
|  Category B|  120|This is category B|
|  Category C|  150|This is category C|
+------------+-----+------------------+

Column Category is renamed to category_new.
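To rename a few columns but not all of them, withColumnRenamed can be applied once per column. The sketch below folds over a map of old to new names; `renameColumns` is a hypothetical helper introduced for illustration, and the local session settings are assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Illustrative local session; in spark-shell, getOrCreate returns the existing one.
val spark = SparkSession.builder().master("local[1]").appName("rename-sketch").getOrCreate()

val sample = spark
  .createDataFrame(Seq(("Category A", 100, "This is category A")))
  .toDF("Category", "Count", "Description")

// Hypothetical helper: apply withColumnRenamed once per (old -> new) pair.
// Pairs whose old name is not in the schema are silently skipped by Spark.
def renameColumns(input: DataFrame, mapping: Map[String, String]): DataFrame =
  mapping.foldLeft(input) { case (acc, (oldName, newName)) =>
    acc.withColumnRenamed(oldName, newName)
  }

val renamed = renameColumns(sample, Map("Category" -> "category_new", "Count" -> "count_new"))
```

Columns that are not listed in the map, such as Description here, keep their original names and positions.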

Rename all columns

The toDF function can be used to rename all columns at once. The following code snippet converts all column names to lower case and then appends '_new' to each column name.

// Rename columns
val new_column_names=df.columns.map(c=>c.toLowerCase() + "_new")
val df3 = df.toDF(new_column_names:_*)
df3.show()

Output:

scala> df3.show()
+------------+---------+------------------+
|category_new|count_new|   description_new|
+------------+---------+------------------+
|  Category A|      100|This is category A|
|  Category B|      120|This is category B|
|  Category C|      150|This is category C|
+------------+---------+------------------+

You can use a similar approach to remove spaces or special characters from column names.
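As a rough illustration of that idea, here is a sketch of a name-cleaning function. `sanitize` is a hypothetical helper, and the rule it applies (lower-case, replace anything other than letters, digits, and underscores) is only one possible convention.

```scala
// Hypothetical helper: trim and lower-case the name, collapse each run of
// characters other than a-z, 0-9, and underscore into a single underscore,
// then drop leading and trailing underscores.
def sanitize(name: String): String =
  name.trim.toLowerCase
    .replaceAll("[^a-z0-9_]+", "_")
    .replaceAll("^_+|_+$", "")

val rawNames = Array("Order ID", "Unit Price ($)", "Description")
val cleanNames = rawNames.map(sanitize)

// With a data frame in scope, the same function can feed toDF:
// val cleaned = df.toDF(df.columns.map(sanitize): _*)
```

Two different original names can collapse to the same cleaned name, so it is worth checking the result for duplicates before passing it to toDF.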

In Scala, the type ascription : _* expands a sequence such as a list or array into a variable-length argument list. In this example, toDF takes a String* parameter.
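The behaviour can be seen without Spark. In the sketch below, `describe` is a hypothetical function with a String* parameter, used only to show the two ways of calling it.

```scala
// Hypothetical function with a repeated (varargs) parameter of type String*.
def describe(names: String*): String = names.mkString(", ")

val fromArray = Array("a", "b", "c")

// Arguments passed one by one.
val direct = describe("a", "b", "c")

// The array expanded into the varargs parameter with : _*
val expanded = describe(fromArray: _*)
```

Passing `fromArray` without `: _*` would not compile, because an Array[String] is not a String.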

Use Spark SQL

You can also use Spark SQL to rename columns, as the following code snippet shows:

df.createOrReplaceTempView("df")
spark.sql("select Category as category_new, Count as count_new, Description as description_new from df").show()

The above code snippet first registers the data frame as a temporary view named df. Spark SQL is then used to select each column under a new alias.

Output:

scala> spark.sql("select Category as category_new, Count as count_new, Description as description_new from df").show()
+------------+---------+------------------+
|category_new|count_new|   description_new|
+------------+---------+------------------+
|  Category A|      100|This is category A|
|  Category B|      120|This is category B|
|  Category C|      150|This is category C|
+------------+---------+------------------+
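Writing the alias list by hand becomes tedious for wide tables. The following sketch builds the same kind of statement from a list of column names; `renameQuery` and its `viewName` and `suffix` parameters are hypothetical names introduced here, not part of Spark.

```scala
// Hypothetical helper: build "SELECT `A` AS a_new, ... FROM view".
// Source column names are quoted with backticks; the generated aliases are
// not quoted, so they must already be valid SQL identifiers.
def renameQuery(columns: Seq[String], viewName: String, suffix: String = "_new"): String = {
  val aliases = columns.map(c => s"`$c` AS ${c.toLowerCase}$suffix")
  s"SELECT ${aliases.mkString(", ")} FROM $viewName"
}

val query = renameQuery(Seq("Category", "Count", "Description"), "df")

// With a session and the temp view in scope:
// spark.sql(query).show()
```

In practice the first argument would typically be df.columns, so the statement stays in sync with the data frame.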

Run Spark code

You can easily run Spark code on Windows or Unix-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet:

Last modified by Raymond 8 months ago.