Scala: Remove Columns from Spark Data Frame

Raymond Tang, 12/13/2020

This article shows how to 'remove' columns from a Spark data frame using Scala.

Construct a dataframe

Follow the article Scala: Convert List to Spark Data Frame to construct a data frame.

The DataFrame object looks like the following:

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
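For readers who do not have the referenced article at hand, the following is a minimal sketch of how a data frame with this shape could be built in spark-shell or a Scala script. The application name, the `local[*]` master and the column types (`Count` as `Int`) are assumptions made for illustration; the referenced article may construct it differently.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .appName("drop-columns-example") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample rows matching the table above; Count is assumed to be an Int.
val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

df.show()
```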

'Delete' or 'Remove' one column

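The words 'delete' and 'remove' can be misleading: a Spark DataFrame is immutable and its transformations are lazily evaluated, so nothing is deleted in place. Dropping a column returns a new DataFrame without that column and leaves the original unchanged.

We can use the drop function to remove or delete columns from a DataFrame.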

scala> df.drop("Category").show()
+-----+------------------+
|Count|       Description|
+-----+------------------+
|  100|This is category A|
|  120|This is category B|
|  150|This is category C|
+-----+------------------+
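To illustrate why 'remove' is only a figure of speech, the sketch below rebuilds the sample data (the session settings and variable names are assumptions) and checks that drop returns a new DataFrame while the original keeps all three columns. It also shows two related behaviours of the same function: dropping a name that is not in the schema is a no-op, and drop also accepts a Column object.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .appName("drop-one-column")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

// drop returns a new DataFrame; df itself is not modified.
val withoutCategory = df.drop("Category")
println(df.columns.mkString(", "))              // Category, Count, Description
println(withoutCategory.columns.mkString(", ")) // Count, Description

// Dropping a column name that does not exist is a no-op, not an error.
val unchanged = df.drop("NoSuchColumn")

// The column can also be passed as a Column object instead of a name.
val viaColumn = df.drop(df("Category"))
```

Because the original df is untouched, assign the result to a new value (or chain further transformations) if you want to keep working without the dropped column.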

Drop multiple columns

Multiple columns can be dropped at the same time:

val columns_to_drop = Array("Category", "Count")
df.drop(columns_to_drop: _*).show()
df.drop("Category", "Description").show()

Output:

scala> df.drop(columns_to_drop: _*).show()
+------------------+
|       Description|
+------------------+
|This is category A|
|This is category B|
|This is category C|
+------------------+

scala> df.drop("Category", "Description").show()
+-----+
|Count|
+-----+
|  100|
|  120|
|  150|
+-----+

The above code snippets show two approaches to dropping multiple columns: passing the column names directly as arguments, or passing a dynamically built array of column names expanded with : _*.
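As a sketch of the dynamic approach, the names to drop can be computed at run time rather than hard-coded. The prefix rule below (names starting with "C") and the variable names are hypothetical, chosen only to fit the sample data; the same : _* expansion works for other Scala sequences such as a List.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .appName("drop-dynamic-columns")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

// Build the list of columns to drop from the schema itself.
// Hypothetical rule: drop every column whose name starts with "C".
val columnsToDrop = df.columns.filter(_.startsWith("C"))
val remaining = df.drop(columnsToDrop: _*)
println(remaining.columns.mkString(", ")) // Description

// A List (or any Seq) of names can be expanded the same way.
val namesAsList = List("Category", "Description")
val countOnly = df.drop(namesAsList: _*)
println(countOnly.columns.mkString(", ")) // Count
```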

Run Spark code

You can easily run Spark code on Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet:
