Scala: Remove Columns from Spark Data Frame
This article shows how to 'remove' columns from a Spark data frame using Scala.
Construct a dataframe
Follow the article Scala: Convert List to Spark Data Frame to construct a data frame.
The DataFrame object looks like the following:
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
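For readers who do not have the referenced article at hand, the following is a minimal sketch of one way to build an equivalent data frame from a Scala List. The application name and the local master setting are assumptions made for illustration; in spark-shell, the `spark` session and its implicits already exist, so the first two statements can be skipped.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation.
// In spark-shell, `spark` is already defined and this step is unnecessary.
val spark = SparkSession.builder()
  .appName("drop-columns-example") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Brings toDF into scope for local Scala collections.
import spark.implicits._

// Sample rows matching the table shown above.
val data = List(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
)

// Name the columns explicitly; otherwise they default to _1, _2, _3.
val df = data.toDF("Category", "Count", "Description")
df.show()
```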
'Delete' or 'Remove' one column
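The words 'delete' and 'remove' can be misleading: Spark DataFrames are immutable and transformations are lazily evaluated, so nothing is deleted in place. Dropping a column returns a new DataFrame without that column and leaves the original unchanged.
Use the drop function to remove columns from a DataFrame.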
scala> df.drop("Category").show()
+-----+------------------+
|Count|       Description|
+-----+------------------+
|  100|This is category A|
|  120|This is category B|
|  150|This is category C|
+-----+------------------+
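Because drop produces a new DataFrame, the original keeps all of its columns. The sketch below illustrates this; the variable name `withoutCategory`, the application name, and the single-row sample data are assumptions chosen for brevity.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell, reuse the existing `spark`.
val spark = SparkSession.builder()
  .appName("drop-returns-new-dataframe") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = List(("Category A", 100, "This is category A"))
  .toDF("Category", "Count", "Description")

// drop returns a new DataFrame; df itself is not modified.
val withoutCategory = df.drop("Category")

println(df.columns.mkString(", "))              // Category, Count, Description
println(withoutCategory.columns.mkString(", ")) // Count, Description
```

To keep working with the reduced set of columns, assign the result of drop to a value as shown, since calling drop on its own does not change `df`.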
Drop multiple columns
Multiple columns can be dropped at the same time:
val columns_to_drop = Array("Category", "Count")
df.drop(columns_to_drop: _*).show()
df.drop("Category", "Description").show()
scala> df.drop(columns_to_drop: _*).show()
+------------------+
|       Description|
+------------------+
|This is category A|
|This is category B|
|This is category C|
+------------------+

scala> df.drop("Category", "Description").show()
+-----+
|Count|
+-----+
|  100|
|  120|
|  150|
+-----+
The code snippets above show two approaches to dropping columns: expanding a dynamic array of column names with : _*, or listing the column names directly.
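The array of names does not have to be hard-coded. As a sketch of the dynamic approach, the list can be derived from the schema, for example by collecting every string-typed column. The names `stringColumns`, `numericOnly`, `unchanged`, and `NoSuchColumn` are hypothetical, and the sample data is an assumption that mirrors the data frame above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StringType

// Assumption: a local session; in spark-shell, reuse the existing `spark`.
val spark = SparkSession.builder()
  .appName("drop-dynamic-columns") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = List(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B")
).toDF("Category", "Count", "Description")

// Collect the names of all string-typed columns from the schema.
val stringColumns: Array[String] =
  df.schema.fields.filter(_.dataType == StringType).map(_.name)

// Expand the array into varargs, as in the example above.
val numericOnly = df.drop(stringColumns: _*)
numericOnly.show()

// Dropping a name that is not in the schema is a no-op, not an error.
val unchanged = df.drop("NoSuchColumn")
```

Since drop silently ignores unknown names, a typo in a column name will not raise an error; comparing `columns` before and after is a simple way to confirm the drop took effect.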
Run Spark code
You can easily run Spark code on Windows or Unix-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet: