Scala: Remove Columns from Spark Data Frame


This article shows how to 'remove' columns from a Spark DataFrame using Scala.

Construct a dataframe 

Follow the article Scala: Convert List to Spark Data Frame to construct a DataFrame.

The DataFrame object looks like the following: 

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
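If you don't want to follow the linked article, the sample DataFrame above can also be built directly from an in-memory sequence. This is a minimal sketch assuming a spark-shell session, where the `spark` object and its implicits are already available:

```scala
// Build the sample DataFrame from a Seq of tuples.
// Assumes spark-shell, where spark.implicits._ brings toDF into scope.
import spark.implicits._

val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

df.show()
```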

'Delete' or 'Remove' one column

The words 'delete' and 'remove' can be misleading: Spark DataFrames are immutable and lazily evaluated, so nothing is physically deleted. 

We can use the drop function to remove columns from a DataFrame; it returns a new DataFrame without the specified columns rather than modifying the original. If a specified column does not exist, drop silently ignores it.

scala> df.drop("Category").show()
+-----+------------------+
|Count|       Description|
+-----+------------------+
|  100|This is category A|
|  120|This is category B|
|  150|This is category C|
+-----+------------------+
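Because drop returns a new DataFrame, the original is left untouched. A quick check (the `dropped` name is illustrative, not from the original article):

```scala
// drop does not mutate df; it returns a new DataFrame without the column.
val dropped = df.drop("Category")

println(df.columns.mkString(", "))      // Category, Count, Description
println(dropped.columns.mkString(", ")) // Count, Description
```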

Drop multiple columns

Multiple columns can be dropped at the same time:

val columns_to_drop = Array("Category", "Count")
df.drop(columns_to_drop: _*).show()
df.drop("Category", "Description").show()

Output:

scala> df.drop(columns_to_drop: _*).show()
+------------------+
|       Description|
+------------------+
|This is category A|
|This is category B|
|This is category C|
+------------------+

scala> df.drop("Category", "Description").show()
+-----+
|Count|
+-----+
|  100|
|  120|
|  150|
+-----+

The above code snippets show two approaches to dropping columns: passing column names explicitly, or expanding a dynamic array of column names with the `: _*` varargs syntax.
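The array-based approach is handy when the columns to drop are computed at runtime. As a sketch (the `keep` set is a hypothetical example, not from the original article), you can derive the drop list from the DataFrame's own schema:

```scala
// Sketch: drop every column except the ones we want to keep.
// The columns to drop are computed dynamically from df.columns.
val keep = Set("Description")
val toDrop = df.columns.filterNot(keep.contains)

df.drop(toDrop: _*).show()
```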

Run Spark code

You can easily run Spark code on Windows or UNIX-like systems (Linux, macOS). Follow the Kontext setup articles to create a Spark environment if you don't have one yet.

Last modified by Raymond 1 month ago.