Raymond
Scala: Remove Columns from Spark Data Frame
This article shows how to 'remove' columns from a Spark data frame using Scala.
Construct a dataframe
Follow the article 'Scala: Convert List to Spark Data Frame' to construct a data frame.
The DataFrame object looks like the following:
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
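For readers who do not have the referenced article at hand, the following is a minimal sketch of one way to build an equivalent data frame. It is not necessarily the approach used in that article. The application name and the choice of Int for the Count column are assumptions; in spark-shell the session already exists as spark, so the builder lines can be skipped there.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation (spark-shell already provides `spark`).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("drop-columns-example")
  .getOrCreate()
import spark.implicits._

// Sample rows matching the table above; Count is assumed to be an Int.
val data = List(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
)

// toDF assigns the column names in order.
val df = data.toDF("Category", "Count", "Description")
df.show()
```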
'Delete' or 'Remove' one column
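The words 'delete' and 'remove' can be misleading: a Spark DataFrame is immutable and its transformations are lazily evaluated, so nothing is deleted in place.
We can use the drop function, which returns a new DataFrame without the specified columns and leaves the original unchanged.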
scala> df.drop("Category").show()
+-----+------------------+
|Count|       Description|
+-----+------------------+
|  100|This is category A|
|  120|This is category B|
|  150|This is category C|
+-----+------------------+
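Because drop returns a new DataFrame, the original one still has all of its columns afterwards. The sketch below illustrates this, and also shows that dropping a column name that is not in the schema leaves the DataFrame unchanged instead of raising an error. The one-row sample data, the application name, and the column name NoSuchColumn are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session (spark-shell already provides `spark`).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("drop-one-column")
  .getOrCreate()
import spark.implicits._

val df = List(("Category A", 100, "This is category A"))
  .toDF("Category", "Count", "Description")

// drop returns a new DataFrame; df itself is not modified.
val withoutCategory = df.drop("Category")
println(withoutCategory.columns.mkString(", ")) // Count, Description
println(df.columns.mkString(", "))              // Category, Count, Description

// Dropping a name that is not in the schema is a no-op.
val unchanged = df.drop("NoSuchColumn")
println(unchanged.columns.mkString(", "))       // Category, Count, Description
```

To keep the result, assign it to a new value as above; calling df.drop(...) without using the return value has no lasting effect.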
Drop multiple columns
Multiple columns can be dropped at the same time:
val columns_to_drop = Array("Category", "Count")
df.drop(columns_to_drop: _*).show()
df.drop("Category", "Description").show()
Output:
scala> df.drop(columns_to_drop: _*).show()
+------------------+
|       Description|
+------------------+
|This is category A|
|This is category B|
|This is category C|
+------------------+

scala> df.drop("Category", "Description").show()
+-----+
|Count|
+-----+
|  100|
|  120|
|  150|
+-----+
The code snippets above show two approaches to dropping columns: passing the column names directly, or passing a dynamically built array of column names.
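The dynamic array does not have to be written out by hand; it can be computed from the DataFrame's own column list. The sketch below assumes a hypothetical rule (drop every column whose name starts with "C") and also shows the complementary approach of keeping a chosen set of columns with select. The one-row sample data, the application name, and the value names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session (spark-shell already provides `spark`).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("drop-multiple-columns")
  .getOrCreate()
import spark.implicits._

val df = List(("Category A", 100, "This is category A"))
  .toDF("Category", "Count", "Description")

// Hypothetical rule: drop every column whose name starts with "C".
val toDrop = df.columns.filter(_.startsWith("C"))
val dropped = df.drop(toDrop: _*)
println(dropped.columns.mkString(", ")) // Description

// Complementary approach: list the columns to keep and select them.
val toKeep = Seq("Count", "Description")
val kept = df.select(toKeep.map(col): _*)
println(kept.columns.mkString(", "))    // Count, Description
```

Using select can be easier to read when only a few columns should survive, while drop is shorter when only a few should go.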
Run Spark code
You can easily run Spark code on your Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet:
Last modified by Raymond 4 years ago