This article shows how to 'remove' a column from a Spark DataFrame using Scala.
Construct a dataframe
Follow the article Scala: Convert List to Spark Data Frame to construct a data frame.
The DataFrame object looks like the following:
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
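For readers who do not have the referenced article at hand, the following is a minimal sketch of one way to build an equivalent DataFrame. The local master setting and the application name are assumptions made for illustration; in spark-shell a session named spark already exists and getOrCreate simply returns it.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation; adjust master for a cluster.
val spark = SparkSession.builder()
  .appName("drop-column-example") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._ // enables toDF on local collections

// Column names and values are taken from the table shown above.
val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

df.show()
```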
'Delete' or 'Remove' one column
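The words 'delete' and 'remove' can be misleading: Spark DataFrames are immutable and lazily evaluated, so nothing is deleted in place. A transformation returns a new DataFrame without the column and leaves the original unchanged.
We can use the drop function to get a DataFrame without the specified columns.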
scala> df.drop("Category").show()
+-----+------------------+
|Count|       Description|
+-----+------------------+
|  100|This is category A|
|  120|This is category B|
|  150|This is category C|
+-----+------------------+
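To make the point about 'removing' concrete, here is a small sketch showing that drop leaves the original DataFrame untouched. The names withoutCategory and unchanged, and the column name NoSuchColumn, are made up for this illustration; the session setup is the same assumed local configuration as before.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder().appName("drop-is-not-in-place").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

// drop returns a new DataFrame; no job runs until an action such as show() is called.
val withoutCategory = df.drop("Category")

println(df.columns.mkString(", "))              // Category, Count, Description
println(withoutCategory.columns.mkString(", ")) // Count, Description

// Dropping a name that is not in the schema is a no-op, not an error.
val unchanged = df.drop("NoSuchColumn")
println(unchanged.columns.mkString(", "))       // Category, Count, Description
```

To keep working with the reduced data, assign the result of drop to a new value as above; calling drop without using its result has no effect on df.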
Drop multiple columns
Multiple columns can be dropped at the same time:
val columns_to_drop = Array("Category", "Count")
df.drop(columns_to_drop: _*).show()
df.drop("Category", "Description").show()
Output:
scala> df.drop(columns_to_drop: _*).show()
+------------------+
|       Description|
+------------------+
|This is category A|
|This is category B|
|This is category C|
+------------------+
scala> df.drop("Category", "Description").show()
+-----+
|Count|
+-----+
|  100|
|  120|
|  150|
+-----+
The snippets above show two ways to drop several columns: passing the column names directly as arguments, or expanding an array of names built at runtime with the : _* syntax.
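The array-based form is most useful when the list of columns is computed rather than hard-coded. As a sketch under that assumption, the example below collects every string-typed column from the schema and drops them all; the names stringColumns, numericOnly and countOnly are illustrative and not part of the original article.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StringType

// Assumption: local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder().appName("drop-dynamic-columns").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")
).toDF("Category", "Count", "Description")

// Build the list of columns to drop from the schema at runtime.
val stringColumns = df.schema.fields
  .filter(_.dataType == StringType)
  .map(_.name) // Array("Category", "Description")

// Expand the array into the varargs parameter of drop.
val numericOnly = df.drop(stringColumns: _*)
numericOnly.show()

// When only a few columns should survive, select can be the shorter way to say it.
val countOnly = df.select("Count")
```

Both numericOnly and countOnly contain only the Count column here; which form reads better depends on whether it is easier to name the columns to discard or the columns to keep.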
Run Spark code
You can easily run Spark code on Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet: