Delete or Remove Columns from PySpark DataFrame
This article shows how to 'delete' columns from a Spark DataFrame using Python (PySpark).
Construct a dataframe
Follow the article Convert Python Dictionary List to PySpark DataFrame to construct a DataFrame with the following content:
```text
+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1| 12.40|
|Category B|  2| 30.10|
|Category C|  3|100.01|
+----------+---+------+
```
'Delete' or 'Remove' one column
The word 'delete' or 'remove' can be misleading. Spark DataFrames are immutable and lazily evaluated, so nothing is removed from the original DataFrame.
Instead, the drop function returns a new DataFrame that excludes the specified columns:
```python
df1 = df.drop('Category')
df1.show()
```
Output:
```text
+---+------+
| ID| Value|
+---+------+
|  1| 12.40|
|  2| 30.10|
|  3|100.01|
+---+------+
```
Drop multiple columns
Multiple columns can be dropped at the same time:
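```python
# Pass the column names as separate arguments
df2 = df.drop('Category', 'ID')
df2.show()

# Or unpack a list of column names with *
columns_to_drop = ['Category', 'ID']
df3 = df.drop(*columns_to_drop)
df3.show()
```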
Output:
```text
+------+
| Value|
+------+
| 12.40|
| 30.10|
|100.01|
+------+

+------+
| Value|
+------+
| 12.40|
| 30.10|
|100.01|
+------+
```
Run Spark code
You can run Spark code on Windows or UNIX-like systems (Linux, macOS). If you don't have a Spark environment yet, follow these articles to set one up:
Last modified by Raymond 5 years ago