Raymond
Delete or Remove Columns from PySpark DataFrame
This article shows how to 'delete' a column from a Spark DataFrame using Python.
Construct a dataframe
Follow the article "Convert Python Dictionary List to PySpark DataFrame" to construct a DataFrame with the following content:
+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1| 12.40|
|Category B|  2| 30.10|
|Category C|  3|100.01|
+----------+---+------+
'Delete' or 'Remove' one column
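The words 'delete' and 'remove' can be misleading: Spark DataFrames are immutable and transformations are lazily evaluated, so no data is deleted in place.
We can use the drop function, which returns a new DataFrame without the specified columns and leaves the original DataFrame unchanged.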
df1 = df.drop('Category')
df1.show()
Output:
+---+------+
| ID| Value|
+---+------+
|  1| 12.40|
|  2| 30.10|
|  3|100.01|
+---+------+
Drop multiple columns
Multiple columns can be dropped at the same time:
# Pass the column names as separate arguments
df2 = df.drop('Category', 'ID')
df2.show()

# Or unpack a list of column names with *
columns_to_drop = ['Category', 'ID']
df3 = df.drop(*columns_to_drop)
df3.show()
Output:
+------+
| Value|
+------+
| 12.40|
| 30.10|
|100.01|
+------+

+------+
| Value|
+------+
| 12.40|
| 30.10|
|100.01|
+------+
Run Spark code
You can run Spark code on Windows or Unix-like (Linux, macOS) systems. If you don't have a Spark environment yet, set one up before running the examples above.
Last modified by Raymond 4 years ago