Delete or Remove Columns from PySpark DataFrame


This article shows how to 'delete' columns from a Spark DataFrame using Python.

Construct a dataframe 

Follow the article Convert Python Dictionary List to PySpark DataFrame to construct the following DataFrame:

+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1| 12.40|
|Category B|  2| 30.10|
|Category C|  3|100.01|
+----------+---+------+

'Delete' or 'Remove' one column

The words 'delete' and 'remove' can be misleading, as Spark DataFrames are immutable and lazily evaluated: nothing is removed from the existing DataFrame.

The drop function instead returns a new DataFrame without the specified columns:

df1 = df.drop('Category')
df1.show()

Output:

+---+------+
| ID| Value|
+---+------+
|  1| 12.40|
|  2| 30.10|
|  3|100.01|
+---+------+

Drop multiple columns

Multiple columns can be dropped at the same time:

# Pass the column names as separate arguments
df2 = df.drop('Category', 'ID')
df2.show()

# Or unpack a list of column names with *
columns_to_drop = ['Category', 'ID']
df3 = df.drop(*columns_to_drop)
df3.show()

Output:
+------+
| Value|
+------+
| 12.40|
| 30.10|
|100.01|
+------+

+------+
| Value|
+------+
| 12.40|
| 30.10|
|100.01|
+------+

Run Spark code

You can run Spark code on Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet:

Last modified by Administrator 4 months ago.