
PySpark DataFrame - drop and dropDuplicates

By Kontext, 2 years ago. Language: English.

Code description

The PySpark DataFrame API provides two drop-related methods: drop and dropDuplicates (alias: drop_duplicates). The former drops the specified column(s) from a DataFrame, while the latter drops duplicate rows.

This code snippet uses these two functions.

Outputs:

+----+------+
|ACCT|AMT   |
+----+------+
|101 |10.01 |
|101 |10.01 |
|101 |102.01|
+----+------+

+----+----------+------+
|ACCT|TXN_DT    |AMT   |
+----+----------+------+
|101 |2021-01-01|102.01|
|101 |2021-01-01|10.01 |
+----+----------+------+

+----+----------+------+
|ACCT|TXN_DT    |AMT   |
+----+----------+------+
|101 |2021-01-01|102.01|
|101 |2021-01-01|10.01 |
+----+----------+------+