
Use sort() and orderBy() with PySpark DataFrame

Kontext · 2 years ago · English


Spark DataFrame provides two APIs to sort the rows of a DataFrame by one or more columns: sort and orderBy. orderBy is simply an alias for the sort API.

Syntax

DataFrame.sort(*cols, **kwargs)

For *cols, we can use it to specify a column name, a Column object (pyspark.sql.Column), or a list of column names or Column objects.

For **kwargs, we can use it to specify additional arguments. In PySpark, we can specify a parameter named ascending, which defaults to True. It can also be a list of boolean values, one for each column used to sort the records.

The examples below show how to sort a DataFrame in different orders.

Sample outputs

+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  8|   E|
|  6|   E|
|  4|   E|
|  2|   E|
|  9|   O|
|  7|   O|
|  5|   O|
|  3|   O|
|  1|   O|
+---+----+