
Use sort() and orderBy() with PySpark DataFrame

2022-09-03


Code description

Spark DataFrame provides two APIs to sort rows by one or more columns: sort and orderBy. orderBy is an alias for sort.

Syntax

DataFrame.sort(*cols, **kwargs)

For *cols, we can use it to specify a column name, a Column object (pyspark.sql.Column), or a list of column names or Column objects.

For **kwargs, we can use it to specify additional arguments. In PySpark, the supported keyword argument is ascending, which defaults to True. It can be a single boolean, or a list of booleans with one value for each column used to sort the records.

The code snippet below shows examples of sorting a DataFrame.

Sample outputs

+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  8|   E|
|  6|   E|
|  4|   E|
|  2|   E|
|  9|   O|
|  7|   O|
|  5|   O|
|  3|   O|
|  1|   O|
+---+----+

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

app_name = "PySpark sort and orderBy Example"
master = "local"

# Create a local Spark session
builder = SparkSession.builder.appName(app_name) \
    .master(master)
spark = builder.getOrCreate()

df = spark.range(1,10)
df = df.withColumn('col1', expr("case when id%2==0 then 'E' else 'O' end"))

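# Sort by col1 in ascending order (the default)
df.sort('col1').show()
# Sort by col1, then by id, both ascending
df.sort(['col1','id'], ascending=True).show()
# Sort by col1 ascending, then by id descending
df.orderBy(['col1','id'], ascending=[True,False]).show()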
