Use sort() and orderBy() with PySpark DataFrame
Code description
Spark DataFrame provides two APIs to sort the rows of a DataFrame by one or more columns: sort and orderBy. orderBy is simply an alias for the sort API.
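For example, the two calls below return identical results (a minimal sketch, assuming an existing DataFrame df with an id column, such as the one created in the code snippet further down):

df.sort("id").show()
df.orderBy("id").show()  # same output; orderBy is an alias for sort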
Syntax
DataFrame.sort(*cols, **kwargs)
For *cols, we can specify a column name, a Column object (pyspark.sql.Column), or a list of column names or Column objects.
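As a sketch of the accepted forms (assuming a DataFrame df with columns id and col1, as in the snippet below):

from pyspark.sql.functions import col

df.sort("col1")              # a single column name
df.sort(col("col1"))         # a Column object
df.sort(col("col1").desc())  # a Column object with an explicit sort direction
df.sort(["col1", "id"])      # a list of column names
df.sort("col1", "id")        # or separate positional arguments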
For **kwargs, we can specify additional arguments. In PySpark, the supported keyword argument is ascending, which defaults to True. It can also be a list of boolean values, one for each column used to sort the records.
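For instance (a sketch against the same df as above):

df.sort("id", ascending=False)                    # one boolean for a single sort column
df.sort(["col1", "id"], ascending=[True, False])  # one boolean per sort column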
The code snippet below provides examples of sorting a DataFrame.
Sample outputs
+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  8|   E|
|  6|   E|
|  4|   E|
|  2|   E|
|  9|   O|
|  7|   O|
|  5|   O|
|  3|   O|
|  1|   O|
+---+----+
Code snippet
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

app_name = "PySpark sort and orderBy Example"
master = "local"

# Create Spark session
builder = SparkSession.builder.appName(app_name) \
    .master(master)
spark = builder.getOrCreate()

# Create a DataFrame with ids 1-9 and a col1 flagging even (E) or odd (O) ids
df = spark.range(1, 10)
df = df.withColumn('col1', expr("case when id%2==0 then 'E' else 'O' end"))

# Sort by col1 ascending
df.sort('col1').show()
# Sort by col1 then id, both ascending
df.sort(['col1', 'id'], ascending=True).show()
# Sort by col1 ascending and id descending
df.orderBy(['col1', 'id'], ascending=[True, False]).show()