Use sort() and orderBy() with PySpark DataFrame

Code description

Spark DataFrames provide two APIs for sorting rows by one or more columns: sort and orderBy. orderBy is an alias for sort, so the two can be used interchangeably.

Syntax

    DataFrame.sort(*cols, **kwargs)  
    

For *cols, we can pass a column name, a Column object (pyspark.sql.Column), or a list of column names or Column objects.

For **kwargs, we can pass additional arguments. In PySpark, the supported keyword is ascending, which defaults to True. It accepts either a single boolean or a list of booleans, one for each column used to sort the records; a list must have the same length as the list of sort columns.

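The code snippet below provides three examples of sorting a DataFrame, and the sample outputs are listed in the same order. Note that the first example sorts by col1 only, so the order of id within each col1 group is not guaranteed.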

Sample outputs

    +---+----+
    | id|col1|
    +---+----+
    |  2|   E|
    |  4|   E|
    |  6|   E|
    |  8|   E|
    |  1|   O|
    |  3|   O|
    |  5|   O|
    |  7|   O|
    |  9|   O|
    +---+----+
    
    +---+----+
    | id|col1|
    +---+----+
    |  2|   E|
    |  4|   E|
    |  6|   E|
    |  8|   E|
    |  1|   O|
    |  3|   O|
    |  5|   O|
    |  7|   O|
    |  9|   O|
    +---+----+
    
    +---+----+
    | id|col1|
    +---+----+
    |  8|   E|
    |  6|   E|
    |  4|   E|
    |  2|   E|
    |  9|   O|
    |  7|   O|
    |  5|   O|
    |  3|   O|
    |  1|   O|
    +---+----+  
    

Code snippet

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr
    
    app_name = "PySpark sort and orderBy Example"
    master = "local"
    
    # Create Spark session
    builder = SparkSession.builder.appName(app_name).master(master)
    spark = builder.getOrCreate()
    
    df = spark.range(1,10)
    df = df.withColumn('col1', expr("case when id%2==0 then 'E' else 'O' end"))
    
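    # Sort by col1 only (ascending by default)
    df.sort('col1').show()
    # Sort by col1, then id, both ascending
    df.sort(['col1','id'], ascending=True).show()
    # Sort by col1 ascending, then id descending
    df.orderBy(['col1','id'], ascending=[True,False]).show()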