Use sort() and orderBy() with PySpark DataFrame
Code description
In Spark DataFrame, two APIs are provided to sort the rows of a DataFrame based on the provided column or columns: sort and orderBy. orderBy is simply an alias for the sort API.
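For example, the following two calls are interchangeable and produce the same result (df here refers to the DataFrame constructed in the code snippet below):

# Both calls sort by col1 ascending; orderBy simply delegates to sort.
df.sort('col1').show()
df.orderBy('col1').show()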
Syntax
DataFrame.sort(*cols, **kwargs)
For *cols, we can use it to specify a column name, a Column object (pyspark.sql.Column), or a list of column names or Column objects.
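As a small sketch of passing Column objects (again using df from the code snippet below), the asc() and desc() methods of a Column give per-column control over sort direction:

from pyspark.sql.functions import col

# Sort by col1 ascending, then by id descending, via Column objects
df.sort(col('col1').asc(), col('id').desc()).show()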
For **kwargs, we can use it to specify additional arguments. For PySpark, we can specify a parameter named ascending, whose default value is True. It can also be a list of boolean values, one for each column used to sort the records.
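For instance, a single boolean reverses the whole sort, while a list sets the direction per column (df as in the code snippet below):

# A single boolean applies to every sort column: both descending
df.sort(['col1', 'id'], ascending=False).show()

# One boolean per column: col1 descending, id ascending
df.sort(['col1', 'id'], ascending=[False, True]).show()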
The code snippet below provides examples of sorting a DataFrame.
Sample outputs
+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  8|   E|
|  6|   E|
|  4|   E|
|  2|   E|
|  9|   O|
|  7|   O|
|  5|   O|
|  3|   O|
|  1|   O|
+---+----+
Code snippet
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

app_name = "PySpark sort and orderBy Example"
master = "local"

# Create Spark session
builder = SparkSession.builder.appName(app_name) \
    .master(master)
spark = builder.getOrCreate()

# DataFrame with ids 1..9 and a column marking each id as even (E) or odd (O)
df = spark.range(1, 10)
df = df.withColumn('col1', expr("case when id%2==0 then 'E' else 'O' end"))

# Sort by a single column (ascending by default)
df.sort('col1').show()

# Sort by two columns, both ascending
df.sort(['col1', 'id'], ascending=True).show()

# Sort col1 ascending and id descending
df.orderBy(['col1', 'id'], ascending=[True, False]).show()