Code description
In Spark DataFrame, two APIs are provided to sort the rows of a DataFrame by the provided column or columns: sort and orderBy. orderBy is just an alias for the sort API.
Syntax
DataFrame.sort(*cols, **kwargs)
For *cols, we can pass a column name, a Column object (pyspark.sql.Column), or a list of column names or Column objects.
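For example, the following calls all request the same sort by col1 and then id (a sketch; df is the hypothetical DataFrame constructed in the snippet further below):
df.sort("col1", "id")        # column names
df.sort(df.col1, df.id)      # Column objects
df.sort(["col1", "id"])      # a list of column names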
For **kwargs, we can use it to specify additional arguments. For PySpark, we can specify a parameter named ascending, whose value is True by default. It can also be a list of boolean values, one for each of the columns used to sort the records.
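For instance, a single boolean applies to all sort columns, while a list supplies one flag per column (again a sketch using the hypothetical df):
df.sort("id", ascending=False)                       # one flag for all columns: id descending
df.sort(["col1", "id"], ascending=[True, False])     # per-column flags: col1 ascending, id descending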
The code snippet provides examples of sorting a DataFrame.
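A minimal sketch of such a snippet, with the DataFrame contents inferred from the sample outputs (id 1 through 9, and col1 marking each id as 'E' for even or 'O' for odd), might look like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sort-example").getOrCreate()

# Build a small DataFrame: id 1-9, col1 marks each id as even ('E') or odd ('O').
df = spark.createDataFrame(
    [(i, "E" if i % 2 == 0 else "O") for i in range(1, 10)],
    schema=["id", "col1"],
)

# Sort by col1 and then id, both ascending (first sample output).
df.sort("col1", "id").show()

# orderBy is an alias for sort and returns the same ordering (second sample output).
df.orderBy("col1", "id").show()

# Sort col1 ascending and id descending via a list of booleans (third sample output).
df.sort(["col1", "id"], ascending=[True, False]).show()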
Sample outputs
+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  2|   E|
|  4|   E|
|  6|   E|
|  8|   E|
|  1|   O|
|  3|   O|
|  5|   O|
|  7|   O|
|  9|   O|
+---+----+

+---+----+
| id|col1|
+---+----+
|  8|   E|
|  6|   E|
|  4|   E|
|  2|   E|
|  9|   O|
|  7|   O|
|  5|   O|
|  3|   O|
|  1|   O|
+---+----+