Spark SQL - PERCENT_RANK Window Function
About PERCENT_RANK function
PERCENT_RANK in Spark returns the percentile of rows within a window partition.
PERCENT_RANK without partition
The following sample SQL uses PERCENT_RANK function without PARTITION BY clause:
SELECT StudentScore.*, PERCENT_RANK() OVER (ORDER BY Score) AS Percentile FROM VALUES
(101,56),
(102,78),
(103,70),
(104,93),
(105,95),
(106,95),
(107,75),
(108,81),
(109,66),
(110,73)
AS StudentScore(Student,Score)
Result:
+-------+-----+------------------+
|Student|Score| Percentile|
+-------+-----+------------------+
| 101| 56| 0.0|
| 109| 66|0.1111111111111111|
| 103| 70|0.2222222222222222|
| 110| 73|0.3333333333333333|
| 107| 75|0.4444444444444444|
| 102| 78|0.5555555555555556|
| 108| 81|0.6666666666666666|
| 104| 93|0.7777777777777778|
| 105| 95|0.8888888888888888|
| 106| 95|0.8888888888888888|
+-------+-----+------------------+
warning The following warning message will show: WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
PERCENT_RANK with partition
The following sample SQL returns a rank number for each records in each window (defined by PARTITION BY):
SELECT StudentScore.*, PERCENT_RANK() OVER (PARTITION BY Class ORDER BY Score) AS Percentile FROM VALUES
(1,101,56),
(1,102,78),
(1,103,70),
(1,104,93),
(1,105,95),
(2,106,95),
(2,107,75),
(2,108,81),
(2,109,66),
(2,110,73)
AS StudentScore(Class,Student,Score)
Result:
+-----+-------+-----+----------+
|Class|Student|Score|Percentile|
+-----+-------+-----+----------+
| 1| 101| 56| 0.0|
| 1| 103| 70| 0.25|
| 1| 102| 78| 0.5|
| 1| 104| 93| 0.75|
| 1| 105| 95| 1.0|
| 2| 109| 66| 0.0|
| 2| 110| 73| 0.25|
| 2| 107| 75| 0.5|
| 2| 108| 81| 0.75|
| 2| 106| 95| 1.0|
+-----+-------+-----+----------+
Records are allocated to windows based on Class column and the rank is computed based on column Score.
infoBy default, records will be sorted in ascending order. Use ORDER BY .. DESC to sort records in descending order.
Example table
The virtual table/data frame is cited from SQL - Construct Table using Literals.
copyright
This page is subject to Site terms.
comment Comments
No comments yet.
Log in with external accounts
warning Please login first to view stats information.
article
Spark SQL - Return JSON Object Keys (json_object_keys)
article
Spark SQL - Left and Right Padding (lpad and rpad) Functions
article
Spark SQL - Convert Delimited String to Map using str_to_map Function
article
Spark SQL - Extract Day, Month, Year and other Part from Date or Timestamp
article
Spark SQL - Extract Value from JSON String
Read more (46)