Spark SQL - PERCENT_RANK Window Function

visibility 335 access_time 9 months ago languageEnglish timeline Stats
timeline Stats
Page index 1.31
more_horiz

About PERCENT_RANK function

PERCENT_RANK in Spark returns the percentile of rows within a window partition.  

PERCENT_RANK without partition

The following sample SQL uses PERCENT_RANK function without PARTITION BY clause:

SELECT StudentScore.*, PERCENT_RANK() OVER (ORDER BY Score) AS Percentile FROM VALUES 
(101,56),
(102,78),
(103,70),
(104,93),
(105,95),
(106,95),
(107,75),
(108,81),
(109,66),
(110,73)
AS StudentScore(Student,Score)

Result:

+-------+-----+------------------+
|Student|Score| Percentile|
+-------+-----+------------------+
| 101| 56| 0.0|
| 109| 66|0.1111111111111111|
| 103| 70|0.2222222222222222|
| 110| 73|0.3333333333333333|
| 107| 75|0.4444444444444444|
| 102| 78|0.5555555555555556|
| 108| 81|0.6666666666666666|
| 104| 93|0.7777777777777778|
| 105| 95|0.8888888888888888|
| 106| 95|0.8888888888888888|
+-------+-----+------------------+
warning The following warning message will show: WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation. 

PERCENT_RANK with partition

The following sample SQL returns a rank number for each records in each window (defined by PARTITION BY):

SELECT StudentScore.*, PERCENT_RANK() OVER (PARTITION BY Class ORDER BY Score) AS Percentile FROM VALUES 
(1,101,56),
(1,102,78),
(1,103,70),
(1,104,93),
(1,105,95),
(2,106,95),
(2,107,75),
(2,108,81),
(2,109,66),
(2,110,73)
AS StudentScore(Class,Student,Score)

Result:

+-----+-------+-----+----------+
|Class|Student|Score|Percentile|
+-----+-------+-----+----------+
| 1| 101| 56| 0.0|
| 1| 103| 70| 0.25|
| 1| 102| 78| 0.5|
| 1| 104| 93| 0.75|
| 1| 105| 95| 1.0|
| 2| 109| 66| 0.0|
| 2| 110| 73| 0.25|
| 2| 107| 75| 0.5|
| 2| 108| 81| 0.75|
| 2| 106| 95| 1.0|
+-----+-------+-----+----------+

Records are allocated to windows based on Class column and the rank is computed based on column Score.  

infoBy default, records will be sorted in ascending order. Use ORDER BY .. DESC to sort records in descending order.

Example table

The virtual table/data frame is cited from SQL - Construct Table using Literals.

copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

More from Kontext
Spark SQL - FIRST_VALUE or LAST_VALUE
visibility 3,323
thumb_up 2
access_time 2 years ago
Show Headings (Column Names) in spark-sql CLI Result
visibility 527
thumb_up 0
access_time 2 years ago
Spark SQL - Return JSON Object Keys (json_object_keys)
visibility 27
thumb_up 0
access_time 24 days ago