PySpark DataFrame - percent_rank() Function
Code description
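In Spark SQL, percentile ranking is available through the PERCENT_RANK window function (see Spark SQL - PERCENT_RANK Window Function). This code snippet computes the same relative ranking directly with the PySpark DataFrame API function percent_rank, instead of writing a Spark SQL query. For each row, percent_rank returns (rank - 1) / (number of rows - 1), a value between 0.0 and 1.0; rows with the same Score share the same rank and therefore the same value.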
Output:
+-------+-----+------------------+
|Student|Score|      percent_rank|
+-------+-----+------------------+
|    101|   56|               0.0|
|    109|   66|0.1111111111111111|
|    103|   70|0.2222222222222222|
|    110|   73|0.3333333333333333|
|    107|   75|0.4444444444444444|
|    102|   78|0.5555555555555556|
|    108|   81|0.6666666666666666|
|    104|   93|0.7777777777777778|
|    105|   95|0.8888888888888888|
|    106|   95|0.8888888888888888|
+-------+-----+------------------+
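The values in this output can be checked by hand: there are 10 rows, so the denominator is 9, and each value is (rank - 1) / 9. The two students tied at 95 both get rank 9, hence 8/9. The plain-Python sketch below reproduces the column without Spark. It is only an illustration of the formula, assuming standard RANK() tie handling; the helper name percent_rank here is a hypothetical stand-in, not the PySpark function.

```python
from typing import Dict, List


def percent_rank(values: List[float]) -> List[float]:
    """Percent rank of each value: (rank - 1) / (n - 1).

    rank follows RANK() semantics in ascending order: tied values share
    the lowest rank. A single value gets 0.0 (as Spark returns for a
    one-row window).
    """
    n = len(values)
    if n <= 1:
        return [0.0] * n
    # RANK() of a value = position of its first occurrence in sorted order.
    rank_of: Dict[float, int] = {}
    for position, value in enumerate(sorted(values), start=1):
        rank_of.setdefault(value, position)
    return [(rank_of[v] - 1) / (n - 1) for v in values]


# Same (Student, Score) data as the PySpark snippet.
data = [(101, 56), (102, 78), (103, 70), (104, 93), (105, 95),
        (106, 95), (107, 75), (108, 81), (109, 66), (110, 73)]

ranks = percent_rank([score for _, score in data])

# Print in ascending Score order, like the DataFrame output.
for (student, score), pr in sorted(zip(data, ranks), key=lambda t: t[0][1]):
    print(student, score, pr)
```

Because Python's sort is stable, students 105 and 106 print in the same order as in the Spark output, although Spark itself does not guarantee an order among tied rows.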
Code snippet
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import percent_rank
app_name = "PySpark percent_rank Window Function"
master = "local"
spark = SparkSession.builder \
    .appName(app_name) \
    .master(master) \
    .getOrCreate()
spark.sparkContext.setLogLevel("WARN")
data = [
    [101, 56], [102, 78], [103, 70], [104, 93], [105, 95],
    [106, 95], [107, 75], [108, 81], [109, 66], [110, 73]]
df = spark.createDataFrame(data, ['Student', 'Score'])
# Rank rows by ascending Score. Without partitionBy, Spark moves all rows
# into a single partition, which is fine for a small example like this.
# Ranking functions always use this ROWS frame, so rowsBetween is optional here.
window = Window.orderBy("Score").rowsBetween(
    Window.unboundedPreceding, Window.currentRow)
# percent_rank() = (rank - 1) / (number of rows - 1)
df = df.withColumn('percent_rank', percent_rank().over(window))
df.show()