This code snippet provides an example of calculating aggregated values after grouping data in a PySpark DataFrame. To group data, use DataFrame.groupby or its alias DataFrame.groupBy; the GroupedData.agg method can then be used to aggregate data for each group. Built-in aggregation functions such as sum, avg, max, min and others can be used, as can customized aggregation functions.
Output:
+----------+--------+
|TotalScore|AvgScore|
+----------+--------+
|       392|    78.4|
+----------+--------+