Spark Partitioning Physical Operators

2022-03-29 pyspark, spark

This diagram shows which physical partitioning Spark uses for a repartition(numPartitions, *cols) call, depending on whether numPartitions and the partition columns are specified.

repartition(numPartitions, *cols)

numPartitions specified?
    Yes: Partition columns specified?
        Yes: HashPartitioning
        No: numPartitions > 1?
            Yes: RoundRobinPartitioning
            No: SinglePartition
    No: Partition columns specified?
        Yes: HashPartitioning
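The decision flow in the diagram above can be written out as a small pure-Python function. This is only a sketch of the diagram, not Spark's own code: choose_partitioning is a hypothetical helper name, it returns the partitioning name as a string, and the error raised when neither argument is given is an assumption, since the diagram has no branch for that case.

```python
from typing import Optional, Sequence


def choose_partitioning(
    num_partitions: Optional[int] = None,
    cols: Sequence[str] = (),
) -> str:
    """Return the partitioning name the diagram assigns to a repartition call.

    Hypothetical helper for illustration only; not part of the Spark API.
    """
    if cols:
        # Partition columns given: hash on those columns, whether or not
        # numPartitions is also specified.
        return "HashPartitioning"
    if num_partitions is None:
        # Assumption: the diagram has no branch for "nothing specified".
        raise ValueError("specify numPartitions, partition columns, or both")
    if num_partitions > 1:
        # Only a partition count was given: rows are spread round-robin.
        return "RoundRobinPartitioning"
    # numPartitions is not greater than 1: everything goes to one partition.
    return "SinglePartition"


print(choose_partitioning(4, ["id"]))     # HashPartitioning
print(choose_partitioning(None, ["id"]))  # HashPartitioning
print(choose_partitioning(4))             # RoundRobinPartitioning
print(choose_partitioning(1))             # SinglePartition
```

To check a real case against the diagram, call explain() on the repartitioned DataFrame, for example df.repartition(4, "id").explain(), and look at the partitioning named in the Exchange node of the physical plan. The exact plan text varies between Spark versions.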