Spark Partition Discovery

2021-12-22 spark

Spark supports partition discovery. All built-in file sources (Text/CSV/JSON/ORC/Parquet) support partition discovery and partition information inference.
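Layouts like the one in this post are typically produced by writing with DataFrameWriter.partitionBy. As a rough sketch (the df_sales DataFrame and its month and country columns are hypothetical, not part of this post), the example data set could be generated like this:

# Hypothetical: write a DataFrame partitioned by month and country,
# producing month=.../country=... subdirectories under /data.
df_sales.write \
    .partitionBy("month", "country") \
    .mode("overwrite") \
    .parquet("/data")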

The directory layout below shows an example data set that is stored with two partition levels: month and country.

/data
    /month=2021-01-01
    /month=****-**-**
    /month=2021-12-01
        /country=AU
            data1.parquet
            data2.parquet
            data3.parquet
        /country=CN
            data1.parquet
            data2.parquet
            data3.parquet

The following code snippet will read all the underlying Parquet files:

df = spark.read.option("basePath", "/data").parquet("/data")
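After loading, the partition columns month and country are inferred from the directory names and added to the DataFrame schema alongside the columns stored in the Parquet files, so they can be used in filters (and Spark can prune the directories it actually scans). The basePath option also matters when only part of the tree is read: pointing the reader at a single month directory while keeping basePath as /data preserves both partition columns. A small sketch of both, assuming the df from the snippet above:

# Partition columns inferred from the directory names appear in the schema.
df.printSchema()

# Filtering on partition columns lets Spark prune unneeded directories.
df.filter((df.month == "2021-12-01") & (df.country == "AU")).show()

# Reading a single month; basePath keeps month (and country) in the schema.
df_dec = spark.read.option("basePath", "/data").parquet("/data/month=2021-12-01")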