Write and read parquet files in Python / Spark

Raymond Tang, 5/28/2019

Parquet is a columnar storage format published by Apache. It's commonly used in the Hadoop ecosystem. APIs for reading and writing Parquet files have been implemented in many programming languages.

You can also use PySpark to read or write parquet files.

Code snippet

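from pyspark.sql import SparkSession

appName = "PySpark Parquet Example"
master = "local"

# Create (or reuse) a local Spark session
spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

# Read the CSV file, treating the first row as column names
df = spark.read.format("csv").option("header", "true").load("Sales.csv")

# Write the DataFrame as Parquet; Spark creates a directory named Sales.parquet
df.write.parquet("Sales.parquet")

# Read the Parquet data back and display it
df2 = spark.read.parquet("Sales.parquet")
df2.show()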
