Write and read parquet files in Python / Spark

Parquet is a columnar storage format published by Apache. It is commonly used in the Hadoop ecosystem, and many programming language APIs have been implemented to support writing and reading parquet files.
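For example, in plain Python (without Spark) the pandas library can write and read parquet files via the pyarrow engine. A minimal sketch, assuming pandas and pyarrow are installed and using a hypothetical sales.parquet path:

import pandas as pd

# Build a small example DataFrame (hypothetical data).
df = pd.DataFrame({"product": ["A", "B"], "amount": [100, 250]})

# Write it to a parquet file using the pyarrow engine.
df.to_parquet("sales.parquet", engine="pyarrow")

# Read the parquet file back into a DataFrame.
df2 = pd.read_parquet("sales.parquet", engine="pyarrow")
print(df2)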

You can also use PySpark to read or write parquet files.

Code snippet

from pyspark.sql import SparkSession

appName = "PySpark Parquet Example"
master = "local"

# Create a Spark session.
spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

# Load a CSV file with a header row into a DataFrame.
df = spark.read.format("csv").option("header", "true").load("Sales.csv")

# Write the DataFrame out in parquet format.
df.write.parquet("Sales.parquet")

# Read the parquet files back and show the result.
df2 = spark.read.parquet("Sales.parquet")
df2.show()
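Note that by default Spark raises an error if the output path already exists, so re-running the snippet above will fail on the write step. A save mode can be specified to change this behavior, for example to replace the existing data:

df.write.mode("overwrite").parquet("Sales.parquet")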