Write and read Parquet files in Python / Spark
Apache Parquet is a columnar storage format that is commonly used in the Hadoop ecosystem. APIs for writing and reading Parquet files are available in many programming languages.
You can also use PySpark to read and write Parquet files. The following snippet loads a CSV file, saves it as Parquet, and reads it back.
Code snippet
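```python
from pyspark.sql import SparkSession

appName = "PySpark Parquet Example"
master = "local"
# Create (or reuse) a local Spark session.
spark = SparkSession.builder.appName(appName).master(master).getOrCreate()
# Load the CSV file, treating the first line as column names.
df = spark.read.format("csv").option("header", "true").load("Sales.csv")
# Write as Parquet; "overwrite" replaces existing output when the script is re-run.
df.write.mode("overwrite").parquet("Sales.parquet")
# Read the Parquet data back and display it.
df2 = spark.read.parquet("Sales.parquet")
df2.show()
```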
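With PySpark, `df.write.parquet("Sales.parquet")` does not produce a single file: it creates a directory named `Sales.parquet` that holds one part file per partition, typically alongside a `_SUCCESS` marker. One way to inspect that output without Spark is to check the 4-byte magic `PAR1` that the Parquet format places at the start and end of every file. This is a hypothetical stdlib-only sketch; the helper names and the `part-*.parquet` naming pattern (Spark's usual default) are assumptions.

```python
import os
from pathlib import Path
from typing import List

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of every Parquet file


def is_parquet_file(path: Path) -> bool:
    """Check the leading and trailing magic bytes of a file."""
    # Too small to hold both markers, so it cannot be a Parquet file.
    if path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def list_parquet_parts(output_dir: str) -> List[Path]:
    """Return the Parquet part files in a Spark output directory, sorted by name."""
    # Assumes Spark's default naming, e.g. part-00000-<id>-c000.snappy.parquet.
    candidates = sorted(Path(output_dir).glob("part-*.parquet"))
    return [p for p in candidates if is_parquet_file(p)]
```

For example, after running the Spark snippet, `list_parquet_parts("Sales.parquet")` should list the part files Spark wrote; `_SUCCESS` and hidden `.crc` checksum files are skipped because they do not match the pattern.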
Last modified by Administrator 5 years ago