Raymond Raymond

Write and read parquet files in Scala / Spark

event 2019-11-18 visibility 2,053 comment 0 insights toc
more_vert
insights Stats
toc Table of contents

Parquet is columnar store format published by Apache. It's commonly used in Hadoop ecosystem. There are many programming language APIs that have been implemented to support writing and reading parquet files. 

You can easily use Spark to read or write Parquet files. 

Code snippet

import org.apache.spark.sql.SparkSession

val appName = "Scala Parquet Example"
val master = "local"

/*Create Spark session with Hive supported.*/
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()
val df = spark.read.format("csv").option("header", "true").load("Sales.csv")
/*Write parquet file*/
df.write.parquet("Sales.parquet")
val df2 = spark.read.parquet("Sales.parquet")
df2.show()
More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts