Write and read parquet files in Scala / Spark


Parquet is a columnar storage format published by Apache. It is commonly used in the Hadoop ecosystem, and APIs for writing and reading Parquet files exist in many programming languages.

You can easily use Spark to read or write Parquet files. 

Code snippet

import org.apache.spark.sql.SparkSession

val appName = "Scala Parquet Example"
val master = "local"

/*Create a local Spark session.*/
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

/*Read the source CSV file into a DataFrame.*/
val df = spark.read.format("csv").option("header", "true").load("Sales.csv")

/*Write the DataFrame out as Parquet.*/
df.write.parquet("Sales.parquet")

/*Read the Parquet file back and print the contents.*/
val df2 = spark.read.parquet("Sales.parquet")
df2.show()
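By default the write above fails if the output path already exists. The sketch below shows one common variation: specifying a save mode and partitioning the output by a column. It assumes the sample Sales.csv contains a Country column, which is only an illustration and not part of the original example.

import org.apache.spark.sql.SaveMode

/*Overwrite any existing output and partition the Parquet files by a column.
  The Country column is an assumption about the sample data.*/
df.write
  .mode(SaveMode.Overwrite)
  .partitionBy("Country")
  .parquet("Sales_partitioned.parquet")

/*When filtering on the partition column, Spark only reads the matching partitions.*/
val usSales = spark.read.parquet("Sales_partitioned.parquet").filter("Country = 'US'")
usSales.show()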