Write and read parquet files in Scala / Spark


Parquet is a columnar storage format published by Apache, commonly used in the Hadoop ecosystem. APIs in many programming languages have been implemented to support writing and reading Parquet files.

You can easily use Spark to read or write Parquet files. 

Code snippet

import org.apache.spark.sql.SparkSession

val appName = "Scala Parquet Example"
val master = "local"

// Create a Spark session.
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

// Load the source data from a CSV file with a header row.
val df = spark.read.format("csv").option("header", "true").load("Sales.csv")

// Write the DataFrame out as a Parquet file.
df.write.parquet("Sales.parquet")

// Read the Parquet file back and display its contents.
val df2 = spark.read.parquet("Sales.parquet")
df2.show()
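Beyond the basic call above, `DataFrameWriter` also lets you control compression and partition the output into subdirectories by column value. Below is a minimal sketch assuming a local Spark session; the column names (Region, Amount) and the output path sales_by_region.parquet are illustrative, not part of the original example.

```scala
import org.apache.spark.sql.SparkSession

// Local Spark session for the sketch; appName and master are assumptions.
val spark = SparkSession.builder
  .appName("Parquet Write Options")
  .master("local")
  .getOrCreate()

import spark.implicits._

// Small in-memory DataFrame standing in for real sales data.
val sales = Seq(
  ("East", 100.0),
  ("West", 250.0),
  ("East", 75.0)
).toDF("Region", "Amount")

// Partition the output by Region and compress the files with Snappy;
// overwrite any output left over from a previous run.
sales.write
  .mode("overwrite")
  .option("compression", "snappy")
  .partitionBy("Region")
  .parquet("sales_by_region.parquet")

// Reading the partitioned directory restores the Region column
// from the directory names (Region=East/, Region=West/).
val restored = spark.read.parquet("sales_by_region.parquet")
restored.show()
```

Partitioning by a frequently filtered column lets Spark skip whole directories at read time, which can significantly reduce I/O on large datasets.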
Last modified by Raymond 2 years ago.