Write and read parquet files in Python / Spark


Parquet is a columnar storage format published by Apache and commonly used in the Hadoop ecosystem. APIs for reading and writing Parquet files have been implemented in many programming languages.

You can also use PySpark to read and write Parquet files. The snippet below reads a CSV file, writes it out as Parquet, and reads it back.

Code snippet

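from pyspark.sql import SparkSession

appName = "PySpark Parquet Example"
master = "local"

# Create (or reuse) a local Spark session.
spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

# Read the CSV file; without inferSchema, every column is loaded as a string.
df = spark.read.format("csv").option("header", "true").load("Sales.csv")

# Write the DataFrame as Parquet. Spark creates a directory named Sales.parquet
# and, by default, raises an error if that path already exists.
df.write.parquet("Sales.parquet")

# Read the Parquet data back and print the first rows.
df2 = spark.read.parquet("Sales.parquet")
df2.show()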