Spark provides easy-to-use fluent APIs for reading data from a JSON file into a DataFrame object.

In this code example, a JSON file named 'example.json' has the following content:

[
  { "Category": "Category A", "Count": 100, "Description": "This is category A" },
  { "Category": "Category B", "Count": 120, "Description": "This is category B" },
  { "Category": "Category C", "Count": 150, "Description": "This is category C" }
]



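The file is loaded as a Spark DataFrame using the SparkSession.read.json function.

The multiLine=True argument is required because the JSON content spans multiple lines. By default, Spark expects each line of the file to hold one complete JSON object (the JSON Lines format).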

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, IntegerType

appName = "PySpark Example - JSON file to Spark Data Frame"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# Create a schema for the dataframe
schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', IntegerType(), True),
    StructField('Description', StringType(), True)
])

# Create data frame; multiLine=True lets Spark parse a JSON document that spans several lines
json_file_path = 'data/example.json'
df = spark.read.json(json_file_path, schema=schema, multiLine=True)
print(df.schema)
df.show()
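If you would rather not rely on multiLine=True, one option is to convert the file to JSON Lines first, which is the layout Spark's JSON reader expects by default. The following is only a minimal sketch using the Python standard library: the helper name json_array_to_json_lines and the file name example.jsonl are hypothetical, and the sample records are copied from the example file above into a temporary directory so that the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def json_array_to_json_lines(src_path, dest_path):
    """Rewrite a file holding one JSON array as JSON Lines (one object per line).

    Hypothetical helper for illustration; returns the number of records written.
    """
    records = json.loads(Path(src_path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dest_path, "w", encoding="utf-8") as out:
        for record in records:
            # json.dumps without indent keeps each record on a single line
            out.write(json.dumps(record) + "\n")
    return len(records)


# Sample records copied from the article's example.json
sample = [
    {"Category": "Category A", "Count": 100, "Description": "This is category A"},
    {"Category": "Category B", "Count": 120, "Description": "This is category B"},
    {"Category": "Category C", "Count": 150, "Description": "This is category C"},
]

# Demo in a throwaway directory (assumed paths, not the article's data folder)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "example.json"
    dest = Path(tmp) / "example.jsonl"
    # Write the array pretty-printed, so it spans multiple lines like the original file
    src.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    count = json_array_to_json_lines(src, dest)
    lines = dest.read_text(encoding="utf-8").splitlines()

print(count)
for line in lines:
    print(line)
```

A file converted this way should be readable with spark.read.json(path, schema=schema) without the multiLine argument, since every line is a self-contained JSON object.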
Last modified by Raymond 12 months ago.
