By using this site, you acknowledge that you have read and understand our Cookie policy, Privacy policy and Terms .

Spark has easy fluent APIs that can be used to read data from JSON file as DataFrame object. 

In this code example,  JSON file named 'example.json' has the following content:

[ { "Category": "Category A", "Count": 100, "Description": "This is category A" }, { "Category": "Category B", "Count": 120, "Description": "This is category B" }, { "Category": "Category C", "Count": 150, "Description": "This is category C" } ]



The file is loaded as a Spark DataFrame using SparkSession.read.json function.

multiLine=True argument is important as the JSON file content is across multiple lines. 

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StructField, StructType, StringType, IntegerType

appName = "PySpark Example - JSON file to Spark Data Frame"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# Create a schema for the dataframe
schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', IntegerType(), True),
    StructField('Description', StringType(), True)
])

# Create data frame
json_file_path = 'data/example.json'
df = spark.read.json(json_file_path, schema, multiLine=True)
print(df.schema)
df.show()
info Last modified by Raymond at 9 months ago * This page is subject to Site terms.

More from Kontext

PySpark Read Multiple Lines Records from CSV

local_offer pyspark local_offer spark-2-x local_offer python

visibility 6
thumb_up 0
access_time 1 day ago

CSV is a common format used when extracting and exchanging data between systems and platforms. Once CSV file is ingested into HDFS, you can easily read them as DataFrame in Spark. However there are a few options you need to pay attention to especially if you source file: Has records ac...

open_in_new View

local_offer pyspark local_offer spark-2-x local_offer teradata local_offer SQL Server

visibility 45
thumb_up 0
access_time 12 days ago

In my previous article about  Connect to SQL Server in Spark (PySpark) , I mentioned the ways t...

open_in_new View

local_offer python local_offer SQL Server

visibility 132
thumb_up 0
access_time 2 months ago

Python JayDeBeApi module allows you to connect from Python to databases using Java JDBC drivers.

open_in_new View

Spark Read from SQL Server Source using Windows/Kerberos Authentication

local_offer pyspark local_offer SQL Server local_offer spark-2-x

visibility 134
thumb_up 0
access_time 2 months ago

In this article, I am going to show you how to use JDBC Kerberos authentication to connect to SQL Server sources in Spark (PySpark). I will use  Kerberos connection with principal names and password directly that requires  ...

open_in_new View

info About author

Kontext Column

Kontext Column

Created for everyone to publish data, programming and cloud related articles. Follow three steps to create your columns.

Learn more arrow_forward
info Follow us on Twitter to get the latest article updates. Follow us