Read JSON file as Spark DataFrame in Scala / Spark


Spark provides fluent APIs for reading data from a JSON file into a DataFrame object.
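
For instance, reading a JSON file can be a single chained call (the path here is only for illustration; the full example below adds an option and an explicit schema):

val df = spark.read.json("path/to/file.json")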

In this example, a JSON file named 'example.json' has the following content:

[
  {
    "Category": "Category A",
    "Count": 100,
    "Description": "This is category A"
  },
  {
    "Category": "Category B",
    "Count": 120,
    "Description": "This is category B"
  },
  {
    "Category": "Category C",
    "Count": 150,
    "Description": "This is category C"
  }
]

In the code snippet below, the following option is important because it lets Spark handle JSON content that spans multiple lines:

option("multiLine", true)

Code snippet

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val appName = "Scala Example - JSON file to Spark Data Frame"
val master = "local"

/* Create the Spark session. */
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

/* Define the schema explicitly so Spark does not need to infer it. */
val schema = StructType(Seq(
  StructField("Category", StringType, true),
  StructField("Count", IntegerType, true),
  StructField("Description", StringType, true)
))

val json_file_path = "data/example.json"
/* multiLine lets Spark parse JSON records that span multiple lines. */
val df = spark.read.option("multiLine", true).schema(schema).json(json_file_path)
println(df.schema)
df.show()
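
For reference, running the snippet should print the schema and then show the three records, roughly as follows (exact formatting can vary slightly between Spark versions):

StructType(StructField(Category,StringType,true), StructField(Count,IntegerType,true), StructField(Description,StringType,true))
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+

If the schema(schema) call is omitted, Spark infers the schema from the data; in that case Count is typically inferred as LongType instead of IntegerType.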