Read JSON file as Spark DataFrame in Scala / Spark

Spark provides fluent APIs that can be used to read data from a JSON file into a DataFrame object.

In this code example, the JSON file named 'example.json' has the following content:

[
  {
    "Category": "Category A",
    "Count": 100,
    "Description": "This is category A"
  },
  {
    "Category": "Category B",
    "Count": 120,
    "Description": "This is category B"
  },
  {
    "Category": "Category C",
    "Count": 150,
    "Description": "This is category C"
  }
]

In the code snippet, the following option is important because it lets Spark handle JSON content that spans multiple lines:

option("multiLine", true)

Code snippet

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val appName = "Scala Example - JSON file to Spark Data Frame"
val master = "local"

// Create the Spark session.
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

val schema = StructType(Seq(
  StructField("Category", StringType, true),
  StructField("Count", IntegerType, true),
  StructField("Description", StringType, true)
))

val json_file_path = "data/example.json"
val df = spark.read.option("multiLine", true).schema(schema).json(json_file_path)
println(df.schema)
df.show()
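
The println call prints the DataFrame's schema, and df.show() should produce output similar to the following (reconstructed from the sample data above, not captured from an actual run):

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+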