Read JSON file as Spark DataFrame in Scala / Spark


Spark provides a fluent DataFrameReader API (spark.read) that can load data from a JSON file into a DataFrame.

In this code example, the JSON file 'example.json' has the following content:

[
  {
    "Category": "Category A",
    "Count": 100,
    "Description": "This is category A"
  },
  {
    "Category": "Category B",
    "Count": 120,
    "Description": "This is category B"
  },
  {
    "Category": "Category C",
    "Count": 150,
    "Description": "This is category C"
  }
]

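By default, Spark expects JSON Lines input, where each line holds one complete JSON object. Because this file is a single JSON array spread across multiple lines, the code snippet sets the following option so that Spark can parse multi-line JSON content: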

option("multiLine", true)
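For comparison, the same records can be stored in JSON Lines form, which Spark reads without the multiLine option. The sketch below is written for spark-shell style execution like the snippet in this article; the temporary file it creates and the names jsonLines, jsonLinesPath and linesDf are hypothetical and only serve the illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for the sketch (in spark-shell this returns the existing session).
val spark = SparkSession.builder.appName("JSON Lines sketch").master("local").getOrCreate()

// Hypothetical temporary file holding the same three records in JSON Lines form:
// one complete JSON object per line and no enclosing array.
val jsonLines = Seq(
  """{"Category": "Category A", "Count": 100, "Description": "This is category A"}""",
  """{"Category": "Category B", "Count": 120, "Description": "This is category B"}""",
  """{"Category": "Category C", "Count": 150, "Description": "This is category C"}"""
).mkString("\n")
val jsonLinesPath = Files.createTempFile("example", ".json")
Files.write(jsonLinesPath, jsonLines.getBytes(StandardCharsets.UTF_8))

// No multiLine option: Spark's default line-by-line parsing handles this layout.
val linesDf = spark.read.json(jsonLinesPath.toString)
linesDf.show()
```

JSON Lines is usually the better fit for large inputs because Spark can split such a file by line, whereas a multi-line document has to be parsed as a whole.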

Code snippet

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val appName = "Scala Example - JSON file to Spark Data Frame"
val master = "local"

/* Create a local Spark session. */
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

/* Define the schema explicitly so that Spark does not have to infer it. */
val schema = StructType(Seq(
  StructField("Category", StringType, true),
  StructField("Count", IntegerType, true),
  StructField("Description", StringType, true)
))

val jsonFilePath = "data/example.json"
/* multiLine allows Spark to parse a JSON document that spans multiple lines. */
val df = spark.read.option("multiLine", true).schema(schema).json(jsonFilePath)
df.printSchema()
df.show()
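If the .schema(schema) call is left out, Spark infers the schema by scanning the data. The sketch below, again in spark-shell style, writes a hypothetical temporary copy of example.json and reads it with inference; the names jsonArray, arrayPath and inferredDf are illustrative only. Integer JSON numbers are inferred as LongType, so Count comes out as long rather than the IntegerType declared in the explicit schema above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.LongType

// Local session for the sketch (in spark-shell this returns the existing session).
val spark = SparkSession.builder.appName("JSON schema inference sketch").master("local").getOrCreate()

// Hypothetical temporary copy of example.json: one JSON array spanning several lines.
val jsonArray =
  """[
    |  {"Category": "Category A", "Count": 100, "Description": "This is category A"},
    |  {"Category": "Category B", "Count": 120, "Description": "This is category B"},
    |  {"Category": "Category C", "Count": 150, "Description": "This is category C"}
    |]""".stripMargin
val arrayPath = Files.createTempFile("example", ".json")
Files.write(arrayPath, jsonArray.getBytes(StandardCharsets.UTF_8))

// No .schema(...) call: Spark scans the file and infers the column types.
val inferredDf = spark.read.option("multiLine", true).json(arrayPath.toString)
inferredDf.printSchema()
inferredDf.show()
```

Supplying an explicit schema, as in the main snippet, skips the extra inference pass and pins the column types.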
Last modified by Raymond 3 years ago.


More from Kontext
Pass Environment Variables to Executors in PySpark
Scala: Read CSV File as Spark DataFrame
Scala: Convert List to Spark Data Frame