Spark (v2.x) Scala

Read JSON file as Spark DataFrame in Scala / Spark (v2.x)


Code description

Spark provides a fluent reader API (spark.read) that loads data from a JSON file directly into a DataFrame.
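Here is a minimal sketch of that fluent style. The reader can be called through the json(...) shortcut or through the generic format("json").load(...) form. Both are standard DataFrameReader methods and accept the same options. The sample content, the temporary file and the local master are assumptions, made only so the block runs on its own.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("JSON reader sketch").master("local").getOrCreate()

// Hypothetical sample data: a small JSON array written to a temporary file
val content =
  """[
    |  {"Category": "Category A", "Count": 100},
    |  {"Category": "Category B", "Count": 120}
    |]""".stripMargin
val path = Files.createTempFile("example", ".json")
Files.write(path, content.getBytes(StandardCharsets.UTF_8))

// Shortcut form: DataFrameReader.json(path)
val viaJson = spark.read.option("multiLine", true).json(path.toString)

// Generic form: format("json") followed by load(path); options are set the same way
val viaLoad = spark.read.format("json").option("multiLine", "true").load(path.toString)

viaJson.show()
viaLoad.show()
```

The generic form is convenient when the format name comes from configuration rather than being fixed in code.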

In this code example, a JSON file named 'example.json' has the following content:

[
  {
    "Category": "Category A",
    "Count": 100,
    "Description": "This is category A"
  },
  {
    "Category": "Category B",
    "Count": 120,
    "Description": "This is category B"
  },
  {
    "Category": "Category C",
    "Count": 150,
    "Description": "This is category C"
  }
]

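The following option in the code snippet is important. By default, Spark expects JSON Lines input, where each line holds one complete JSON object. This option tells Spark to parse JSON records that span multiple lines instead: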

option("multiLine", true)
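The sketch below shows why the option matters. It is an illustration and not part of the original example. It writes two hypothetical files. The first is in JSON Lines form, which the reader handles by default. The second is a pretty-printed array whose objects span several lines. The writeTemp helper and the sample records are assumptions made for the illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("multiLine sketch").master("local").getOrCreate()

// Hypothetical helper: write a string to a temporary .json file and return its path
def writeTemp(content: String): String = {
  val p = Files.createTempFile("sample", ".json")
  Files.write(p, content.getBytes(StandardCharsets.UTF_8))
  p.toString
}

// JSON Lines: one complete object per line (the reader's default expectation)
val jsonLinesPath = writeTemp(
  """{"Category": "Category A", "Count": 100}
    |{"Category": "Category B", "Count": 120}""".stripMargin)

// Pretty-printed array: each object spans several lines, like example.json
val prettyPath = writeTemp(
  """[
    |  {
    |    "Category": "Category A",
    |    "Count": 100
    |  },
    |  {
    |    "Category": "Category B",
    |    "Count": 120
    |  }
    |]""".stripMargin)

val fromLines = spark.read.json(jsonLinesPath)                          // line-delimited, default mode
val fromPretty = spark.read.option("multiLine", true).json(prettyPath)  // whole file parsed as JSON
val prettyNoOption = spark.read.json(prettyPath)                        // each line parsed on its own

fromLines.show()
fromPretty.show()
prettyNoOption.printSchema()  // lines are not valid JSON by themselves, so a _corrupt_record column appears
```

Without multiLine, the individual lines of the pretty-printed file are not valid JSON documents. In the default PERMISSIVE mode, Spark therefore reports them in a _corrupt_record column instead of the expected fields.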

Code snippet

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val appName = "Scala Example - JSON file to Spark Data Frame"
val master = "local"

/* Create a local Spark session (Hive support is not required to read JSON). */
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

/* Explicit schema matching the fields in example.json. */
val schema = StructType(Seq(
  StructField("Category", StringType, true),
  StructField("Count", IntegerType, true),
  StructField("Description", StringType, true)
))

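val json_file_path = "data/example.json"

/* multiLine = true parses the whole file as JSON rather than one record per line.
   Supplying the schema skips inference, so Count stays IntegerType. */
val df = spark.read.option("multiLine", true).schema(schema).json(json_file_path)
df.printSchema()
df.show()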
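The snippet passes an explicit schema. As a sketch of the alternative, assume the same records are written to a temporary file. If .schema(...) is left out, Spark infers the schema from the data instead. JSON inference maps whole numbers to LongType, so Count comes back as long rather than integer. The temporary file and local master are assumptions made so the block runs on its own.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("Schema inference sketch").master("local").getOrCreate()

// Same records as example.json, written to a temporary file for this sketch
val content =
  """[
    |  {"Category": "Category A", "Count": 100, "Description": "This is category A"},
    |  {"Category": "Category B", "Count": 120, "Description": "This is category B"},
    |  {"Category": "Category C", "Count": 150, "Description": "This is category C"}
    |]""".stripMargin
val path = Files.createTempFile("example", ".json")
Files.write(path, content.getBytes(StandardCharsets.UTF_8))

val schema = StructType(Seq(
  StructField("Category", StringType, true),
  StructField("Count", IntegerType, true),
  StructField("Description", StringType, true)
))

// Explicit schema: no inference pass, Count is read as IntegerType
val explicitDf = spark.read.option("multiLine", true).schema(schema).json(path.toString)

// No schema: Spark scans the data and infers types, so Count becomes LongType
val inferredDf = spark.read.option("multiLine", true).json(path.toString)

explicitDf.printSchema()
inferredDf.printSchema()
```

An explicit schema also avoids the extra pass over the data that inference requires, which matters more as files grow.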

Other versions

Spark (v2.x) Python

Read JSON file as Spark DataFrame in Python / Spark (v2.x)
