Spark provides fluent APIs for reading data from a JSON file into a DataFrame.
In this example, a JSON file named 'example.json' has the following content:
[
  {
    "Category": "Category A",
    "Count": 100,
    "Description": "This is category A"
  },
  {
    "Category": "Category B",
    "Count": 120,
    "Description": "This is category B"
  },
  {
    "Category": "Category C",
    "Count": 150,
    "Description": "This is category C"
  }
]
In the code snippet, the following option is needed so that Spark can parse JSON content that spans multiple lines:
option("multiLine", true)
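To illustrate why the option matters, here is a minimal sketch that reads the same pretty-printed file twice, once in the default mode (where Spark expects one JSON document per line) and once with multiLine enabled. The temporary file, its contents, and the variable names are assumptions made for this illustration only. Without the option, the records typically cannot be parsed and end up in a `_corrupt_record` column instead of the expected fields.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("multiLine sketch")
  .master("local")
  .getOrCreate()

// Hypothetical sample: a small pretty-printed JSON array written to a temp file.
val samplePath = Files.createTempFile("example", ".json")
Files.write(samplePath, """[
  {
    "Category": "Category A",
    "Count": 100
  }
]""".getBytes("UTF-8"))

// Default mode: every physical line is treated as its own JSON document,
// so a record spread over several lines cannot be parsed.
val lineMode = spark.read.json(samplePath.toString)
lineMode.printSchema()

// multiLine mode: the whole file is parsed as one JSON value,
// and each element of the top-level array becomes a row.
val multiLineMode = spark.read.option("multiLine", true).json(samplePath.toString)
multiLineMode.printSchema()
```

Only `printSchema()` is called on the default-mode result, because recent Spark versions reject queries that reference nothing but the corrupt-record column of a raw JSON file.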
Code snippet
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val appName = "Scala Example - JSON file to Spark Data Frame"
val master = "local"

/* Create the Spark session (Hive support is not enabled here). */
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

/* Explicit schema: column name, type, nullable. */
val schema = StructType(Seq(
  StructField("Category", StringType, true),
  StructField("Count", IntegerType, true),
  StructField("Description", StringType, true)
))

val json_file_path = "data/example.json"

/* multiLine lets Spark parse a JSON array that spans several lines. */
val df = spark.read.option("multiLine", true).schema(schema).json(json_file_path)

println(df.schema)
df.show()
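The snippet above passes an explicit schema. As a point of comparison, the sketch below reads a file without a schema so that Spark infers the column types, and then reads it again with the schema given as a DDL string. The temporary file and the variable names are assumptions for illustration. Note that inferred JSON integers come back as `LongType`, which is one reason to declare `Count` as an integer explicitly.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, LongType}

val spark = SparkSession.builder
  .appName("Schema inference sketch")
  .master("local")
  .getOrCreate()

// Hypothetical temporary copy of one record from example.json.
val tmpPath = Files.createTempFile("inferred", ".json")
Files.write(tmpPath, """[
  {
    "Category": "Category A",
    "Count": 100,
    "Description": "This is category A"
  }
]""".getBytes("UTF-8"))

// No schema supplied: Spark scans the data and infers the column types.
val inferred = spark.read.option("multiLine", true).json(tmpPath.toString)
inferred.printSchema()

// Same read with the schema written as a DDL string instead of a StructType.
val typed = spark.read
  .option("multiLine", true)
  .schema("Category STRING, Count INT, Description STRING")
  .json(tmpPath.toString)
typed.printSchema()
```

Supplying the schema also avoids the extra pass over the data that inference requires, which can matter for large files.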