Scala: Read JSON file as Spark DataFrame


The article Scala: Parse JSON String as Spark DataFrame shows how to convert an in-memory JSON string to a Spark DataFrame. This article shows how to read directly from a JSON file, which is in fact even simpler.

Read from local JSON file

The following code snippet reads from a local JSON file named test.json.

The content of the JSON file is:

[{"ID":1,"ATTR1":"ABC"},
{"ID":2,"ATTR1":"DEF"},
{"ID":3,"ATTR1":"GHI"}]

Code snippet

scala> spark.read.format("json").option("multiLine","true").load("file:///F:\\big-data/test.json").show()
+-----+---+
|ATTR1| ID|
+-----+---+
|  ABC|  1|
|  DEF|  2|
|  GHI|  3|
+-----+---+
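
The snippet above lets Spark infer the column types from the JSON data. As a variation, a schema can be supplied explicitly so Spark skips the inference pass. The following is a minimal sketch, assuming the same test.json path; the json() shorthand is equivalent to format("json").load(...), and the schema shown here is just the two columns from the sample file.

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Explicit schema for the two columns in test.json (an assumption based on the sample data);
// providing it avoids the extra pass Spark would otherwise make to infer column types.
val schema = StructType(Seq(
  StructField("ID", IntegerType, nullable = true),
  StructField("ATTR1", StringType, nullable = true)
))

val df = spark.read
  .option("multiLine", "true")
  .schema(schema)
  .json("file:///F:/big-data/test.json")

df.show()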

Read from HDFS JSON file

The following code snippet reads from a path in HDFS (/big-data/test.json):

scala> spark.read.format("json").option("multiLine","true").load("/big-data/test.json").show()
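Assuming the HDFS file has the same content as the local one, the output matches the table above. A path without a scheme, such as /big-data/test.json, resolves against the default file system configured in core-site.xml (fs.defaultFS). The sketch below uses a fully qualified HDFS URI instead; the namenode host and port are assumptions, and the temp view name is hypothetical.

val df = spark.read
  .option("multiLine", "true")
  .json("hdfs://namenode:8020/big-data/test.json")

// Register the DataFrame as a temporary view so it can be queried with Spark SQL.
df.createOrReplaceTempView("test_json")
spark.sql("SELECT ID, ATTR1 FROM test_json WHERE ID > 1").show()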