Read JSON file as Spark DataFrame in Python / Spark
Spark provides fluent APIs for reading data from a JSON file into a DataFrame object.
In this code example, a JSON file named 'example.json' has the following content:
[
  {
    "Category": "Category A",
    "Count": 100,
    "Description": "This is category A"
  },
  {
    "Category": "Category B",
    "Count": 120,
    "Description": "This is category B"
  },
  {
    "Category": "Category C",
    "Count": 150,
    "Description": "This is category C"
  }
]
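To follow along, the sample file can be generated with the Python standard library. The sketch below is only a convenience and not part of the original example: the helper name write_example and the RECORDS constant are hypothetical, and the default path 'data/example.json' is assumed from the path used in the code snippet of this article.

```python
import json
from pathlib import Path

# The three records listed above (RECORDS is a hypothetical name).
RECORDS = [
    {"Category": "Category A", "Count": 100, "Description": "This is category A"},
    {"Category": "Category B", "Count": 120, "Description": "This is category B"},
    {"Category": "Category C", "Count": 150, "Description": "This is category C"},
]


def write_example(path="data/example.json"):
    """Write RECORDS to `path` as a pretty-printed JSON array and return the path."""
    target = Path(path)
    # Create the parent directory (for example 'data') if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # indent=2 spreads every record over several lines, like the listing above.
    target.write_text(json.dumps(RECORDS, indent=2), encoding="utf-8")
    return target
```

Calling write_example() with no arguments writes the file to data/example.json relative to the current working directory.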
The file is loaded as a Spark DataFrame using the SparkSession.read.json function.
The multiLine=True argument is required here because each JSON record spans multiple lines. By default, Spark expects JSON Lines input, with one complete JSON object per line.
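The difference between the two layouts can be illustrated without Spark. The sketch below is a rough analogy that uses only the standard json module and does not reproduce Spark's actual parser: it checks whether each line of a text is a complete JSON document on its own, which holds for JSON Lines but not for a pretty-printed array such as example.json. The helper name parses_line_by_line is hypothetical.

```python
import json

# Same three records as example.json, inlined so the sketch is self-contained.
records = [
    {"Category": "Category A", "Count": 100, "Description": "This is category A"},
    {"Category": "Category B", "Count": 120, "Description": "This is category B"},
    {"Category": "Category C", "Count": 150, "Description": "This is category C"},
]


def parses_line_by_line(text):
    """Return True if every non-empty line of `text` is a complete JSON document."""
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return False
    return True


# Pretty-printed array: each record spans several lines, like example.json.
multi_line_text = json.dumps(records, indent=2)

# JSON Lines layout: one complete object per line.
json_lines_text = "\n".join(json.dumps(record) for record in records)

print(parses_line_by_line(multi_line_text))  # False: the first line is just "["
print(parses_line_by_line(json_lines_text))  # True: every line is a full object
```

If the file can be stored in the JSON Lines layout instead, Spark's default line-oriented mode should be able to read it without multiLine=True.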
Code snippet
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, IntegerType

appName = "PySpark Example - JSON file to Spark Data Frame"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# Create a schema for the dataframe
schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', IntegerType(), True),
    StructField('Description', StringType(), True)
])

# Create data frame
json_file_path = 'data/example.json'
df = spark.read.json(json_file_path, schema, multiLine=True)
print(df.schema)
df.show()
Last modified by Raymond 5 years ago