Kontext Copilot - An AI assistant for data analytics. Learn more
Expression of Interest
PySpark DataFrame - Convert JSON Column to Row using json_tuple
insights Stats
warning Please login first to view stats information.
Kontext
Code Snippets & Tips
Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise.
Code description
PySpark SQL functions json_tuple
can be used to convert DataFrame JSON string columns to tuples (new rows in the DataFrame).
Syntax of this function looks like the following:
pyspark.sql.functions.json_tuple(col, *fields)
The first parameter is the JSON string column name in the DataFrame and the second is the filed name list to extract.
If you need to extract complex JSON documents like JSON arrays, you can follow this article - PySpark: Convert JSON String Column to Array of Object (StructType) in DataFrame.
Output
StructType([StructField('id', LongType(), True), StructField('c0', StringType(), True), StructField('c1', StringType(), True), StructField('c2', StringType(), True)]) +---+---+------+----------+ | id| c0| c1| c2| +---+---+------+----------+ | 1| 1|10.201|2021-01-01| | 2| 2|20.201|2022-01-01| +---+---+------+----------+
Code snippet
from pyspark.sql import SparkSession from pyspark.sql.functions import json_tuple app_name = "PySpark json_tuple sql functions" master = "local" spark = SparkSession.builder \ .appName(app_name) \ .master(master) \ .getOrCreate() spark.sparkContext.setLogLevel("WARN") # Create a DataFrame df = spark.createDataFrame( [[1, '{"Attr_INT":1, "ATTR_DOUBLE":10.201, "ATTR_DATE": "2021-01-01"}'], [2, '{"Attr_INT":2, "ATTR_DOUBLE":20.201, "ATTR_DATE": "2022-01-01"}']], ['id', 'json_col']) # Extract JSON values df = df.select(df.id, json_tuple( df.json_col, 'Attr_INT', 'ATTR_DOUBLE', 'ATTR_DATE')) print(df.schema) df.show()
copyright
This page is subject to Site terms.
comment Comments
No comments yet.