PySpark DataFrame - explode Array and Map Columns
In PySpark, we can use the explode function to explode an array or a map column. After exploding, the DataFrame will end up with more rows.
Code snippet
The following code snippet explodes an array column.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

appName = "PySpark DataFrame - explode function"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel('WARN')

data = [{"values": [1, 2, 3, 4, 5]}, {"values": [6, 7, 8]}]

df = spark.createDataFrame(data)
df.show()
df.withColumn('value', F.explode(df['values'])).show()
Each element of the array becomes a separate row, with the element stored in the new value column:
+---------------+-----+
|         values|value|
+---------------+-----+
|[1, 2, 3, 4, 5]|    1|
|[1, 2, 3, 4, 5]|    2|
|[1, 2, 3, 4, 5]|    3|
|[1, 2, 3, 4, 5]|    4|
|[1, 2, 3, 4, 5]|    5|
|      [6, 7, 8]|    6|
|      [6, 7, 8]|    7|
|      [6, 7, 8]|    8|
+---------------+-----+
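If the original array column is not needed, the exploded values can also be produced with select instead of withColumn. The following is a minimal sketch, assuming the spark session and sample data from the snippet above:

import pyspark.sql.functions as F

# Assumes the SparkSession `spark` created in the snippet above.
df = spark.createDataFrame([{"values": [1, 2, 3, 4, 5]}, {"values": [6, 7, 8]}])

# Keep only the exploded element; the original 'values' column is dropped.
df.select(F.explode(df['values']).alias('value')).show()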
For a map column, we can also use the explode function.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

appName = "PySpark DataFrame - explode function"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

spark.sparkContext.setLogLevel('WARN')

data = [{"values": {"a": "100", "b": "200"}},
        {"values": {"a": "1000", "b": "2000"}}]

df = spark.createDataFrame(data)
df.show()

df = df.select("*", F.explode(df['values']).alias('key', 'value'))
df.show()
The output includes one row for each key-value pair in each map, as the following shows:
+--------------------+
|              values|
+--------------------+
|[a -> 100, b -> 200]|
|[a -> 1000, b -> ...|
+--------------------+

+--------------------+---+-----+
|              values|key|value|
+--------------------+---+-----+
|[a -> 100, b -> 200]|  a|  100|
|[a -> 100, b -> 200]|  b|  200|
|[a -> 1000, b -> ...|  a| 1000|
|[a -> 1000, b -> ...|  b| 2000|
+--------------------+---+-----+
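Similarly, if only the key-value pairs are needed, the map column can be dropped by selecting just the exploded columns. This is a minimal sketch, assuming the same spark session and map data as above:

import pyspark.sql.functions as F

# Assumes the SparkSession `spark` created in the snippets above.
df = spark.createDataFrame([{"values": {"a": "100", "b": "200"}},
                            {"values": {"a": "1000", "b": "2000"}}])

# Explode the map and keep only the resulting key and value columns.
df.select(F.explode(df['values']).alias('key', 'value')).show()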