Spark - Check if Array Column Contains Specific Value

Spark DataFrames support complex data types such as arrays. This code snippet shows how to check whether a specific value exists in an array column using the array_contains function.

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType
from pyspark.sql.functions import array_contains

appName = "PySpark Example - array_contains"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# Sample data
data = [(1, ['apple', 'pear', 'kiwi']), (2, ['apple']), (3, ['pear', 'berry'])]

# Schema: an integer ID column and an array-of-strings Tags column
schema = StructType([StructField("ID", IntegerType(), True),
                     StructField("Tags", ArrayType(StringType()), True)])

# Create Spark DataFrame from the in-memory list using the schema above
df = spark.createDataFrame(data, schema)
print(df.schema)
df.show()

# Show only the records whose Tags column contains 'apple'
df.where(array_contains('Tags', 'apple')).show()

# Show only the records whose Tags column does not contain 'apple'
df.where(~array_contains('Tags', 'apple')).show()

spark.stop()

The code snippet constructs a Spark DataFrame from data in memory. The printed schema looks like the following:
StructType(List(StructField(ID,IntegerType,true),StructField(Tags,ArrayType(StringType,true),true)))

The output:

+---+-------------------+
| ID|               Tags|
+---+-------------------+
|  1|[apple, pear, kiwi]|
|  2|            [apple]|
|  3|      [pear, berry]|
+---+-------------------+

+---+-------------------+
| ID|               Tags|
+---+-------------------+
|  1|[apple, pear, kiwi]|
|  2|            [apple]|
+---+-------------------+

+---+-------------+
| ID|         Tags|
+---+-------------+
|  3|[pear, berry]|
+---+-------------+

The first result shows the full DataFrame. The second shows only the records whose Tags array contains the word 'apple'; the third shows the records whose Tags array does not.
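
The two filters are complementary only when every Tags array is present. array_contains returns null when the array itself is null, and where keeps only rows whose condition is true, so a row with a null Tags value appears in neither result. The sketch below models that behaviour in plain Python rather than calling Spark: None stands in for SQL NULL, the helper names (array_contains_model, equals_false, where_model) are hypothetical, row 4 is an assumed extra record that is not in the original sample data, and null elements inside an array are deliberately ignored.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, Optional[List[str]]]


def array_contains_model(tags: Optional[List[str]], value: str) -> Optional[bool]:
    """Plain-Python stand-in for array_contains; None plays the role of NULL."""
    if tags is None:
        return None  # a null array gives a null result, not False
    return value in tags


def equals_false(result: Optional[bool]) -> Optional[bool]:
    """Model of `condition == False`: comparing NULL with anything stays NULL."""
    return None if result is None else (result is False)


def where_model(rows: List[Row], condition: Callable[[Row], Optional[bool]]) -> List[Row]:
    """Model of DataFrame.where: keep a row only if the condition is true."""
    return [row for row in rows if condition(row) is True]


# Sample data from the snippet plus one assumed row with a null Tags value
rows: List[Row] = [
    (1, ['apple', 'pear', 'kiwi']),
    (2, ['apple']),
    (3, ['pear', 'berry']),
    (4, None),
]

with_apple = where_model(rows, lambda r: array_contains_model(r[1], 'apple'))
without_apple = where_model(rows, lambda r: equals_false(array_contains_model(r[1], 'apple')))

with_ids = [r[0] for r in with_apple]
without_ids = [r[0] for r in without_apple]

print(with_ids)     # [1, 2]
print(without_ids)  # [3]
```

Row 4 is dropped by both filters. In PySpark, if such rows need to be listed, one option is a separate filter on the column being null, for example with Column.isNull().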
