Spark - Check if Array Column Contains Specific Value

Raymond Raymond, 2021-05-22

Spark DataFrames support complex data types such as arrays. This code snippet shows how to check whether a specific value exists in an array column using the array_contains function.

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType
from pyspark.sql.functions import array_contains

appName = "PySpark Example - array_contains"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# Sample data
data = [(1, ['apple', 'pear', 'kiwi']), (2, ['apple']), (3, ['pear', 'berry'])]

# schema
schema = StructType([StructField("ID", IntegerType(), True),
                     StructField("Tags", ArrayType(StringType()), True)])

# Create Spark DataFrame from the in-memory list
df = spark.createDataFrame(data, schema)
print(df.schema)
df.show()

# Show only the records whose Tags array contains 'apple'
df.where(array_contains('Tags', 'apple')).show()

# Show only the records whose Tags array does not contain 'apple'
df.where(~array_contains('Tags', 'apple')).show()

spark.stop()

The code snippet constructs a Spark DataFrame from in-memory data. The schema looks like the following:
StructType(List(StructField(ID,IntegerType,true),StructField(Tags,ArrayType(StringType,true),true)))

The output:

+---+-------------------+
| ID|               Tags|
+---+-------------------+
|  1|[apple, pear, kiwi]|
|  2|            [apple]|
|  3|      [pear, berry]|
+---+-------------------+

+---+-------------------+
| ID|               Tags|
+---+-------------------+
|  1|[apple, pear, kiwi]|
|  2|            [apple]|
+---+-------------------+

+---+-------------+
| ID|         Tags|
+---+-------------+
|  3|[pear, berry]|
+---+-------------+

The first result shows the full DataFrame. The second shows only the records whose Tags array contains 'apple'; the third shows the records whose Tags array does not.
