Spark - Check if Array Column Contains Specific Value

By Raymond Raymond, 2021-05-22

Spark DataFrames support complex data types such as arrays. This code snippet shows how to check whether a specific value exists in an array column using the array_contains function.

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType
from pyspark.sql.functions import array_contains

appName = "PySpark Example - array_contains"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# Sample data
data = [(1, ['apple', 'pear', 'kiwi']), (2, ['apple']), (3, ['pear', 'berry'])]

# schema
schema = StructType([StructField("ID", IntegerType(), True),
                     StructField("Tags", ArrayType(StringType()), True)])

# Create Spark DataFrame from the in-memory list
df = spark.createDataFrame(data, schema)

# Show all records
df.show()

# Show only the records whose Tags column contains 'apple'
df.where(array_contains('Tags', 'apple')).show()

# Show only the records whose Tags column does not contain 'apple'
df.where(~array_contains('Tags', 'apple')).show()

The code snippet constructs a Spark DataFrame from in-memory data. The schema has two nullable columns: ID (integer) and Tags (array of strings).

The output:

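+---+-------------------+
| ID|               Tags|
+---+-------------------+
|  1|[apple, pear, kiwi]|
|  2|            [apple]|
|  3|      [pear, berry]|
+---+-------------------+

+---+-------------------+
| ID|               Tags|
+---+-------------------+
|  1|[apple, pear, kiwi]|
|  2|            [apple]|
+---+-------------------+

+---+-------------+
| ID|         Tags|
+---+-------------+
|  3|[pear, berry]|
+---+-------------+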

The first result shows all records. The second shows the records whose Tags array contains 'apple'; the third shows the records whose Tags array does not.
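
To make the selection logic behind the two filters concrete, here is a minimal plain-Python sketch that needs no Spark session. It is only a model, not Spark's implementation: the helper names (array_contains_model, where_model, negate) are hypothetical, None stands in for SQL NULL, and row 4 with a missing Tags value is an added assumption that is not part of the original sample data. It assumes the documented behaviour that array_contains returns null for a null array and that where keeps only rows whose condition is true. Null elements inside an array are not modelled.

```python
def array_contains_model(arr, value):
    """Model array_contains for one row; None stands in for SQL NULL."""
    if arr is None:
        return None        # NULL array gives a NULL result
    return value in arr    # otherwise a plain membership test


def negate(result):
    """Three-valued NOT: NULL stays NULL."""
    return None if result is None else not result


def where_model(rows, predicate):
    """Model DataFrame.where: keep only rows whose predicate is exactly True."""
    return [row for row in rows if predicate(row) is True]


# Sample rows from the snippet, plus one hypothetical row with a missing Tags value
rows = [
    (1, ['apple', 'pear', 'kiwi']),
    (2, ['apple']),
    (3, ['pear', 'berry']),
    (4, None),
]

with_apple = where_model(rows, lambda r: array_contains_model(r[1], 'apple'))
without_apple = where_model(rows, lambda r: negate(array_contains_model(r[1], 'apple')))

print([r[0] for r in with_apple])     # IDs kept by the first filter
print([r[0] for r in without_apple])  # IDs kept by the negated filter
```

In this model the hypothetical row 4 is kept by neither filter, because its condition evaluates to NULL rather than true or false. If rows with a missing array should count as "not containing" the value, that case needs to be handled explicitly.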
