Spark SQL - Check if String Contains a String
Spark SQL functions contains
and instr
can be used to check if a string contains a string. It can also be used to filter data.
Use contains function
The syntax of this function is defined as:
contains(left, right) - This function returns a boolean. Retuns True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left or right must be of STRING or BINARY type.
Code snippet
The following code snippet shows how to use this function.
spark-sql> SELECT t.col, contains(t.col,'text') > FROM VALUES > ('Hello Kontext!'), > ('Hello Context!') > AS t(col); Hello Kontext! true Hello Context! true
Both records return true as 'text' exists in both values.
Use instr function
Function instr(str, substr)
returns the (1-based) index of the first occurrence of substr
in str
. Thus we could check the the returned value to decide whether the substring exists in the string.
Code snippet
spark-sql> SELECT t.col, case when instr(t.col,'text')>0 then true else false end > FROM VALUES > ('Hello Kontext!'), > ('Hello Contt!') > AS t(col); Hello Kontext! true Hello Contt! false
Filter data
We can directly use these two functions to filter data to return rows that contains certain texts.
spark-sql> SELECT t.col > FROM VALUES > ('Hello Kontext!'), > ('Hello Contt!') > AS t(col) > WHERE instr(t.col,'text')>0; Hello Kontext! spark-sql> SELECT t.col > FROM VALUES > ('Hello Kontext!'), > ('Hello Contt!') > AS t(col) > WHERE contains(t.col,'text'); Hello Kontext!
Both SQL statements only return the first record as it has 'text' in column 'col
'.
infoThese functions can also be directly used in Spark DataFrame APIs. For example,
df.where(contains('col', 'text') == False).show()
copyright
This page is subject to Site terms.
comment Comments
No comments yet.