This code snippet shows how to define a function that splits a string column into an array of strings using Python's built-in split
function. It then expands that array into one row per element using PySpark's built-in explode
function.
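The two steps described above could be sketched as follows. This is a minimal illustration, not necessarily the original snippet: the helper names `split_users` and `build_exploded_df` are hypothetical, the input rows are inferred from the sample output, and the comma delimiter is an assumption based on the `users` column.

```python
from typing import List, Optional


def split_users(users: Optional[str]) -> List[str]:
    """Split a comma-separated string into a list with Python's str.split."""
    if users is None:
        # Assumption: treat a missing value as "no users".
        return []
    return users.split(",")


def build_exploded_df(spark):
    """Build the sample DataFrame and return one row per user."""
    # PySpark is imported here so split_users can be used without Spark.
    from pyspark.sql.functions import explode, udf
    from pyspark.sql.types import ArrayType, StringType

    # Sample rows inferred from the output table (assumed input).
    data = [
        ("Category A", "user1,user2,user3"),
        ("Category B", "user3,user4"),
    ]
    df = spark.createDataFrame(data, ["category", "users"])

    # Wrap the Python function as a UDF that returns an array of strings.
    split_udf = udf(split_users, ArrayType(StringType()))

    return (
        df.withColumn("users_array", split_udf(df["users"]))
        # explode emits one output row per array element; rows whose
        # array is empty or null produce no output rows.
        .withColumn("user", explode("users_array"))
    )
```

With an active session, for example `spark = SparkSession.builder.getOrCreate()`, calling `build_exploded_df(spark).show()` should print a table with the columns `category`, `users`, `users_array` and `user`. PySpark also ships `pyspark.sql.functions.split`, which avoids the UDF but treats the delimiter as a regular expression.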
Sample output
+----------+-----------------+--------------------+-----+
|  category|            users|         users_array| user|
+----------+-----------------+--------------------+-----+
|Category A|user1,user2,user3|[user1, user2, us...|user1|
|Category A|user1,user2,user3|[user1, user2, us...|user2|
|Category A|user1,user2,user3|[user1, user2, us...|user3|
|Category B|      user3,user4|      [user3, user4]|user3|
|Category B|      user3,user4|      [user3, user4]|user4|
+----------+-----------------+--------------------+-----+