This code snippet shows how to group a pandas DataFrame and then aggregate a column of list or array values into a set (with duplicates removed), a list, or a combination of both. To implement it, first explode the list column so that each element gets its own row, then use groupby to create a grouped DataFrame and aggregate the exploded column using set, list, or both.
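Below is a minimal sketch of the approach. It assumes pandas 0.25 or later, which introduced both `DataFrame.explode` and named aggregation. It rebuilds the sample data from the Input table, explodes `users`, and aggregates it per `category`. In the Output, `users_list` contains no duplicates (user 1 appears only once for category A), so this sketch assumes that column is the set converted back to a list. That is an inference from the sample output, not a confirmed detail of the original code.

```python
import pandas as pd

# Sample data matching the Input table: each row holds a list of user IDs
df = pd.DataFrame({
    "category": ["A", "B", "C", "A", "B"],
    "users": [[1, 2], [3, 4], [5, 6, 7], [1, 8, 1], [1, 6, 9]],
})

# Step 1: explode gives every list element its own row (the row index repeats)
exploded = df.explode("users")

# Step 2: group by category and aggregate the exploded column twice:
#   users_set  -> a set, so duplicate user IDs are removed
#   users_list -> assumed to be the deduplicated set converted back to a list
result = (
    exploded.groupby("category")["users"]
    .agg(users_set=set, users_list=lambda x: list(set(x)))
    .reset_index()
)
print(result)
```

Because sets are unordered, the element order in `users_set` and `users_list` follows Python's set iteration order, not the order of the original rows.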
Input
```
  category      users
0        A     [1, 2]
1        B     [3, 4]
2        C  [5, 6, 7]
3        A  [1, 8, 1]
4        B  [1, 6, 9]
```
Output
```
  category        users_set       users_list
0        A        {8, 1, 2}        [8, 1, 2]
1        B  {1, 3, 4, 6, 9}  [1, 3, 4, 6, 9]
2        C        {5, 6, 7}        [5, 6, 7]
```
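If duplicates should be kept, aggregate the exploded column with `list` directly instead of going through a set. The sketch below uses the same sample data to contrast the two options. The column names `users_all` and `users_sorted` are illustrative only. `list` keeps every exploded value in row order, including the repeated 1 for category A. `sorted(set(x))` removes duplicates and returns them in ascending order, which is easier to compare than set iteration order.

```python
import pandas as pd

# Same sample data as the Input table
df = pd.DataFrame({
    "category": ["A", "B", "C", "A", "B"],
    "users": [[1, 2], [3, 4], [5, 6, 7], [1, 8, 1], [1, 6, 9]],
})

exploded = df.explode("users")

# Illustrative column names:
#   users_all    -> plain list aggregation, duplicates kept in row order
#   users_sorted -> duplicates removed, values sorted ascending
result = (
    exploded.groupby("category")["users"]
    .agg(users_all=list, users_sorted=lambda x: sorted(set(x)))
    .reset_index()
)
print(result)
```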