Code description
This code snippet shows how to group a pandas DataFrame and aggregate a list- or array-typed column into a set (duplicates removed) or a list. To do this, first explode the list column with explode so that each element gets its own row, then use groupby to group the rows, and finally aggregate each group with set, list, or a combination of both.
Input
  category      users
0        A     [1, 2]
1        B     [3, 4]
2        C  [5, 6, 7]
3        A  [1, 8, 1]
4        B  [1, 6, 9]
Output
  category        users_set       users_list
0        A        {8, 1, 2}        [8, 1, 2]
1        B  {1, 3, 4, 6, 9}  [1, 3, 4, 6, 9]
2        C        {5, 6, 7}        [5, 6, 7]
Code snippet
import pandas as pd
data = {
    "category": ['A', 'B', 'C', 'A', 'B'],
    "users": [
        [1, 2],
        [3, 4],
        [5, 6, 7],
        [1, 8, 1],
        [1, 6, 9]
    ]
}
df = pd.DataFrame(data)
print(df)
# Explode the array/list column so that each element gets its own row
df = df.explode('users')
print(df)
# Aggregate the users of each category into a set and a de-duplicated list
# (the element order of both results is not guaranteed)
df = df.groupby('category')['users'].agg(
    users_set=lambda x: set(x),
    users_list=lambda x: list(set(x))
).reset_index()
print(df)
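Because a set has no defined order, the element order in a list built from a set can vary. If a stable order matters, one option is to sort the unique values, or to keep them in first-seen order with dict.fromkeys. The following is a minimal sketch of that variation; the column names users_sorted and users_ordered are illustrative and not part of the original snippet.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "B", "C", "A", "B"],
    "users": [[1, 2], [3, 4], [5, 6, 7], [1, 8, 1], [1, 6, 9]],
})

# One row per list element; the category value is repeated for each element
exploded = df.explode("users")

result = exploded.groupby("category")["users"].agg(
    # Unique values in ascending order
    users_sorted=lambda x: sorted(set(x)),
    # Unique values in the order they first appear within the group
    users_ordered=lambda x: list(dict.fromkeys(x)),
).reset_index()
print(result)
```

For category B, the sorted variant gives [1, 3, 4, 6, 9], while the first-seen variant gives [3, 4, 1, 6, 9].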