Pandas DataFrame aggregate a list column to a set
Code description
This code snippet shows how to group a pandas DataFrame and aggregate a list- or array-typed column into a set (with duplicates removed) or a list. To implement it, first explode the list column with explode, then use groupby to create a grouped DataFrame, and finally aggregate using set, list, or a combination of both. In the snippet on this page the list is built from the set, so it is de-duplicated as well.
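To make the first step concrete, here is a minimal sketch of what explode does on its own. The two-row frame and the names demo and exploded are made up for illustration and are not part of the original snippet.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate explode
demo = pd.DataFrame({"category": ["A", "B"], "users": [[1, 2], [3]]})

# Each list element becomes its own row; the original index label is repeated
exploded = demo.explode("users")
print(exploded)
```

After exploding, the users column holds one scalar per row, so a regular groupby can collect the values of each category back into a set or a list.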
Input
  category      users
0        A     [1, 2]
1        B     [3, 4]
2        C  [5, 6, 7]
3        A  [1, 8, 1]
4        B  [1, 6, 9]
Output
  category        users_set       users_list
0        A        {8, 1, 2}        [8, 1, 2]
1        B  {1, 3, 4, 6, 9}  [1, 3, 4, 6, 9]
2        C        {5, 6, 7}        [5, 6, 7]
Code snippet
import pandas as pd

data = {
    "category": ['A', 'B', 'C', 'A', 'B'],
    "users": [
        [1, 2],
        [3, 4],
        [5, 6, 7],
        [1, 8, 1],
        [1, 6, 9]
    ]
}

df = pd.DataFrame(data)
print(df)

# Explode the array/list column so that each element gets its own row
df = df.explode('users')
print(df)

# Aggregate the users of each category into a set and a de-duplicated list
df = df.groupby('category')['users'].agg(
    users_set=lambda x: set(x),
    users_list=lambda x: list(set(x))
).reset_index()
print(df)
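Because a Python set has no guaranteed order, the element order of users_list produced by list(set(x)) can differ from what the Output section shows. If a stable order matters, one possible variant is to sort the set before turning it into a list. The sketch below assumes the same input data; the helper name aggregate_users is hypothetical and is not part of the original snippet.

```python
import pandas as pd


def aggregate_users(df: pd.DataFrame) -> pd.DataFrame:
    """Collect the unique users of each category as a set and a sorted list."""
    # One row per list element, so groupby sees scalar values
    exploded = df.explode("users")
    return (
        exploded.groupby("category")["users"]
        .agg(
            users_set=lambda x: set(x),
            # sorted() returns a list, so the order is reproducible
            users_list=lambda x: sorted(set(x)),
        )
        .reset_index()
    )


data = {
    "category": ["A", "B", "C", "A", "B"],
    "users": [[1, 2], [3, 4], [5, 6, 7], [1, 8, 1], [1, 6, 9]],
}
result = aggregate_users(pd.DataFrame(data))
print(result)
```

With this variant, category A yields the list [1, 2, 8] rather than an order that depends on how the set happens to iterate.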
Last modified by Kontext 2 years ago