Pandas DataFrame aggregate a list column to a set

Kontext, 2023-08-10

Code description

This code snippet shows how to group a pandas DataFrame and aggregate a column of list or array type into a set (with duplicates removed) or a list. First explode the list column so that each element gets its own row, then group the result with groupby and aggregate each group using set, list, or a combination of both.


Input

  category      users
0        A     [1, 2]
1        B     [3, 4]
2        C  [5, 6, 7]
3        A  [1, 8, 1]
4        B  [1, 6, 9]

Output

  category        users_set       users_list
0        A        {8, 1, 2}        [8, 1, 2]
1        B  {1, 3, 4, 6, 9}  [1, 3, 4, 6, 9]
2        C        {5, 6, 7}        [5, 6, 7]

Code snippet

import pandas as pd

data = {
    "category": ['A', 'B', 'C', 'A', 'B'],
    "users": [
        [1,2],
        [3,4],
        [5,6,7],
        [1,8,1],
        [1,6,9]
    ]
}

df = pd.DataFrame(data)
print(df)

# Explode the array/list column so that each element gets its own row
df = df.explode('users')
print(df)
    
# Aggregate the users of each category into a set and a de-duplicated list
# (sets are unordered, so the element order in the output is not guaranteed)
df = df.groupby('category')['users'].agg(
    users_set = lambda x: set(x),
    users_list = lambda x: list(set(x))
).reset_index()
print(df)
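
Alternative sketch

The snippet above explodes the list column before grouping. As a hedged alternative, the lists in each group can be merged directly with itertools.chain, which avoids the intermediate exploded DataFrame. The helper name flatten_unique is hypothetical and not part of pandas, and sorting the result is an added assumption to make the order deterministic; the original snippet does not sort.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "category": ["A", "B", "C", "A", "B"],
    "users": [[1, 2], [3, 4], [5, 6, 7], [1, 8, 1], [1, 6, 9]],
})


def flatten_unique(lists):
    """Hypothetical helper: merge the lists of one group and
    return their unique values as a sorted list."""
    return sorted(set(chain.from_iterable(lists)))


# Group first, then flatten and de-duplicate inside each group
result = (
    df.groupby("category")["users"]
      .agg(flatten_unique)
      .reset_index(name="users_list")
)
print(result)
```

This keeps one row per category and yields a plain list per group. Use the explode approach instead when the flattened rows are also needed for other computations.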