Pandas DataFrame aggregate a list column to a set

This code snippet shows how to group a pandas DataFrame and aggregate a column of list or array type into a set, or into a list with duplicates removed. To do this, first explode the list column so that each element gets its own row, then use `groupby` to group the rows, and finally aggregate with `set`, `list`, or a combination of both.

Kontext 0 517 0.75 index 8/10/2023

Code description

Input

      category      users
    0        A     [1, 2]
    1        B     [3, 4]
    2        C  [5, 6, 7]
    3        A  [1, 8, 1]
    4        B  [1, 6, 9]

Output

      category        users_set       users_list
    0        A        {8, 1, 2}        [8, 1, 2]
    1        B  {1, 3, 4, 6, 9}  [1, 3, 4, 6, 9]
    2        C        {5, 6, 7}        [5, 6, 7]

Code snippet

    import pandas as pd
    
    data = {
        "category": ['A', 'B', 'C', 'A', 'B'],
        "users": [
            [1,2],
            [3,4],
            [5,6,7],
            [1,8,1],
            [1,6,9]
        ]
    }
    
    df = pd.DataFrame(data)
    print(df)
    
    # Explode the array/list column so that each element gets its own row
    df = df.explode('users')
    print(df)
        
    # Aggregate the users of each category into a set and a de-duplicated list
    df = df.groupby('category')['users'].agg(
        users_set = lambda x: set(x),
        users_list = lambda x: list(set(x))
    ).reset_index()
    print(df)
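
Because a Python set has no guaranteed order, `list(set(x))` can return the users in an arbitrary order such as `[8, 1, 2]`. The variation below is a minimal sketch, not part of the original snippet, that applies the same explode-then-group approach to return a sorted list of unique users and a list that keeps duplicates. The column names `users_sorted` and `users_all` are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "B", "C", "A", "B"],
    "users": [[1, 2], [3, 4], [5, 6, 7], [1, 8, 1], [1, 6, 9]],
})

# One row per list element; the category value is repeated for each element
exploded = df.explode("users")

result = (
    exploded.groupby("category")["users"]
    .agg(
        # Unique users in ascending order (hypothetical column name)
        users_sorted=lambda x: sorted(set(x)),
        # Every user in original row order, duplicates kept (hypothetical column name)
        users_all=lambda x: list(x),
    )
    .reset_index()
)
print(result)
```

For category A this should give `[1, 2, 8]` in `users_sorted` and `[1, 2, 1, 8, 1]` in `users_all`. Sorting is useful when the result is compared in tests or written to a file, where a stable order matters.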
