Pandas DataFrame groupBy and then COUNT DISTINCT

Kontext Kontext event 2023-08-09 visibility 438
more_vert

Code description

This code snippet shows you how to group a pandas DataFrame  via certain column and then do a distinct count of unique values in another column.

It is similar as COUNT DISTINCT aggregation functions in SQL. It also sort the values by the group by column.


Example output:

   category  value  user  group-count
0         A      0     5            7
80        A     80     4            7
70        A     70     7            7
60        A     60    10            7
50        A     50     9            7
..      ...    ...   ...          ...
29        J     29     9            7
19        J     19     9            7
9         J      9     9            7
89        J     89     8            7
99        J     99     7            7

[100 rows x 4 columns]

Code snippet

import pandas as pd
import random

categories = []
users = []
values = []
for i in range(0, 100):
    categories.append(chr(i % 10+65))
    values.append(i)
    users.append(random.randint(1, 10))

df = pd.DataFrame({'category': categories, 'value': values, 'user': users})
print(df)

df['group-count'] = df.groupby(by='category')['user'].transform('nunique')

print(df.sort_values(by=['category']))
More from Kontext
comment Comments
No comments yet.

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts