Pandas DataFrame groupBy and then COUNT DISTINCT
Code description
This code snippet shows you how to group a pandas DataFrame via certain column and then do a distinct count of unique values in another column.
It is similar as COUNT DISTINCT aggregation functions in SQL. It also sort the values by the group by column.
Example output:
category value user group-count
0 A 0 5 7
80 A 80 4 7
70 A 70 7 7
60 A 60 10 7
50 A 50 9 7
.. ... ... ... ...
29 J 29 9 7
19 J 19 9 7
9 J 9 9 7
89 J 89 8 7
99 J 99 7 7
[100 rows x 4 columns]
Code snippet
import pandas as pd import random categories = [] users = [] values = [] for i in range(0, 100): categories.append(chr(i % 10+65)) values.append(i) users.append(random.randint(1, 10)) df = pd.DataFrame({'category': categories, 'value': values, 'user': users}) print(df) df['group-count'] = df.groupby(by='category')['user'].transform('nunique') print(df.sort_values(by=['category']))
copyright
This page is subject to Site terms.
comment Comments
No comments yet.