Skip to content Skip to sidebar Skip to footer

Pandas Groupby Count Non-null Values As Percentage

Given this dataset, I would like to count missing, NaN, values: df = pd.DataFrame({'A' : [1, np.nan, 2 , 55, 6, np.nan, -17, np.nan], 'Team' : ['one', 'one', 'tw

Solution 1:

You can take the mean of the notnull Boolean DataFrame:

In [11]: df.notnull()
Out[11]:
       A      C      D  Team
0TrueTrueFalseTrue1FalseTrueFalseTrue2TrueTrueTrueTrue3TrueTrueTrueTrue4TrueTrueTrueTrue5FalseTrueTrueTrue6TrueFalseFalseTrue7FalseTrueFalseTrueIn [12]: df.notnull().mean()
Out[12]:
A       0.625
C       0.875
D       0.500
Team    1.000
dtype: float64

and with the groupby:

In [13]: df.groupby("Team").apply(lambda x: x.notnull().mean())
Out[13]:
              A         C    D  Team
Team
one    0.6666670.6666670.01.0
three  0.5000001.0000000.51.0
two    0.6666671.0000001.01.0

It may be faster to do this without an apply using set_index first:

In [14]: df.set_index("Team").notnull().groupby(level=0).mean()
Out[14]:
              A         C    D
Team
one    0.6666670.6666670.0
three  0.5000001.0000000.5
two    0.6666671.0000001.0

Solution 2:

Base on your own code add div(df.groupby('Team').size(),0)

df.groupby('Team').count().div(df.groupby('Team').size(),0)
Out[190]: 
              A         C    D
Team                          
one    0.6666670.6666670.0
three  0.5000001.0000000.5
two    0.6666671.0000001.0

Post a Comment for "Pandas Groupby Count Non-null Values As Percentage"