
Grouping By Multiple Columns To Find Duplicate Rows Pandas

I have a df

   id  val1  val2
    1   1.1   2.2
    1   1.1   2.2
    2   2.1   5.5
    3   8.8   6.2
    4   1.1   2.2
    5   8.8   6.2

I want to group by val1 and

Solution 1:

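You need duplicated with the subset parameter to specify which columns to check, and keep=False so that every row in a duplicated group is flagged (not only the repeats after the first). The result is a boolean mask, which you then use to filter the rows by boolean indexing: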

# keep the original df intact so the mask can still be inspected afterwards
dups = df[df.duplicated(subset=['val1','val2'], keep=False)]
print(dups)
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
3   3   8.8   6.2
4   4   1.1   2.2
5   5   8.8   6.2
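
To see why keep=False matters here, the following is a minimal sketch that rebuilds the sample frame from the question and compares the three values keep accepts. The variable names (cols, first, last, all_dups, comparison) are only illustrative and not from the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the data in the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

cols = ["val1", "val2"]  # columns that define a "duplicate"

# keep="first" (the default): the first row of each group is NOT flagged.
first = df.duplicated(subset=cols, keep="first")
# keep="last": the last row of each group is NOT flagged.
last = df.duplicated(subset=cols, keep="last")
# keep=False: every row of a group with more than one member is flagged.
all_dups = df.duplicated(subset=cols, keep=False)

# Put the three masks side by side for comparison.
comparison = pd.DataFrame({"first": first, "last": last, "all": all_dups})
print(comparison)
```

Only the keep=False mask selects all five rows shown above; the other two each leave out one row per duplicated group.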

Detail:

print(df.duplicated(subset=['val1','val2'], keep=False))
0     True
1     True
2    False
3     True
4     True
5     True
dtype: bool
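
Since the question is phrased in terms of grouping, here is a possible alternative sketch using groupby, not the method from the answer above. It assumes the same sample frame as the question; group_size, dups and counts are illustrative names.

```python
import pandas as pd

# Sample frame rebuilt from the data in the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

cols = ["val1", "val2"]

# Size of the (val1, val2) group each row belongs to, aligned to df's index.
group_size = df.groupby(cols)["id"].transform("size")

# Rows whose group has more than one member are the duplicates.
dups = df[group_size > 1]
print(dups)

# Optional: one row per (val1, val2) combination with its row count.
counts = df.groupby(cols).size().reset_index(name="n")
print(counts[counts["n"] > 1])
```

On this data it should select the same rows as the duplicated mask. One difference to keep in mind: groupby drops rows with NaN in the key columns by default, so the two approaches can disagree when val1 or val2 has missing values.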
