
Get Lists In List With Names Of Duplicate Columns By Values

I have a data frame:

import pandas as pd
data = [[101, 1, 2, 10, 3, 2, 3, 1], [5, 5, 5, 5, 5, 5, 5, 5], [30, 3, 7, 14, 10, 7, 10, 2], [11, 2, 6, 15, 20, 6, 20, 11]]
df = pd.DataFra

Solution 1:

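If I understand correctly (IIUC), transpose the frame, group the column names by their values, and keep every group that has at least two names: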

l = df.T.reset_index().groupby(df.index.tolist())['index'].agg(list).loc[lambda x: x.str.len() >= 2].values.tolist()
[['tab', 'box'], ['simm', 'simm']]
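For trying the groupby approach on its own, here is a self-contained sketch. The column names `c0` to `c7` are hypothetical placeholders, because the question's `DataFrame` call is cut off before the real names appear. With this data, columns 2 and 5 both hold `[2, 5, 7, 6]` and columns 4 and 6 both hold `[3, 5, 10, 20]`.

```python
import pandas as pd

# Data from the question; the column names are hypothetical placeholders
data = [[101, 1, 2, 10, 3, 2, 3, 1],
        [5, 5, 5, 5, 5, 5, 5, 5],
        [30, 3, 7, 14, 10, 7, 10, 2],
        [11, 2, 6, 15, 20, 6, 20, 11]]
df = pd.DataFrame(data, columns=[f"c{i}" for i in range(8)])

# Transpose so each original column becomes a row; reset_index moves the
# original column names into a regular column called 'index'.
t = df.T.reset_index()

# Group rows by all of their values (t's columns 0..3), collect the names
# in each group, and keep only the groups with two or more members.
groups = t.groupby(df.index.tolist())["index"].agg(list)
l = groups.loc[lambda x: x.str.len() >= 2].values.tolist()
print(l)  # [['c2', 'c5'], ['c4', 'c6']]
```

Because `groupby` sorts by the group keys by default, the pairs come out ordered by their values, not by column position.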

Solution 2:

It looks like you need to compare every pair of columns, so broadcasting is one option:

import numpy as np

# extract the NumPy array
values = df.to_numpy()

# compare every column with every other column
rows, cols = np.where(np.triu((values[:, :, None] == values[:, None, :]).all(0), 1))

# output:
[df.columns[[r, c]].values for r, c in zip(rows, cols)]

Output:

[array(['tab', 'box'], dtype=object), array(['simm', 'simm'], dtype=object)]
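To see what the broadcast comparison does step by step, here is a NumPy-only sketch on the question's values (column names left out). `values[:, :, None]` has shape `(rows, cols, 1)` and `values[:, None, :]` has shape `(rows, 1, cols)`, so `==` yields a `(rows, cols, cols)` array. `.all(0)` collapses the row axis into a `(cols, cols)` matrix whose entry `[i, j]` is `True` when columns `i` and `j` match in every row, and `np.triu(..., 1)` keeps only entries above the diagonal, so each pair is reported once and no column is paired with itself.

```python
import numpy as np

values = np.array([[101, 1, 2, 10, 3, 2, 3, 1],
                   [5, 5, 5, 5, 5, 5, 5, 5],
                   [30, 3, 7, 14, 10, 7, 10, 2],
                   [11, 2, 6, 15, 20, 6, 20, 11]])

# Element-wise comparison of every column against every other column
cube = values[:, :, None] == values[:, None, :]
print(cube.shape)  # (4, 8, 8)

# same[i, j] is True when columns i and j are equal in every row
same = cube.all(axis=0)
print(same.shape)  # (8, 8)

# k=1 drops the diagonal (a column always equals itself) and the lower
# triangle, so each matching pair appears exactly once with i < j
i, j = np.where(np.triu(same, 1))
pairs = list(zip(i.tolist(), j.tolist()))
print(pairs)  # [(2, 5), (4, 6)]
```

The intermediate array has `rows * cols * cols` elements, so with many columns this approach can use a lot of memory.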

Solution 3:

import numpy as np

res = df.T.loc[df.T.duplicated(keep=False)]
pairs = res.sort_values(res.columns.tolist()).index
[ent.tolist() for ent in np.split(pairs, 2)]
[['tab', 'box'], ['simm', 'simm']]
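`np.split(pairs, 2)` cuts the sorted names into two equal halves, which gives the right answer here only because the data happens to contain exactly two duplicate pairs. A sketch that does not depend on how many groups there are or how large they are keeps the `duplicated(keep=False)` filter but swaps the split for a `groupby` over all value columns. The column names `c0` to `c7` are again hypothetical placeholders.

```python
import pandas as pd

data = [[101, 1, 2, 10, 3, 2, 3, 1],
        [5, 5, 5, 5, 5, 5, 5, 5],
        [30, 3, 7, 14, 10, 7, 10, 2],
        [11, 2, 6, 15, 20, 6, 20, 11]]
# Hypothetical column names; the question's own names are not shown
df = pd.DataFrame(data, columns=[f"c{i}" for i in range(8)])

t = df.T
# keep=False marks every member of a duplicate group, not just the repeats
res = t.loc[t.duplicated(keep=False)]

# Group the duplicated rows by their full set of values and collect the
# column names in each group, whatever the number or size of the groups.
result = [list(g.index) for _, g in res.groupby(list(res.columns))]
print(result)  # [['c2', 'c5'], ['c4', 'c6']]
```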
