
Python Pandas From Itemset To Dataframe

What is the more scalable way to go from an itemset list: itemset = [['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'], ['d'], ['a', 'b

Solution 1:

You can use get_dummies:

print (pd.DataFrame(itemset))
   0     1     2     3
0  a     b  None  None
1  b     c     d  None
2  a     c     d     e
3  d  None  None  None
4  a     b     c  None
5  a     b     c     d

df1 = (pd.get_dummies(pd.DataFrame(itemset), prefix='', prefix_sep='' ))
print (df1)
     a    b    d    b    c    c    d    d    e
0  1.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
1  0.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0
2  1.0  0.0  0.0  0.0  1.0  0.0  1.0  0.0  1.0
3  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
4  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0
5  1.0  0.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0

print (df1.groupby(df1.columns, axis=1).sum().astype(int))
   a  b  c  d  e
0  1  1  0  0  0
1  0  1  1  1  0
2  1  0  1  1  1
3  0  0  0  1  0
4  1  1  1  0  0
5  1  1  1  1  0
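
Recent pandas releases deprecate `groupby(..., axis=1)`, so the last step may emit a warning or fail depending on the installed version. A minimal sketch of the same idea that groups on the transposed frame instead is shown here. The `itemset` literal is an assumption reconstructed from the output printed in this answer, and `result` is just an illustrative name.

```python
import pandas as pd

# Assumed input, reconstructed from the printed frame in this answer
itemset = [['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'],
           ['d'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]

# One indicator column per (position, item) pair; the empty prefix leaves
# duplicate column labels such as 'b' and 'd'
df1 = pd.get_dummies(pd.DataFrame(itemset), prefix='', prefix_sep='', dtype=int)

# Transpose so the duplicate labels sit on the index, sum them per label,
# then transpose back to one row per itemset
result = df1.T.groupby(level=0).sum().T
print(result)
```

Passing `dtype=int` to `get_dummies` keeps the indicators as integers, so no final `astype(int)` is needed.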

Solution 2:

Here's an almost vectorized approach -

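import numpy as np
import pandas as pd

# Flatten all items; assumes every item is a single lowercase ASCII letter
items = np.concatenate(itemset)
# Map 'a' -> 0, 'b' -> 1, ... (np.fromstring assumed one byte per character,
# which does not hold for Python 3 str arrays)
col_idx = items.astype('S1').view(np.uint8) - 97

lens = np.array([len(item) for item in itemset])
row_idx = np.repeat(np.arange(lens.size), lens)
# One column per letter up to the largest one seen
out = np.zeros((lens.size, col_idx.max() + 1), dtype=int)
out[row_idx, col_idx] = 1

# Labels line up only if the letters form a contiguous run starting at 'a'
df = pd.DataFrame(out, columns=np.unique(items))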

The last line could be replaced by something like the following, which may be faster because it deduplicates the integer codes rather than the strings -

df = pd.DataFrame(out,columns=items[np.unique(col_idx,return_index=True)[1]])
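
The approach above depends on every item being a single lowercase letter. As a hedged sketch of how that restriction could be lifted, `np.unique(..., return_inverse=True)` can supply both the column labels and the column index of each item in one call. The `itemset` literal is an assumption reconstructed from the output shown in Solution 1, and `labels` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the output shown in Solution 1
itemset = [['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'],
           ['d'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]

items = np.concatenate(itemset)

# labels: sorted unique items; col_idx: position of each item within labels
labels, col_idx = np.unique(items, return_inverse=True)

# Row index of every flattened item
lens = np.array([len(item) for item in itemset])
row_idx = np.repeat(np.arange(lens.size), lens)

# Scatter ones into a (number of itemsets) x (number of unique items) matrix
out = np.zeros((lens.size, labels.size), dtype=int)
out[row_idx, col_idx] = 1

df = pd.DataFrame(out, columns=labels)
print(df)
```

Because the column index comes from `np.unique` rather than from character codes, this variant should also handle multi-character item names, at the cost of sorting the string array.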
