Python Pandas From Itemset To Dataframe
What is the more scalable way to go from an itemset list: itemset = [['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'], ['d'], ['a', 'b
Solution 1:
You can use get_dummies:
print (pd.DataFrame(itemset))
   0     1     2     3
0  a     b  None  None
1  b     c     d  None
2  a     c     d     e
3  d  None  None  None
4  a     b     c  None
5  a     b     c     d
df1 = (pd.get_dummies(pd.DataFrame(itemset), prefix='', prefix_sep='' ))
print (df1)
     a    b    d    b    c    c    d    d    e
0  1.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
1  0.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0
2  1.0  0.0  0.0  0.0  1.0  0.0  1.0  0.0  1.0
3  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
4  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0
5  1.0  0.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
print (df1.groupby(df1.columns, axis=1).sum().astype(int))
   a  b  c  d  e
0  1  1  0  0  0
1  0  1  1  1  0
2  1  0  1  1  1
3  0  0  0  1  0
4  1  1  1  0  0
5  1  1  1  1  0
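Recent pandas releases deprecate groupby(..., axis=1), and get_dummies now returns booleans rather than floats, so the snippet above may warn or print differently on a current install. A minimal sketch of the same idea that avoids axis=1 by grouping the transposed frame follows. The itemset literal is reconstructed from the six rows of the printed frame above, and the name result is introduced here for illustration.

```python
import pandas as pd

# Reconstructed from the six rows shown in the printed DataFrame above
itemset = [['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'],
           ['d'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]

# One indicator column per (position, item) pair; column names repeat
dummies = pd.get_dummies(pd.DataFrame(itemset), prefix='', prefix_sep='')

# Collapse the duplicate column names by grouping the transposed frame
# on its index, then transpose back and cast to plain integers
result = dummies.T.groupby(level=0).sum().T.astype(int)
print(result)
```

Because each item appears at most once per row, the summed values stay 0 or 1.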
Solution 2:
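Here's an almost vectorized approach:
import numpy as np
import pandas as pd

# Flatten all items into one 1-D array of single-character strings
items = np.concatenate(itemset)
# Map 'a' -> 0, 'b' -> 1, ... via ASCII codes (assumes single lowercase letters).
# np.fromstring's binary mode is deprecated, so view the bytes directly.
col_idx = items.astype('S1').view(np.uint8) - 97
lens = np.array([len(item) for item in itemset])
# Row number of each flattened item
row_idx = np.repeat(np.arange(lens.size), lens)
# One column per letter up to the largest one seen (not per longest row)
out = np.zeros((lens.size, int(col_idx.max()) + 1), dtype=int)
out[row_idx, col_idx] = 1
df = pd.DataFrame(out, columns=np.unique(items))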
The last line could be replaced with the following, which may be faster because np.unique then sorts the integer column indices rather than the strings:
df = pd.DataFrame(out,columns=items[np.unique(col_idx,return_index=True)[1]])
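The ASCII offset used above only works when every item is a single lowercase letter and no letter is skipped. As a hedged alternative, np.unique with return_inverse=True can supply both the column labels and the column indices for arbitrary item names. This is a sketch of a swapped-in technique, not the answer's original method. The itemset literal is reconstructed from the printed frame in Solution 1, and the name labels is introduced here.

```python
import numpy as np
import pandas as pd

# Reconstructed from the six rows shown in the printed DataFrame in Solution 1
itemset = [['a', 'b'], ['b', 'c', 'd'], ['a', 'c', 'd', 'e'],
           ['d'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]

# Flatten every transaction into one 1-D array of items
items = np.concatenate(itemset)

# labels: sorted unique items; col_idx: position of each item within labels
labels, col_idx = np.unique(items, return_inverse=True)

# Row number of each flattened item
lens = np.array([len(item) for item in itemset])
row_idx = np.repeat(np.arange(lens.size), lens)

# Fill the indicator matrix in a single fancy-indexing assignment
out = np.zeros((lens.size, labels.size), dtype=int)
out[row_idx, col_idx] = 1

df = pd.DataFrame(out, columns=labels)
print(df)
```

The only remaining Python-level loop is the list comprehension that measures each row's length.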