Pandas: How To Use Slicing For Mixed-type Multi-indices In Python3?
Solution 1:
This is the best I was able to come up with. A solution in three steps:
- Stringify the multi-index in a way that the lex-sorting preserves the old mixed-type sorting from Python 2. For example, ints can be prepended with enough 0s.
- Sort the table.
- Use the same stringification when accessing the table with slices.
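To see why the zero-padding matters, here is a minimal pandas-free sketch. The names `to_sort_key` and `width` are made up for this illustration and are not part of the solution itself; it assumes non-negative ints that fit into the chosen width.

```python
def to_sort_key(x, width=3):
    """Zero-pad ints so that string order matches numeric order.

    Assumption: ints are non-negative and have at most `width` digits.
    """
    if isinstance(x, int):
        return '%0*d' % (width, x)
    return x

labels = [10, 2, 'foo', 'bar']

# Python 3 refuses to order ints against strings.
try:
    sorted(labels)
    mixed_sort_ok = True
except TypeError:
    mixed_sort_ok = False

# Plain str() is not enough: '10' sorts before '2'.
naive = sorted(str(x) for x in [10, 2])

# Zero-padded keys sort numerically, and digits sort before letters.
keys = sorted(to_sort_key(x) for x in labels)
print(mixed_sort_ok, naive, keys)
```

With padding, the ints come out in numeric order ahead of the strings, which is the ordering the slices rely on.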
In code this reads as follows (complete example):
import numpy as np
import pandas as pd
# Stringify whatever needs to be converted.
# In this example: only ints are stringified.
def toString(x):
    if isinstance(x, int):
        x = '%03d' % x
    return x

# Stringify an index tuple.
def idxToString(idx):
    if isinstance(idx, tuple):
        idx = list(idx)
        for i, x in enumerate(idx):
            idx[i] = toString(x)
        return tuple(idx)
    else:
        return toString(idx)

# Replacement for pd.IndexSlice
class IndexSlice(object):
    @staticmethod
    def _toString(arg):
        if isinstance(arg, slice):
            arg = slice(toString(arg.start),
                        toString(arg.stop),
                        toString(arg.step))
        else:
            arg = toString(arg)
        return arg

    def __getitem__(self, arg):
        if isinstance(arg, tuple):
            return tuple(map(self._toString, arg))
        else:
            return self._toString(arg)
# Build the table.
index = [(10,3),(10,1),(2,2),('foo',4),('bar',5)]
index = pd.MultiIndex.from_tuples(index)
data = np.random.randn(len(index),2)
table = pd.DataFrame(data=data, index=index)
# 1) Stringify the index.
table.index = table.index.map(idxToString)
# 2) Sort the index.
table = table.sort_index()
# 3) Create an IndexSlice that applies the same
# stringification rules. (Replaces pd.IndexSlice)
idx = IndexSlice()
# Now, the table rows can be accessed as usual.
table.loc[idx[10],:]
table.loc[idx[:10],:]
table.loc[idx[:'bar',:],:]
table.loc[idx[:,:2],:]
This is not very pretty, but it fixes the slice-based access to the table data, which broke after upgrading to Python 3. I'd be glad to read better suggestions if you folks have any.
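One caveat with the fixed '%03d' format: it only keeps the order for non-negative ints below 1000. The sketch below shows the failure and one possible way to pick the width from the data. `pad` and `required_width` are hypothetical helpers for illustration, not pandas functions, and negative ints are simply rejected rather than handled.

```python
def pad(x, width):
    """Zero-pad a non-negative int to `width` characters."""
    return '%0*d' % (width, x)

def required_width(values):
    """Smallest padding width that covers every int in `values`."""
    ints = [v for v in values if isinstance(v, int)]
    if any(v < 0 for v in ints):
        # '-05' sorts before '-10' as a string, so padding alone fails here.
        raise ValueError("negative ints are not handled by plain zero-padding")
    return max((len(str(v)) for v in ints), default=1)

values = [200, 1000, 'foo']
ints = [v for v in values if isinstance(v, int)]

# Width 3 is too narrow: '1000' sorts before '200'.
too_narrow = sorted(pad(v, 3) for v in ints)

# A width derived from the data restores the numeric order.
width = required_width(values)
wide_enough = sorted(pad(v, width) for v in ints)
print(too_narrow, wide_enough)
```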
Solution 2:
This is a second solution I came up with. It is nicer than my previous suggestion in that it does not alter the index values of the lex-sorted table. Here, I temporarily stringify the non-string indices before sorting the table, and de-stringify them again after sorting.
The solution works because pandas can naturally deal with mixed-type indices. It appears that only the string-based subset of indices needs to be lex-sorted. (Pandas internally uses a so-called Categorical object that appears to distinguish between strings and other types on its own.)
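The core idea, ordering by the stringified labels while leaving the labels themselves untouched, can be sketched in plain Python without pandas. `stringify` mirrors the int-padding rule from the first solution; `sort_key` is a hypothetical helper used only in this sketch.

```python
def stringify(x):
    # Same rule as before: only ints are zero-padded.
    return '%03d' % x if isinstance(x, int) else x

def sort_key(label):
    # Stringified copy of a label, used for ordering only.
    if isinstance(label, tuple):
        return tuple(stringify(x) for x in label)
    return stringify(label)

index = [(10, 3), (10, 1), (2, 2), ('foo', 4), ('bar', 5)]

# The labels keep their original types; only the order changes.
ordered = sorted(index, key=sort_key)

# The same ordering expressed as row positions.
order = sorted(range(len(index)), key=lambda i: sort_key(index[i]))
print(ordered, order)
```

Positions like `order` could in principle be passed to `DataFrame.iloc` to reorder the rows, though whether pandas then treats the resulting index as lex-sorted is not verified here.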
import numpy as np
import pandas as pd
def stringifiedSortIndex(table):
    # 1) Stringify the index.
    _stringifyIdx = _StringifyIdx()
    table.index = table.index.map(_stringifyIdx)
    # 2) Sort the index.
    table = table.sort_index()
    # 3) Destringify the sorted table.
    _stringifyIdx.revert = True
    table.index = table.index.map(_stringifyIdx)
    # Return the sorted table.
    return table

class _StringifyIdx(object):
    def __init__(self):
        self._destringifyMap = dict()
        self.revert = False

    def __call__(self, idx):
        if not self.revert:
            return self._stringifyIdx(idx)
        else:
            return self._destringifyIdx(idx)

    # Stringify whatever needs to be converted.
    # In this example: only ints are stringified.
    @staticmethod
    def _stringify(x):
        if isinstance(x, int):
            x = '%03d' % x
            destringify = int
        else:
            destringify = lambda x: x
        return x, destringify

    def _stringifyIdx(self, idx):
        if isinstance(idx, tuple):
            idx = list(idx)
            destr = [None]*len(idx)
            for i, x in enumerate(idx):
                idx[i], destr[i] = self._stringify(x)
            idx = tuple(idx)
            destr = tuple(destr)
        else:
            idx, destr = self._stringify(idx)
        if self._destringifyMap is not None:
            # Remember how to revert this stringified index later.
            self._destringifyMap[idx] = destr
        return idx

    def _destringifyIdx(self, idx):
        if idx not in self._destringifyMap:
            raise ValueError(("Index to destringify has not been stringified "
                              "by this class instance. Index must not change "
                              "between stringification and destringification."))
        destr = self._destringifyMap[idx]
        if isinstance(idx, tuple):
            assert(len(destr) == len(idx))
            idx = tuple(d(i) for d, i in zip(destr, idx))
        else:
            idx = destr(idx)
        return idx

# Build the table.
index = [(10,3),(10,1),(2,2),('foo',4),('bar',5)]
index = pd.MultiIndex.from_tuples(index)
data = np.random.randn(len(index),2)
table = pd.DataFrame(data=data, index=index)
idx = pd.IndexSlice
table = stringifiedSortIndex(table)
print(table)
# Now, the table rows can be accessed as usual.
table.loc[idx[10],:]
table.loc[idx[:10],:]
table.loc[idx[:'bar',:],:]
table.loc[idx[:,:2],:]
# This also works for a table with a simple (single-level) index.
table = pd.DataFrame(data=data, index=[4,1,'foo',3,'bar'])
table = stringifiedSortIndex(table)
table[:'bar']
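One thing to watch with the revert step: if two different labels stringify to the same text (for example the int 10 and the string '010'), the later entry overwrites the earlier one in the destringify map, and the original types can no longer be told apart. A small check along these lines might help before sorting. `find_collisions` is a hypothetical helper written for this sketch, not part of pandas or of the code above.

```python
from collections import defaultdict

def stringify(x):
    # Same rule as in the solutions: only ints are zero-padded.
    return '%03d' % x if isinstance(x, int) else x

def find_collisions(labels):
    """Map each stringified label to the distinct originals that produce it.

    Only stringified labels with more than one distinct original are returned.
    """
    seen = defaultdict(set)
    for label in labels:
        seen[stringify(label)].add(repr(label))
    return {key: sorted(origs) for key, origs in seen.items() if len(origs) > 1}

clashing = find_collisions([10, '010', 2, 'foo'])
clean = find_collisions([10, 2, 'foo'])
print(clashing, clean)
```

An empty result means the stringification is reversible for the given labels.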