
Pandas: How To Use Slicing For Mixed-type Multi-indices In Python3?

As I noted in this partially related question, it is not possible to sort mixed-type sequences anymore:

# Python3.6
sorted(['foo', 'bar', 10, 200, 3])
# => TypeError: '<' not

Solution 1:

This is the best I was able to come up with. A solution in three steps:

  • Stringify the multi-index so that lexicographic sorting reproduces the old mixed-type ordering from Python 2. For example, ints can be zero-padded to a fixed width.
  • Sort the table.
  • Use the same stringification when accessing the table with slices.

In code this reads as follows (complete example):

import numpy as np
import pandas as pd 

# Stringify whatever needs to be converted.
# In this example: only ints are stringified.
def toString(x):
    if isinstance(x, int):
        x = '%03d' % x
    return x

# Stringify an index tuple.
def idxToString(idx):
    if isinstance(idx, tuple):
        idx = list(idx)
        for i, x in enumerate(idx):
            idx[i] = toString(x)
        return tuple(idx)
    else:
        return toString(idx)

# Replacement for pd.IndexSlice
class IndexSlice(object):
    @staticmethod
    def _toString(arg):
        if isinstance(arg, slice):
            # Only start and stop are index labels; the step is a
            # count and must not be stringified.
            arg = slice(toString(arg.start),
                        toString(arg.stop),
                        arg.step)
        else:
            arg = toString(arg)
        return arg

    def __getitem__(self, arg):
        if isinstance(arg, tuple):
            return tuple(map(self._toString, arg))
        else:
            return self._toString(arg)

# Build the table.
index = [(10,3),(10,1),(2,2),('foo',4),('bar',5)]
index = pd.MultiIndex.from_tuples(index)
data = np.random.randn(len(index),2)
table = pd.DataFrame(data=data, index=index)
# 1) Stringify the index.
table.index = table.index.map(idxToString)
# 2) Sort the index.
table = table.sort_index()
# 3) Create an IndexSlice that applies the same
#    stringification rules. (Replaces pd.IndexSlice)
idx = IndexSlice()
# Now, the table rows can be accessed as usual.
table.loc[idx[10],:]
table.loc[idx[:10],:]
table.loc[idx[:'bar',:],:]
table.loc[idx[:,:2],:]
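
To make the third step more concrete, here is a minimal sketch of what a stringifying slice helper hands over to table.loc. It runs without pandas. The names to_string and StringifyingSlice are illustrative and not part of the answer, and the sketch assumes that a slice step should be passed through unchanged.

```python
def to_string(x):
    # Zero-pad ints; bool is excluded because it is a subclass of int.
    if isinstance(x, int) and not isinstance(x, bool):
        return '%03d' % x
    return x


class StringifyingSlice:
    """Hypothetical stand-in for the IndexSlice replacement."""

    @staticmethod
    def _convert(arg):
        if isinstance(arg, slice):
            # Only start and stop are labels; the step stays as-is.
            return slice(to_string(arg.start), to_string(arg.stop), arg.step)
        return to_string(arg)

    def __getitem__(self, arg):
        if isinstance(arg, tuple):
            return tuple(self._convert(a) for a in arg)
        return self._convert(arg)


idx = StringifyingSlice()
single = idx[10]           # a single label
upto = idx[:10]            # a slice on the first level
second_level = idx[:, :2]  # a slice on the second level
print(single, upto, second_level)
```

Every int label in the key is replaced by its zero-padded string, so the key matches the stringified index of the table.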

This is not very pretty, but it restores the slice-based access to the table data that broke after upgrading to Python 3. I'd be glad to read better suggestions if you have any.
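
One caveat about the zero-padding: a fixed width such as '%03d' only preserves numeric order for non-negative ints that fit into that width. The sketch below illustrates this in plain Python. Deriving the width from the largest value is a possible workaround of my own, not something the code above does, and negative ints would still need separate handling.

```python
values = [3, 200, 10, 1000]

# Fixed width of 3, as in the '%03d' format used above:
# 1000 does not fit and ends up before 200.
narrow = sorted('%03d' % v for v in values)

# Hypothetical workaround: derive the width from the largest value.
width = len(str(max(values)))
wide = sorted('%0*d' % (width, v) for v in values)

print(narrow)
print(wide)
```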

Solution 2:

This is a second solution I came up with. It is nicer than my previous suggestion insofar as it does not alter the index values of the lex-sorted table. Here, I temporarily stringify the non-string indices before sorting the table, and de-stringify them again after sorting.

The solution works because pandas can naturally deal with mixed-type indices. It appears that only the string-based subset of indices needs to be lex-sorted. (Pandas internally uses a so-called Categorical object that appears to distinguish between strings and other types on its own.)
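
The stringify, sort, de-stringify round trip can be illustrated without pandas. This is only a sketch of the idea on a plain list of tuples. The helper name stringify and the restore dictionary are illustrative, and the fixed width of 3 is the same assumption as in the first solution.

```python
def stringify(x):
    """Return (sortable key, function that restores the original value)."""
    if isinstance(x, int):
        return '%03d' % x, int
    return x, (lambda s: s)


index = [(10, 3), (10, 1), (2, 2), ('foo', 4), ('bar', 5)]

restore = {}  # stringified tuple -> tuple of restoring functions
keys = []
for tup in index:
    pairs = [stringify(x) for x in tup]
    key = tuple(k for k, _ in pairs)
    restore[key] = tuple(f for _, f in pairs)
    keys.append(key)

# Sort the all-string keys, then map each one back to its original types.
keys.sort()
sorted_index = [tuple(f(k) for f, k in zip(restore[key], key)) for key in keys]
print(sorted_index)
```

The ints come first in numeric order, followed by the strings in alphabetical order, and every entry has its original type again.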

import numpy as np
import pandas as pd

def stringifiedSortIndex(table):
    # 1) Stringify the index.
    _stringifyIdx = _StringifyIdx()
    table.index = table.index.map(_stringifyIdx)
    # 2) Sort the index.
    table = table.sort_index()
    # 3) Destringify the sorted table.
    _stringifyIdx.revert = True
    table.index = table.index.map(_stringifyIdx)
    # Return the sorted table.
    return table

class _StringifyIdx(object):
    def __init__(self):
        self._destringifyMap = dict()
        self.revert = False

    def __call__(self, idx):
        if not self.revert:
            return self._stringifyIdx(idx)
        else:
            return self._destringifyIdx(idx)

    # Stringify whatever needs to be converted.
    # In this example: only ints are stringified.
    @staticmethod
    def _stringify(x):
        if isinstance(x, int):
            x = '%03d' % x
            destringify = int
        else:
            destringify = lambda x: x
        return x, destringify

    def _stringifyIdx(self, idx):
        if isinstance(idx, tuple):
            idx = list(idx)
            destr = [None]*len(idx)
            for i, x in enumerate(idx):
                idx[i], destr[i] = self._stringify(x)
            idx = tuple(idx)
            destr = tuple(destr)
        else:
            idx, destr = self._stringify(idx)
        # Remember how to revert this stringified index later.
        if self._destringifyMap is not None:
            self._destringifyMap[idx] = destr
        return idx

    def _destringifyIdx(self, idx):
        if idx not in self._destringifyMap:
            raise ValueError(("Index to destringify has not been stringified by "
                              "this class instance. Index must not change "
                              "between stringification and destringification."))
        destr = self._destringifyMap[idx]
        if isinstance(idx, tuple):
            assert(len(destr)==len(idx))
            idx = tuple(d(i) for d, i in zip(destr, idx))
        else:
            idx = destr(idx)
        return idx


# Build the table.
index = [(10,3),(10,1),(2,2),('foo',4),('bar',5)]
index = pd.MultiIndex.from_tuples(index)
data = np.random.randn(len(index),2)
table = pd.DataFrame(data=data, index=index)
idx = pd.IndexSlice

table = stringifiedSortIndex(table)
print(table)

# Now, the table rows can be accessed as usual.
table.loc[idx[10],:]
table.loc[idx[:10],:]
table.loc[idx[:'bar',:],:]
table.loc[idx[:,:2],:]

# This also works for a table with a simple (single-level) index.
table = pd.DataFrame(data=data, index=[4,1,'foo',3,'bar'])
table = stringifiedSortIndex(table)
table[:'bar']
