Sort Dataframe Index That Has A String And Number
My df DataFrame index looks like this: Com_Lag_01 Com_Lag_02 Com_Lag_03 Com_Lag_04 Com_Lag_05 Com_Lag_06 Com_Lag_07 Com_Lag_08 Com_Lag_09 Com_Lag_10 Com_Lag_101 Com_Lag_102 Com_Lag
Solution 1:
One could try something like this, by performing a sort on a numbered version of the index
import pandas as pd
# Create a DataFrame example
df = pd.DataFrame(\
{'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
'Age': [27 ,25 ,22 ,34 ,31],\
},\
index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_5'])
# Add of a column containing a numbered version of the index
df['indexNumber'] = [int(i.split('_')[-1]) for i in df.index]
# Perform sort of the rows
df.sort(['indexNumber'], ascending = [True], inplace = True)
# Deletion of the added column
df.drop('indexNumber', 1, inplace = True)
Edit 2017 - V1:
To avoid SettingWithCopyWarning:
df = df.assign(indexNumber=[int(i.split('_')[-1]) for i in df.index])
Edit 2017 - V2 for Pandas Version 0.21.0
import pandas as pd
print(pd.__version__)
# Create a DataFrame example
df = pd.DataFrame(\
{'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
'Age': [27 ,25 ,22 ,34 ,31],\
},\
index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_5'])
df.reindex(index=df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values().index)
Solution 2:
Solution without new column with DataFrame.reindex
by index
of sorted Series
:
a = df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values()
print (a)
Com_Lag_1 1
Com_Lag_3 3
Com_Lag_5 5
Com_Lag_12 12
Com_Lag_24 24
dtype: int32
df = df.reindex(index=a.index)
print (df)
Age Year
Com_Lag_1 271991
Com_Lag_3 222001
Com_Lag_5 311997
Com_Lag_12 252004
Com_Lag_24 342009
But if duplicated values is necessary add new column:
df = pd.DataFrame(\
{'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
'Age': [27 ,25 ,22 ,34 ,31],\
},\
index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_12'])
print (df)
Age Year
Com_Lag_1 27 1991
Com_Lag_12 25 2004
Com_Lag_3 22 2001
Com_Lag_24 34 2009
Com_Lag_12 31 1997
df['indexNumber'] = df.index.str.rsplit('_').str[-1].astype(int)
df = df.sort_values(['indexNumber']).drop('indexNumber', axis=1)
print (df)
Age Year
Com_Lag_1 27 1991
Com_Lag_3 22 2001
Com_Lag_12 25 2004
Com_Lag_12 31 1997
Com_Lag_24 34 2009
Solution 3:
Another solution is
df.sort_index(key=lambda x: (x.to_series().str[8:].astype(int)), inplace=True)
The 8 comes from the position where the numeric values start
Post a Comment for "Sort Dataframe Index That Has A String And Number"