Reshape A Pandas Dataframe
Solution 1:
You can use lreshape
, for column id
numpy.repeat
:
a = [col for col in df.columns if'A'in col]
b = [col for col in df.columns if'B'in col]
df1 = pd.lreshape(df, {'A' : a, 'B' : b})
df1['id'] = np.repeat(np.arange(len(df.columns) // 2), len (df.index)) + 1print (df1)
A B id
012115612910133424782511122
EDIT:
lreshape
is currently undocumented, but it is possible it might be removed(with pd.wide_to_long too).
Possible solution is merging all 3 functions to one - maybe melt
, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.
Solution 2:
The pd.wide_to_long
function is built almost exactly for this situation, where you have many of the same variable prefixes that end in a different digit suffix. The only difference here is that your first set of variables don't have a suffix, so you will need to rename your columns first.
The only issue with pd.wide_to_long
is that it must have an identification variable, i
, unlike melt
. reset_index
is used to create a this uniquely identifying column, which is dropped later. I think this might get corrected in the future.
df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
.reset_index()[['A', 'B', 'id']]
A B id
012115612910133424782511122
Solution 3:
I solved this in 3 steps:
- Make a new dataframe
df2
holding only the data you want to be added to the initial dataframedf
. - Delete the data from
df
that will be added below (and that was used to makedf2
. - Append
df2
todf
.
Like so:
# step 1: create new dataframedf2 = df[['A1', 'B1']]
df2.columns = ['A', 'B']
# step 2: delete that data from originaldf = df.drop(["A1", "B1"], 1)
# step 3: appenddf = df.append(df2, ignore_index=True)
Note how when you do df.append()
you need to specify ignore_index=True
so the new columns get appended to the index rather than keep their old index.
Your end result should be your original dataframe with the data rearranged like you wanted:
In[16]: dfOut[16]:
AB012156291033447851112
Solution 4:
Use pd.concat()
like so:
#Split into separate tables
df_1 = df[['A', 'B']]
df_2 = df[['A1', 'B1']]
df_2.columns = ['A', 'B'] # Make column names line up# Add the ID column
df_1 = df_1.assign(id=1)
df_2 = df_2.assign(id=2)
# Concatenate
pd.concat([df_1, df_2])
Post a Comment for "Reshape A Pandas Dataframe"