Skip to content Skip to sidebar Skip to footer

Reshape A Pandas Dataframe

suppose a dataframe like this one: df = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]], columns = ['A', 'B', 'A1', 'B1']) I would like to have a dataframe which looks like: wha

Solution 1:

You can use lreshape, for column idnumpy.repeat:

a = [col for col in df.columns if'A'in col]
b = [col for col in df.columns if'B'in col]
df1 = pd.lreshape(df, {'A' : a, 'B' : b})

df1['id'] = np.repeat(np.arange(len(df.columns) // 2), len (df.index))  + 1print (df1)
    A   B  id
012115612910133424782511122

EDIT:

lreshape is currently undocumented, but it is possible it might be removed(with pd.wide_to_long too).

Possible solution is merging all 3 functions to one - maybe melt, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.

Solution 2:

The pd.wide_to_long function is built almost exactly for this situation, where you have many of the same variable prefixes that end in a different digit suffix. The only difference here is that your first set of variables don't have a suffix, so you will need to rename your columns first.

The only issue with pd.wide_to_long is that it must have an identification variable, i, unlike melt. reset_index is used to create a this uniquely identifying column, which is dropped later. I think this might get corrected in the future.

df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
  .reset_index()[['A', 'B', 'id']]

    A   B id
012115612910133424782511122

Solution 3:

I solved this in 3 steps:

  1. Make a new dataframe df2 holding only the data you want to be added to the initial dataframe df.
  2. Delete the data from df that will be added below (and that was used to make df2.
  3. Append df2 to df.

Like so:

# step 1: create new dataframedf2 = df[['A1', 'B1']]
df2.columns = ['A', 'B']

# step 2: delete that data from originaldf = df.drop(["A1", "B1"], 1)

# step 3: appenddf = df.append(df2, ignore_index=True)

Note how when you do df.append() you need to specify ignore_index=True so the new columns get appended to the index rather than keep their old index.

Your end result should be your original dataframe with the data rearranged like you wanted:

In[16]: dfOut[16]:
    AB012156291033447851112

Solution 4:

Use pd.concat() like so:

#Split into separate tables
df_1 = df[['A', 'B']]
df_2 = df[['A1', 'B1']]
df_2.columns = ['A', 'B'] # Make column names line up# Add the ID column
df_1 = df_1.assign(id=1)
df_2 = df_2.assign(id=2)

# Concatenate
pd.concat([df_1, df_2])

Post a Comment for "Reshape A Pandas Dataframe"