
Check On Pandas Dataframe

I have a pandas DataFrame with 3 columns:

index  start    end  value
    0      0  37647      0
    1  37648  37846      1
    2  37847  42874      0
    3  42875  43049

Solution 1:

This could probably be cleaned up a bit, but should work.

Code:

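# FIRST CHECK: if the next row has start == end, extend this row's end to cover it
next_is_point = df['end'].shift(-1) == df['start'].shift(-1)
df.loc[next_is_point, 'end'] = df['end'].shift(-1)
df.drop_duplicates('end', inplace=True)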

# SECOND CHECK: if the next row has the same value, take over its end and add its value
same_value = df['value'].shift(-1) == df['value']
df.loc[same_value, 'end'] = df['end'].shift(-1)
df.loc[same_value, 'value'] = (df['value'] + df['value'].shift(-1)).fillna(0).astype(int)
df.drop_duplicates('end', inplace=True)

Output:

   start    end  value
0      0  37647      0
1  37648  37846      1
2  37847  42874      0
3  42875  43049      1
4  43050  51352      0
5  51353  51665     -1
6  51666  55259      0
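To see what the shift(-1) comparison in the first check selects, here is a minimal sketch on a made-up frame. The name `toy` and its values are hypothetical and are not the asker's data.

```python
import pandas as pd

# Hypothetical toy frame; row 1 is a single-point interval (start == end).
toy = pd.DataFrame({"start": [0, 5, 6], "end": [4, 5, 9], "value": [0, 1, 0]})

# shift(-1) moves a column up by one row, so row i is compared using row i+1's values.
# The last row has no successor, so its shifted values are NaN and compare as False.
next_is_point = toy["end"].shift(-1) == toy["start"].shift(-1)

print(next_is_point.tolist())  # [True, False, False]
```

Only row 0 is flagged, because the row after it is the single-point row; that is the row whose end the first check overwrites.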

Solution 2:

Using numpy.where, you can do it like this:

import numpy as np

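# Positions of single-point rows (start == end).
# np.where returns positions while df.loc uses labels, so this assumes a default 0..n-1 index.
inp = np.where(df.start == df.end)[0]
droplist = []
save = 0
j = 0
for i in range(len(inp)):
    if inp[i] > 0:
        if inp[i]-inp[i-1] == 1:
            # Directly follows another single-point row: extend the same earlier row again
            j += 1
            save += 1
            df.loc[inp[i]-1-j,"end"] += save
        else:
            # First single-point row of a run: extend the previous row's end by one
            j = 0
            save = 0
            df.loc[inp[i]-1,"end"] += 1
        droplist.append(inp[i])
# Drop the single-point rows that were absorbed
df = df.drop(droplist).reset_index(drop=True)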

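droplist = []
# Positions of rows whose value equals the next row's value
jnp = np.where(df.value == df.value.shift(-1))[0]
for jj in jnp:
    # Take over the next row's end, then mark the next row for removal
    df.loc[jj,"end"] = df.loc[jj+1,"end"]
    droplist.append(jj+1)
df = df.drop(droplist).reset_index(drop=True)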

There might be a more pythonic way without for-loops using numpy though.
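One possible loop-free sketch for the second step (merging consecutive rows that share a value) labels each run of equal values with a cumulative sum and aggregates per run. This is a different technique from the code above, not the answerer's method. The function name `merge_equal_runs` and the sample data are hypothetical, and the sketch assumes the rows are already sorted by start. It keeps the shared value for each run rather than summing values as Solution 1 does.

```python
import pandas as pd

def merge_equal_runs(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse each run of consecutive rows with the same value into one row."""
    # A new run starts wherever value differs from the previous row;
    # the cumulative sum turns those start flags into a run label.
    run_id = (df["value"] != df["value"].shift()).cumsum().rename("run")
    return (
        df.groupby(run_id)
          .agg(start=("start", "first"), end=("end", "last"), value=("value", "first"))
          .reset_index(drop=True)
    )

# Hypothetical sample: rows 0 and 1 share value 0 and should be merged.
sample = pd.DataFrame({"start": [0, 10, 20, 30],
                       "end": [9, 19, 29, 39],
                       "value": [0, 0, 1, 0]})
merged = merge_equal_runs(sample)
print(merged)
```

Because the run label changes only when the value changes, runs of any length collapse in a single pass. The single-point rows from the first step would still need to be handled separately.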

EDIT: Fixed for consecutive rows.
