Manipulating Pandas Columns
I have some data (up to Event) and expected output (Key, Time) as follows:
+----------+------------+-------+-----+------+
| Location | Date       | Event | Key | Time |
+----------
Solution 1:
I do not think you need to create the Key column here; grouping by Location and Event is enough:
df['Time'] = df.groupby(['Location', 'Event']).Date.\
    transform(lambda x: (x.iloc[-1] - x.iloc[0]))[~df.duplicated(['Location', 'Event'], keep='last')]
df
Out[107]:
   Location       Date  Event Key   Time
0        i2 2019-03-02      1   a    NaT
1        i2 2019-03-02      1   a    NaT
2        i2 2019-03-02      1   a    NaT
3        i2 2019-03-04      1   a 2 days
4        i2 2019-03-15      2   b 0 days
5        i9 2019-02-22      2   c 0 days
6        i9 2019-03-10      3   d    NaT
7        i9 2019-03-10      3   d 0 days
8        s8 2019-04-22      1   e    NaT
9        s8 2019-04-25      1   e    NaT
10       s8 2019-04-28      1   e 6 days
11      t14 2019-05-13      3   f 0 days
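For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The sample frame is rebuilt from the output shown above, and the names `span` and `is_last` are mine, not from the answer. It uses `Series.where` instead of boolean indexing on the right-hand side, which should give the same result because the assignment aligns on the index either way. It also assumes the rows are already sorted by Date within each (Location, Event) group, since it subtracts the first date from the last.

```python
import pandas as pd

# Sample data reconstructed from the output shown in the answer (assumption).
df = pd.DataFrame({
    "Location": ["i2"] * 5 + ["i9"] * 3 + ["s8"] * 3 + ["t14"],
    "Date": pd.to_datetime([
        "2019-03-02", "2019-03-02", "2019-03-02", "2019-03-04", "2019-03-15",
        "2019-02-22", "2019-03-10", "2019-03-10",
        "2019-04-22", "2019-04-25", "2019-04-28", "2019-05-13",
    ]),
    "Event": [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
})

# Last date minus first date per (Location, Event), broadcast to every row.
span = df.groupby(["Location", "Event"])["Date"].transform(
    lambda x: x.iloc[-1] - x.iloc[0]
)

# True only on the last row of each (Location, Event) group.
is_last = ~df.duplicated(["Location", "Event"], keep="last")

# Keep the span on the last row of each group; every other row becomes NaT.
df["Time"] = span.where(is_last)
print(df)
```

Rows 3, 10 and 11 should carry the spans for the i2/1, s8/1 and t14/3 groups, with NaT on the earlier rows of each group.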
Solution 2:
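A vectorized approach:
import numpy as np
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
# Flag the last row of each run of equal Key values (1 = last row).
# Because of ffill(), the final row compares equal to itself and is not flagged.
df['diff'] = df['Key'].ne(df['Key'].shift(-1).ffill()).astype(int)
# Date range (max - min) per (Location, Event), broadcast to every row.
x = df.groupby(['Location','Event'])['Date'].transform(np.ptp)
# Keep the range only on flagged rows; all other rows stay NaT.
df.loc[df['diff'] == 1, 'date_diff'] = x
df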
   Location       Date  Event Key Time  diff date_diff
1        i2 2019-03-02      1   a          0       NaT
2        i2 2019-03-02      1   a          0       NaT
3        i2 2019-03-02      1   a          0       NaT
4        i2 2019-03-04      1   a    2     1    2 days
5        i2 2019-03-15      2   b    0     1    0 days
6        i9 2019-02-22      2   c    0     1    0 days
7        i9 2019-03-10      3   d          0       NaT
8        i9 2019-03-10      3   d    0     1    0 days
9        s8 2019-04-22      1   e          0       NaT
10       s8 2019-04-25      1   e          0       NaT
11       s8 2019-04-28      1   e    6     1    6 days
12      t14 2019-05-13      3   f          0       NaT
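A possible variation on the same idea, offered as a sketch rather than as the answer's own code: it swaps `np.ptp` for the built-in `"max"` and `"min"` group transforms and drops the `.ffill()` from the flag. Without the fill, the final row is compared against NaN and is flagged too, so t14 gets 0 days as in Solution 1 instead of NaT. The sample frame is reconstructed from the output above, and `date_range` and `is_run_end` are illustrative names.

```python
import pandas as pd

# Sample data reconstructed from the output shown above (assumption).
df = pd.DataFrame({
    "Location": ["i2"] * 5 + ["i9"] * 3 + ["s8"] * 3 + ["t14"],
    "Date": pd.to_datetime([
        "2019-03-02", "2019-03-02", "2019-03-02", "2019-03-04", "2019-03-15",
        "2019-02-22", "2019-03-10", "2019-03-10",
        "2019-04-22", "2019-04-25", "2019-04-28", "2019-05-13",
    ]),
    "Event": [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
    "Key": list("aaaabcddeeef"),
})

# Per-group date range computed as max - min, broadcast to every row.
g = df.groupby(["Location", "Event"])["Date"]
date_range = g.transform("max") - g.transform("min")

# True where the next row has a different Key. The final row is compared
# against NaN, which counts as "different", so it is flagged as well.
is_run_end = df["Key"].ne(df["Key"].shift(-1))

# Keep the range on the last row of each Key run; other rows become NaT.
df["date_diff"] = date_range.where(is_run_end)
print(df)
```

Unlike the first-minus-last approach, max minus min does not depend on the rows being sorted by Date within a group.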