
Manipulating Pandas Columns

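I have some data (the columns up to Event) and the expected output (the columns Key and Time), as follows:

+----------+------------+-------+-----+------+
| Location | Date       | Event | Key | Time |
+----------+------------+-------+-----+------+
| i2       | 2019-03-02 | 1     | a   |      |
| i2       | 2019-03-02 | 1     | a   |      |
| i2       | 2019-03-02 | 1     | a   |      |
| i2       | 2019-03-04 | 1     | a   | 2    |
| i2       | 2019-03-15 | 2     | b   | 0    |
| i9       | 2019-02-22 | 2     | c   | 0    |
| i9       | 2019-03-10 | 3     | d   |      |
| i9       | 2019-03-10 | 3     | d   | 0    |
| s8       | 2019-04-22 | 1     | e   |      |
| s8       | 2019-04-25 | 1     | e   |      |
| s8       | 2019-04-28 | 1     | e   | 6    |
| t14      | 2019-05-13 | 3     | f   |      |
+----------+------------+-------+-----+------+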
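From the sample, Key appears to label each consecutive run of rows that share the same Location and Event (a, b, c, ...). If that reading is right, a minimal pandas sketch could derive it by flagging rows where either column changes and taking a cumulative sum. The letter mapping through `string.ascii_lowercase` is a hypothetical choice for illustration and only covers up to 26 runs.

```python
import string

import pandas as pd

# Sample data from this question (Location, Date, Event only).
df = pd.DataFrame({
    "Location": ["i2", "i2", "i2", "i2", "i2", "i9",
                 "i9", "i9", "s8", "s8", "s8", "t14"],
    "Date": ["2019-03-02", "2019-03-02", "2019-03-02", "2019-03-04",
             "2019-03-15", "2019-02-22", "2019-03-10", "2019-03-10",
             "2019-04-22", "2019-04-25", "2019-04-28", "2019-05-13"],
    "Event": [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
})
df["Date"] = pd.to_datetime(df["Date"])

cols = ["Location", "Event"]
# True where Location or Event differs from the previous row, i.e. a new run starts.
new_run = df[cols].ne(df[cols].shift()).any(axis=1)
# Running count of run starts, shifted so the first run is 0.
run_id = new_run.cumsum() - 1

# Hypothetical labelling: 0 -> "a", 1 -> "b", ... (only valid for up to 26 runs).
df["Key"] = run_id.map(lambda i: string.ascii_lowercase[i])
print(df)
```

On this sample the derived Key column matches the expected one. If a (Location, Event) pair could reappear later in the frame and should reuse its letter, grouping on the pair (for example with `groupby(...).ngroup()`) would be the alternative.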

Solution 1:

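I do not think you need to create the Key here. Group on Location and Event, take the difference between each group's last and first Date, and keep it only on the group's last row: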

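df['Date'] = pd.to_datetime(df['Date'])
# last date minus first date of each (Location, Event) group, kept only on the
# group's last row; assigning the filtered Series leaves NaT everywhere else
df['Time'] = (df.groupby(['Location', 'Event'])['Date']
                .transform(lambda x: x.iloc[-1] - x.iloc[0])
                [~df.duplicated(['Location', 'Event'], keep='last')])
df

   Location       Date  Event Key   Time
0        i2 2019-03-02      1   a    NaT
1        i2 2019-03-02      1   a    NaT
2        i2 2019-03-02      1   a    NaT
3        i2 2019-03-04      1   a 2 days
4        i2 2019-03-15      2   b 0 days
5        i9 2019-02-22      2   c 0 days
6        i9 2019-03-10      3   d    NaT
7        i9 2019-03-10      3   d 0 days
8        s8 2019-04-22      1   e    NaT
9        s8 2019-04-25      1   e    NaT
10       s8 2019-04-28      1   e 6 days
11      t14 2019-05-13      3   f 0 days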
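A self-contained sketch of Solution 1, with the sample data rebuilt by hand so it can be run end to end. It assumes rows are already sorted by Date within each (Location, Event) group, because `x.iloc[-1] - x.iloc[0]` takes the last and first rows rather than the latest and earliest dates.

```python
import pandas as pd

# Sample data from this question, including the expected Key column.
df = pd.DataFrame({
    "Location": ["i2", "i2", "i2", "i2", "i2", "i9",
                 "i9", "i9", "s8", "s8", "s8", "t14"],
    "Date": ["2019-03-02", "2019-03-02", "2019-03-02", "2019-03-04",
             "2019-03-15", "2019-02-22", "2019-03-10", "2019-03-10",
             "2019-04-22", "2019-04-25", "2019-04-28", "2019-05-13"],
    "Event": [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
    "Key": list("aaaabcddeeef"),
})
df["Date"] = pd.to_datetime(df["Date"])

keys = ["Location", "Event"]
# Span of each group: last date minus first date (assumes date-sorted rows).
span = df.groupby(keys)["Date"].transform(lambda x: x.iloc[-1] - x.iloc[0])
# True only on the last occurrence of each (Location, Event) pair.
last_row = ~df.duplicated(keys, keep="last")
# Assigning the filtered Series aligns on the index; other rows get NaT.
df["Time"] = span[last_row]
print(df)
```

If the dates within a group might be unsorted, replacing the lambda with `x.max() - x.min()` gives the span regardless of row order.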

Solution 2:

A vectorized approach that uses the existing Key column to flag the last row of each run:

import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
# 1 on the last row of each Key run; the final row compares against NaN,
# so it is flagged as well
df['diff'] = df['Key'].ne(df['Key'].shift(-1)).astype(int)
# peak-to-peak (max - min) of Date within each group, the quantity np.ptp computes
g = df.groupby(['Location', 'Event'])['Date']
x = g.transform('max') - g.transform('min')
df.loc[df['diff'] == 1, 'date_diff'] = x
df

    Location  Date        Event  Key  Time  diff  date_diff
1   i2        2019-03-02  1      a          0     NaT
2   i2        2019-03-02  1      a          0     NaT
3   i2        2019-03-02  1      a          0     NaT
4   i2        2019-03-04  1      a    2     1     2 days
5   i2        2019-03-15  2      b    0     1     0 days
6   i9        2019-02-22  2      c    0     1     0 days
7   i9        2019-03-10  3      d          0     NaT
8   i9        2019-03-10  3      d    0     1     0 days
9   s8        2019-04-22  1      e          0     NaT
10  s8        2019-04-25  1      e          0     NaT
11  s8        2019-04-28  1      e    6     1     6 days
12  t14       2019-05-13  3      f          1     0 days
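Both solutions compute the span per (Location, Event) pair. If the intent is one span per consecutive run, and a pair can reappear later in the frame, one option is to group on a run id instead. The sketch below is an assumption about intent, not part of either answer; it also uses `transform('max') - transform('min')` for the peak-to-peak span and does not need the Key column. On this sample every pair occurs in a single run, so the result matches Solution 1.

```python
import pandas as pd

# Sample data from this question; the Key column is not needed here.
df = pd.DataFrame({
    "Location": ["i2", "i2", "i2", "i2", "i2", "i9",
                 "i9", "i9", "s8", "s8", "s8", "t14"],
    "Date": ["2019-03-02", "2019-03-02", "2019-03-02", "2019-03-04",
             "2019-03-15", "2019-02-22", "2019-03-10", "2019-03-10",
             "2019-04-22", "2019-04-25", "2019-04-28", "2019-05-13"],
    "Event": [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
})
df["Date"] = pd.to_datetime(df["Date"])

cols = ["Location", "Event"]
# Run id: increments whenever Location or Event changes from the previous row.
run = df[cols].ne(df[cols].shift()).any(axis=1).cumsum()

# Peak-to-peak span of Date within each run (max minus min).
dates = df.groupby(run)["Date"]
span = dates.transform("max") - dates.transform("min")

# Last row of a run: the next row has a different run id. On the final row,
# shift(-1) gives NaN, the comparison is True, and that row is kept.
is_last = run.ne(run.shift(-1))
df["Time"] = span.where(is_last)
print(df)
```

Using max minus min instead of last minus first also makes the span independent of row order inside a run.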
