Pandas - How To Pick Events By Date And Create A New Ordered Dataframe. - Surgery Patients
Solution 1:
Edit: I've changed the filter criterion below to at least two different OPs of interest.
Here is one way to do this. I've changed your data somewhat for testing purposes.
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 999, 3, 1, 999, 2],
'OP_code': ['V011', 'A082', 'V011', 'V011', 'A651', 'V014', 'A263'],
'OP_date': ['2014-12-12', '2014-06-23', '2014-08-07', '2014-09-12',
'2018-10-03', '2018-07-06', '2018-05-18']})
df.set_index('ID', inplace=True)
display(df)
OP_codeOP_dateID1V0112014-12-122A0822014-06-23999V0112014-08-073V0112014-09-121A6512018-10-03999V0142018-07-062A2632018-05-18
First we should transform the data so that there is exactly one row per patient, collecting the data from multiple OPs in lists:
df_patients = pd.pivot_table(df, index=df.index, aggfunc=list)
display(df_patients)
OP_code OP_date
ID
1[V011, A651][2014-12-12, 2018-10-03]2[A082, A263][2014-06-23, 2018-05-18]3[V011][2014-09-12]999[V011, V014][2014-08-07, 2018-07-06]
Now given a list of the OP codes that correspond to the implants you're interested in, we can loop through the rows of this DataFrame to create an index of only those patients that had at least two different OPs of interest. Then we can filter the data according to this new index.
implant_codes = {'V011', 'V014'}
implant_index = []
for i in df_patients.index:
"""EDIT: filter criterion tightened to at least two different
relevant OPs, i.e. the intersection of the implant_codes
list with the patient's OP list has at least two elements."""iflen(implant_codes.intersection(df_patients.OP_code[i])) >= 2:
implant_index.append(i)
df_implants = df_patients.filter(implant_index, axis=0)
display(df_implants)
OP_code OP_date
ID
999[V011, V014][2014-08-07, 2018-07-06]
You can access data elements here by a combination of the indexing syntax for DataFrames and lists, e.g. df_implants.loc[999, 'OP_date'][0]
yields the first OP date of patient 999: '2014-08-07'
I would not recommend creating a separate column for each OP. You could try something like this:
df_implants[['OP_date_1', 'OP_date_2']] = pd.DataFrame(df_implants.OP_date.values.tolist(),
index=df_implants.index)
display(df_implants)
OP_codeOP_dateOP_date_1OP_date_2ID999 [V011, V014] [2014-08-07, 2018-07-06] 2014-08-07 2018-07-06
However, this approach will run into trouble in practice, due to the fact that the number of OPs varies across patients. That's why I think the list representation given above is more natural and easier to handle.
Post a Comment for "Pandas - How To Pick Events By Date And Create A New Ordered Dataframe. - Surgery Patients"