Skip to content Skip to sidebar Skip to footer

Pandas - How To Pick Events By Date And Create A New Ordered Dataframe. - Surgery Patients

I am a surgeon looking at neurosurgery. I have a dataframe of 600,000 records, and 70 columns with about 7 date columns for various events that happened to patients in a hospital s

Solution 1:

Edit: I've changed the filter criterion below to at least two different OPs of interest.

Here is one way to do this. I've changed your data somewhat for testing purposes.

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 999, 3, 1, 999, 2],
                   'OP_code': ['V011', 'A082', 'V011', 'V011', 'A651', 'V014', 'A263'], 
                   'OP_date': ['2014-12-12', '2014-06-23', '2014-08-07', '2014-09-12', 
                               '2018-10-03', '2018-07-06', '2018-05-18']})
df.set_index('ID', inplace=True)
display(df)
OP_codeOP_dateID1V0112014-12-122A0822014-06-23999V0112014-08-073V0112014-09-121A6512018-10-03999V0142018-07-062A2632018-05-18

First we should transform the data so that there is exactly one row per patient, collecting the data from multiple OPs in lists:

df_patients = pd.pivot_table(df, index=df.index, aggfunc=list)
display(df_patients)
     OP_code        OP_date
ID      
1[V011, A651][2014-12-12, 2018-10-03]2[A082, A263][2014-06-23, 2018-05-18]3[V011][2014-09-12]999[V011, V014][2014-08-07, 2018-07-06]

Now given a list of the OP codes that correspond to the implants you're interested in, we can loop through the rows of this DataFrame to create an index of only those patients that had at least two different OPs of interest. Then we can filter the data according to this new index.

implant_codes = {'V011', 'V014'}

implant_index = []
for i in df_patients.index:
    """EDIT: filter criterion tightened to at least two different 
       relevant OPs, i.e. the intersection of the implant_codes 
       list with the patient's OP list has at least two elements."""iflen(implant_codes.intersection(df_patients.OP_code[i])) >= 2: 
        implant_index.append(i)

df_implants = df_patients.filter(implant_index, axis=0)
display(df_implants)
     OP_code       OP_date
ID      
999[V011, V014][2014-08-07, 2018-07-06]

You can access data elements here by a combination of the indexing syntax for DataFrames and lists, e.g. df_implants.loc[999, 'OP_date'][0] yields the first OP date of patient 999: '2014-08-07'

I would not recommend creating a separate column for each OP. You could try something like this:

df_implants[['OP_date_1', 'OP_date_2']] = pd.DataFrame(df_implants.OP_date.values.tolist(), 
                                                       index=df_implants.index)
display(df_implants)
OP_codeOP_dateOP_date_1OP_date_2ID999  [V011, V014]  [2014-08-07, 2018-07-06]  2014-08-07  2018-07-06

However, this approach will run into trouble in practice, due to the fact that the number of OPs varies across patients. That's why I think the list representation given above is more natural and easier to handle.

Post a Comment for "Pandas - How To Pick Events By Date And Create A New Ordered Dataframe. - Surgery Patients"