Skip to content Skip to sidebar Skip to footer

How To Compare Two Dataframes On A Column And Replace With Other Column Value

I am having two data frames that are df1 and df2 id first last size A 1978-01-01 1979-01-01 2 B 2000-01-01 2000-01-01 1 C 1998-01-01 2000-01-01 3 D

Solution 1:

Using Merge and combine_first:

df = df1.merge(df2,how='outer')
df['id'] = df['token'].combine_first(df['id'] )
df.drop('token',inplace=True,axis=1)

Another way is to use replace with dictionary of df2.values, here the df1 dataframe changes.:

df1.id.replace(dict(df2.values),inplace=True)

idfirstlastsize0ZA.001978-01-01  1979-01-01  21As.112000-01-01  2000-01-01  12SD.341998-01-01  2000-01-01  33D1998-01-01  1998-01-01  14E1999-01-01  2000-01-01  2

Solution 2:

Use map and fillna:

df1['id'] = df1['id'].map(df2.set_index('id')['token']).fillna(df1['id'])
df1

Output:

idfirstlastsize0ZA.001978-01-01  1979-01-01     21As.112000-01-01  2000-01-01     12SD.341998-01-01  2000-01-01     33D1998-01-01  1998-01-01     14E1999-01-01  2000-01-01     2

You can use map with a series as an argument.

Solution 3:

If you do not wish to merge your DataFrame, you could use apply function to solve this. Change your small DataFrame to dictionary and map it to the other DataFrame.

from io import StringIO #used to get string to dfimport pandas as pd

id_ =list('ABC')
token = 'ZA.00 As.11 SD.34'.split()
dt = pd.DataFrame(list(zip(id_,token)),columns=['id','token'])

a ='''
id first last size
A 1978-01-01 1979-01-01 2
B 2000-01-01 2000-01-01 1
C 1998-01-01 2000-01-01 3
D 1998-01-01 1998-01-01 1
E 1999-01-01 2000-01-01 2
'''

df =pd.read_csv(StringIO(a), sep=' ')

# This last two lines are all you need
mp= {x:y for x,y inzip(dt.id.tolist(),dt.token.tolist())}

df.id.apply(lambda x: mp[x] if x in mp.keys() else x)

Post a Comment for "How To Compare Two Dataframes On A Column And Replace With Other Column Value"