Merging Dataframes
I have been struggling with this problem all day. I have two dataframes: Dataframe 1 (Billboards) and Dataframe 2. I would like to merge Dataframe 2 with Dataframe 1 based
Solution 1:
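I think you need to compute a similarity measure between the song titles in df1 and df2. I gave it a try by computing the cosine similarity between TF-IDF vectors of the songs in df1 and df2, on a randomly generated song list. For each song in the first list, the loop keeps the highest-scoring song from the second list.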
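from sklearn.feature_extraction.text import TfidfVectorizer

vect = TfidfVectorizer(min_df=1)
Song1 = ["macarena bayside boys mix", "cant you hear my heart beat", "crying in the chapell", "you were on my mind"]
Song2 = ["cause im a man", "macarena", "beat from my heart"]
dist_dict = {}   # best similarity score found so far for each song in Song1
match_dict = {}  # best-matching song from Song2 for each song in Song1
for i in Song1:
    for j in Song2:
        # TF-IDF rows are L2-normalized, so the dot product of the two rows
        # is their cosine similarity (higher means more alike).
        tfidf = vect.fit_transform([i, j])
        similarity = (tfidf * tfidf.T).toarray()[0, 1]
        # Record the first candidate, then replace it whenever a better one appears.
        if i not in dist_dict or dist_dict[i] < similarity:
            dist_dict[i] = similarity
            match_dict[i] = j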
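The loop above refits the vectorizer for every pair of titles. A possible variant, sketched here under assumptions, fits TF-IDF once on all titles and computes the whole similarity matrix in one call to `cosine_similarity`. The names `songs_a`, `songs_b`, `best_match` and `best_score` are made up for illustration, and the scores will differ from the pairwise version because the IDF weights now come from the full list.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative titles only; in practice these would come from df1 and df2.
songs_a = ["macarena bayside boys mix", "cant you hear my heart beat", "you were on my mind"]
songs_b = ["cause im a man", "macarena", "beat from my heart"]

# Fit one vocabulary over both lists so the vectors are comparable.
vect = TfidfVectorizer().fit(songs_a + songs_b)

# sim[r, c] is the cosine similarity between songs_a[r] and songs_b[c].
sim = cosine_similarity(vect.transform(songs_a), vect.transform(songs_b))

# For each title in songs_a, pick the column with the highest similarity.
best_idx = sim.argmax(axis=1)
best_match = {a: songs_b[k] for a, k in zip(songs_a, best_idx)}
best_score = {a: float(sim[r, k]) for r, (a, k) in enumerate(zip(songs_a, best_idx))}
```

Note that `argmax` always returns some match, even a weak one: "you were on my mind" shares only the word "my" with its best candidate. A minimum-score cutoff on `best_score` would probably be needed before trusting a match.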
Once you have the best match for each song, you can look up the song ID in df2.
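A minimal sketch of that lookup, assuming df1 and df2 both have a "Song" column and df2 carries the ID in a column called "SongID". The column names "Rank", "SongID" and "MatchedSong" and the sample rows are assumptions, since the real dataframes are not shown.

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes.
df1 = pd.DataFrame({
    "Song": ["macarena bayside boys mix", "cant you hear my heart beat"],
    "Rank": [1, 2],
})
df2 = pd.DataFrame({
    "Song": ["macarena", "beat from my heart"],
    "SongID": [101, 102],
})

# Best matches as produced by the similarity step (df1 title -> df2 title).
match_dict = {
    "macarena bayside boys mix": "macarena",
    "cant you hear my heart beat": "beat from my heart",
}

# Attach the matched df2 title to each df1 row, then merge on it.
df1["MatchedSong"] = df1["Song"].map(match_dict)
merged = df1.merge(
    df2.rename(columns={"Song": "MatchedSong"}),
    on="MatchedSong",
    how="left",  # keep every df1 row, even if no match was found
)
```

Each row of `merged` keeps the original df1 title and gains the ID of its matched df2 song.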
Solution 2:
The easiest way to do it:
1. Make "Song" the index column in both dataframes:
df1.set_index('Song', inplace=True)
df2.set_index('Song', inplace=True)
2. Use join:
joined = df1.join(df2, how='inner')
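To see what the inner join returns, here is a small self-contained sketch. The sample rows and the "Rank" and "SongID" columns are assumptions made for illustration, not the original data.

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes.
df1 = pd.DataFrame({"Song": ["macarena", "you were on my mind"], "Rank": [1, 2]})
df2 = pd.DataFrame({"Song": ["macarena", "cause im a man"], "SongID": [101, 102]})

# Index both frames by song title, then keep only titles present in both.
joined = df1.set_index("Song").join(df2.set_index("Song"), how="inner")
```

With `how='inner'`, only titles that are spelled identically in both dataframes survive, so here `joined` contains the single row for "macarena". Titles that differ slightly between the two sources would be dropped.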