Decode Characters Pandas
Below is a sample of my DF:
ROLE NAME
GESELLSCHAFTER DUPONT DUPONT
GESCHäFTSFüHRER DUPONT DUPONT
KOMPLEMENTäR DU
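Solution 1:
You can pass regex=True to replace, so that the dictionary keys are matched as substrings inside each value instead of only as whole cell values: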
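import pandas as pd

# the dic included in the question seems to have `A` instead of 'Ã'
dic = {'ü': 'U', 'ä': 'A'}  # map each umlaut to its plain uppercase letter
# regex=True makes replace() substitute substrings, not just whole values
df['ROLE'] = df['ROLE'].replace(dic, regex=True)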
Output:
ROLE NAME
0 GESELLSCHAFTER DUPONT DUPONT
1 GESCHAFTSFUHRER DUPONT DUPONT
2 KOMPLEMENTAR DUPONT DUPONT
3 GESELLSCHAFTER DUPONT DUPONT
4 KOMPLEMENTAR DUPONT DUPONT
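To see why regex=True matters, here is a small self-contained sketch built on a few ROLE values modelled on the question (the DataFrame construction is an assumption, since the question's sample is truncated). Without regex=True, Series.replace only replaces cells whose entire value equals a dictionary key; with it, each key is treated as a pattern and replaced wherever it occurs in the string.

```python
import pandas as pd

# Hypothetical sample modelled on the question: lowercase umlauts inside uppercase words.
df = pd.DataFrame({
    'ROLE': ['GESELLSCHAFTER', 'GESCHäFTSFüHRER', 'KOMPLEMENTäR'],
    'NAME': ['DUPONT DUPONT'] * 3,
})
dic = {'ü': 'U', 'ä': 'A'}

# Default (regex=False): only cells whose whole value equals a key are replaced,
# so none of these values change.
whole_value = df['ROLE'].replace(dic)

# regex=True: each key is used as a pattern and replaced wherever it occurs.
substring = df['ROLE'].replace(dic, regex=True)

print(whole_value.tolist())  # ['GESELLSCHAFTER', 'GESCHäFTSFüHRER', 'KOMPLEMENTäR']
print(substring.tolist())    # ['GESELLSCHAFTER', 'GESCHAFTSFUHRER', 'KOMPLEMENTAR']
```

The keys here are plain characters, so no regex escaping is needed; keys containing characters such as '.' or '(' would have to be passed through re.escape first.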
Solution 2:
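This solution is more roundabout and may be slow on a large dataset, since it calls a Python function on every row. First decompose each string with unicodedata.normalize('NFD', ...), which splits an accented letter such as 'ä' into a base letter plus a combining mark; then encode to ASCII with errors ignored, which drops the combining marks; finally decode the bytes back to a str: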
from unicodedata import normalize

# NFD turns 'ä' into 'a' + a combining diaeresis; encoding to ASCII with 'ignore' drops the mark
df.ROLE.apply(lambda x: normalize('NFD', x).encode(
    'ascii', 'ignore').decode('utf-8'))
# 0    AKTIONAR
# 1    AKTIONAR
# 2    AUFSICHTSRAT
# 3    AUSABENDE PERSON
# 4    AUSUBENDE PERSON
# 5    DEFAULT KEY
# 6    GESCHAFTSFAHRENDER DIREKTOR
# 7    GESCHAFTSFAHRER
# Name: ROLE, dtype: object
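The output above contains 'GESCHAFTSFAHRER' where 'GESCHAFTSFUHRER' would be expected. One plausible cause, which would also fit the comment in Solution 1 about 'Ã', is mojibake: UTF-8 text mis-decoded as Latin-1, so 'ü' is stored as 'Ã¼', and NFD then reduces 'Ã' to 'A' while '¼' is dropped by the ASCII encode. The sketch below assumes that cause. It first tries to reverse the mis-decoding, then strips accents with the same NFD approach, and upper-cases at the end because NFD turns the lowercase 'ä' seen in the sample into a lowercase 'a'. The helper names repair_mojibake and strip_accents are hypothetical, and the sample values are made up for illustration.

```python
import unicodedata

import pandas as pd


def repair_mojibake(text: str) -> str:
    """Assumption: the text is UTF-8 that was wrongly decoded as Latin-1.
    Re-encode as Latin-1 and decode as UTF-8; if that fails, the string
    was probably not mojibake, so return it unchanged."""
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def strip_accents(text: str) -> str:
    """NFD-decompose, then drop every non-ASCII code point (combining marks included)."""
    decomposed = unicodedata.normalize('NFD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# Hypothetical mix of mojibake, lowercase umlauts and a proper uppercase umlaut.
roles = pd.Series(['GESCHÃ¤FTSFÃ¼HRER', 'KOMPLEMENTäR', 'AKTIONÄR'], name='ROLE')
cleaned = roles.map(repair_mojibake).map(strip_accents).str.upper()
print(cleaned.tolist())  # ['GESCHAFTSFUHRER', 'KOMPLEMENTAR', 'AKTIONAR']
```

The Latin-1 round trip is only a heuristic: any string that happens to form valid UTF-8 after re-encoding would be changed as well, so it is worth checking a few rows before applying it to a whole column.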