
Decode An Encoded Unicode String In Python

I need to decode a 'UNICODE' encoded string:

>>> id = u'abcdß'
>>> encoded_id = id.encode('utf-8')
>>> encoded_id
'abcd\xc3\x9f'

The problem I have is:

Solution 1:

You have UTF-8 encoded data (there is no such thing as UNICODE encoded data).

Encode the unicode value to Latin-1 to recover the original bytes, then decode those bytes as UTF-8:

encoded_id.encode('latin1').decode('utf8')

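Latin-1 maps the first 256 Unicode code points (U+0000 through U+00FF) one-to-one to bytes, so encoding to Latin-1 turns each character back into the byte with the same value.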
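The one-to-one mapping can be checked directly. The following is a minimal sketch in Python 3 syntax (the rest of this answer uses Python 2); the variable names are illustrative only.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as Latin-1 never fails: byte N becomes code point N.
as_text = all_bytes.decode('latin-1')
code_points = [ord(ch) for ch in as_text]

# Encoding back to Latin-1 returns exactly the original bytes.
round_tripped = as_text.encode('latin-1')

# Code points above U+00FF have no Latin-1 byte, so encoding them fails.
try:
    chr(256).encode('latin-1')
    beyond_latin1_ok = True
except UnicodeEncodeError:
    beyond_latin1_ok = False
```

Because the round trip is lossless for these 256 values, Latin-1 works as a way to get the raw bytes back out of a string that was decoded with the wrong codec.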

Demo:

>>> encoded_id = u'abcd\xc3\x9f'
>>> encoded_id.encode('latin1').decode('utf8')
u'abcd\xdf'
>>> print encoded_id.encode('latin1').decode('utf8')
abcdß
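The demo above is Python 2. As a rough sketch of the same repair in Python 3, where every str is already Unicode, something like the following should work. The helper name fix_mojibake is hypothetical, not part of any library, and it assumes the input really is UTF-8 data that was mistakenly decoded as Latin-1.

```python
def fix_mojibake(text):
    """Undo a wrong Latin-1 decode of UTF-8 data (hypothetical helper).

    Assumes each character in `text` stands for one raw byte of a
    UTF-8 sequence, i.e. all code points are <= U+00FF.
    """
    raw_bytes = text.encode('latin-1')   # recover the original bytes
    return raw_bytes.decode('utf-8')     # interpret them as UTF-8


broken = 'abcd\xc3\x9f'        # the two UTF-8 bytes of 'ß' seen as two characters
fixed = fix_mojibake(broken)   # 'abcdß'

# Text that was never mis-decoded is not valid input: a lone 0xDF byte
# is an incomplete UTF-8 sequence, so the decode step raises.
try:
    fix_mojibake('abcd\xdf')
    already_correct_ok = True
except UnicodeDecodeError:
    already_correct_ok = False
```

The error case matters in practice: applying the repair to a string that is already correct raises an exception rather than silently passing it through, so it should only be applied to data known to be double-decoded.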
