Decode An Encoded Unicode String In Python
I need to decode a 'UNICODE' encoded string:
>>> id = u'abcdß'
>>> encoded_id = id.encode('utf-8')
>>> encoded_id
'abcd\xc3\x9f'
The problem I have is:
Solution 1:
You have UTF-8 encoded data (there is no such thing as UNICODE encoded data).
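If those UTF-8 bytes ended up inside a unicode value (each byte stored as one code point), encode that value to Latin-1 to get the raw bytes back, then decode from UTF-8: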
encoded_id.encode('latin1').decode('utf8')
Latin-1 maps the first 256 Unicode code points (U+0000 to U+00FF) one-to-one to bytes, so the encode step reproduces the original byte values unchanged.
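The one-to-one mapping can be checked directly. This is a minimal sketch in Python 3 syntax (the session in this answer is Python 2), and the variable names are only illustrative:

```python
# Build a string holding every code point from U+0000 to U+00FF.
all_points = ''.join(chr(i) for i in range(256))

# Latin-1 encodes each of them to the single byte with the same numeric value.
latin1_bytes = all_points.encode('latin-1')
assert latin1_bytes == bytes(range(256))

# Anything above U+00FF has no Latin-1 byte, so encoding fails.
try:
    '\u0100'.encode('latin-1')
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

Because no byte value is changed or dropped, the Latin-1 encode step is a safe way to turn mis-decoded text back into the bytes it came from, as long as every character is at or below U+00FF.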
Demo:
>>> encoded_id = u'abcd\xc3\x9f'
>>> encoded_id.encode('latin1').decode('utf8')
u'abcd\xdf'
>>> print encoded_id.encode('latin1').decode('utf8')
abcdß
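For readers on Python 3, where every str is already Unicode and print is a function, the same round trip might look like the sketch below. The helper name fix_mojibake is hypothetical and not part of any library:

```python
def fix_mojibake(text):
    """Hypothetical helper: repair text whose UTF-8 bytes were stored as code points.

    Assumes every character in `text` is at or below U+00FF; otherwise the
    Latin-1 encode step raises UnicodeEncodeError.
    """
    raw_bytes = text.encode('latin-1')   # code points back to the original bytes
    return raw_bytes.decode('utf-8')     # bytes interpreted as UTF-8


mis_decoded = 'abcd\xc3\x9f'             # the two UTF-8 bytes of 'ß' held as two characters
repaired = fix_mojibake(mis_decoded)
print(repaired)                          # abcdß
```

If the input was never mis-decoded in the first place, the UTF-8 decode step will usually raise UnicodeDecodeError, so it is worth applying this only to data known to be damaged in this particular way.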