Python - Remove Extended Ascii
Solution 1:
Encode the string to bytes and then decode back to ASCII:
data.encode().decode('ascii',errors='ignore')
# {"id":"xxx","timestamp":xxx,...}}
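The round trip above works because `str.encode()` defaults to UTF-8, where every non-ASCII character becomes bytes of 0x80 or higher, and the ASCII decoder then skips them. A minimal sketch of the more direct equivalent follows; the helper name `strip_non_ascii` and the sample string are made up for illustration:

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside the 7-bit ASCII range."""
    # errors='ignore' skips characters that ASCII cannot represent;
    # decoding the remaining bytes gives back a clean str.
    return text.encode('ascii', errors='ignore').decode('ascii')


sample = '¾ïúÀï{"id":"xxx"}’ÂCº'  # hypothetical input, not the real payload
cleaned = strip_non_ascii(sample)
print(cleaned)  # {"id":"xxx"}C
```

Note that the plain ASCII 'C' in the trailing garbage survives, since this approach only filters by character range.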
You can also use regular expressions to remove all characters outside of the outermost curly braces:
re.sub(r'^[^{]*(?={)|(?<=})[^}{]*(?={)|(?<=})[^}]*$', '', data)
Unlike the encode/decode approach, the regular expression incidentally also removes the stray ASCII 'C' character between the two objects, which you do not want.
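To see what the regular expression does in practice, here is a small sketch run on shortened, made-up inputs. The constant name `OUTSIDE_BRACES` is an assumption for illustration; the pattern itself is the one from the answer.

```python
import re

# Three alternatives: leading garbage before the first '{', garbage between
# a '}' and the next '{', and trailing garbage after the last '}'.
OUTSIDE_BRACES = re.compile(r'^[^{]*(?={)|(?<=})[^}{]*(?={)|(?<=})[^}]*$')

sample = '¾ï{"a":{"b":1}}’ÂCº¾{"c":2}'  # hypothetical input
cleaned = OUTSIDE_BRACES.sub('', sample)
print(cleaned)  # {"a":{"b":1}}{"c":2}

# Caveat: text between sibling nested objects also sits between '}' and '{',
# so the pattern removes it and corrupts the JSON.
siblings = '{"a":{"b":1},"c":{"d":2}}'
print(OUTSIDE_BRACES.sub('', siblings))  # {"a":{"b":1}{"d":2}}
```

The second call suggests the pattern is only safe when no object contains two nested objects side by side, which holds for the sample data in the question.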
Solution 2:
import re
# Named 'data' rather than 'str' to avoid shadowing the built-in type.
data = '¾ïúÀï{"id":"xxx","timestamp":xxx,"payloadType":"xxx","payload":{"protocol":"xxx","zoneID":xxx,"zoneName":"xxx","eventType":"xxx"}}’ÂCº¾ïúÀï{"id":"xxx","timestamp":xxx,"payloadType":"xxx","payload":{"protocol":"xxx","zoneID":xxx,"zoneName":"xxx","eventType":"xx}}'
# Remove every character outside the 7-bit ASCII range.
data = re.sub('[^\x00-\x7F]', '', data)
print(data)
This should produce the following output (note that the ASCII 'C' between the two objects is kept):
'{"id":"xxx","timestamp":xxx,"payloadType":"xxx","payload":{"protocol":"xxx","zoneID":xxx,"zoneName":"xxx","eventType":"xxx"}}C{"id":"xxx","timestamp":xxx,"payloadType":"xxx","payload":{"protocol":"xxx","zoneID":xxx,"zoneName":"xxx","eventType":"xx}}'
Solution 3:
What about something like this:
import string

cleaned_string = ''
for char in ugly_string:
    if char in string.printable:
        cleaned_string += char
This question also deals with a similar problem.
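The character filter in this solution can also be written with `str.join`, which avoids repeated string concatenation. This is only a sketch: the names `PRINTABLE` and `keep_printable` and the sample string are assumptions for illustration.

```python
import string

# string.printable holds ASCII digits, letters, punctuation and whitespace.
PRINTABLE = frozenset(string.printable)


def keep_printable(text: str) -> str:
    """Return text with every character outside string.printable removed."""
    return ''.join(ch for ch in text if ch in PRINTABLE)


ugly_string = '¾ïúÀï{"id":"xxx"}’ÂCº'  # hypothetical input
cleaned_string = keep_printable(ugly_string)
print(cleaned_string)  # {"id":"xxx"}C
```

Compared with the `[^\x00-\x7F]` regex, this filter also drops ASCII control characters such as `\x00`, because they are not part of `string.printable`.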
Solution 4:
If the garbage bytes do not contain opening curly braces, you can do something like this:
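from json import JSONDecoder

def decode_all(data):
    decoder = JSONDecoder()
    end_index = 0
    while data:
        try:
            # Jump to the next opening brace after the previously decoded object.
            data = data[data.index('{', end_index):]
        except ValueError:
            break  # no opening brace left, so no more objects
        obj, end_index = decoder.raw_decode(data)
        yield obj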
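A variant of the same idea, sketched here under the same assumption that the garbage never contains '{', passes a start index to `raw_decode` instead of re-slicing the string after every object. The function name `iter_json_objects` and the sample data are made up for illustration.

```python
from json import JSONDecoder


def iter_json_objects(text):
    """Yield each JSON object in text, skipping the text between objects.

    Assumes the garbage between objects never contains '{'.
    """
    decoder = JSONDecoder()
    pos = text.find('{')
    while pos != -1:
        # raw_decode returns the object and the index in text where it ended.
        obj, end = decoder.raw_decode(text, pos)
        yield obj
        pos = text.find('{', end)


sample = '¾ï{"id": "1", "payload": {"zoneID": 7}}’ÂCº¾{"id": "2"}'  # hypothetical
objects = list(iter_json_objects(sample))
print(objects)  # [{'id': '1', 'payload': {'zoneID': 7}}, {'id': '2'}]
```

If the garbage does contain an opening brace, `raw_decode` raises a `ValueError` here, so this sketch is only suitable when that assumption holds.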
If the garbage may contain opening curly braces, or you do not know what it can contain or whether your JSON is pure ASCII, I think the best solution is to try parsing a JSON-encoded object out of your data over and over, skipping the garbage bytes implicitly:
from json import JSONDecoder
data = '''¾ïúÀï{"id":"123","timestamp":123,"payloadType":"123","payload":{"protocol":"123","zoneID":123,"zoneName":"123","eventType":"123"}}’ÂCº¾ïúÀï{"id":"123","timestamp":123,"payloadType":"123","payload":{"protocol":"123","zoneID":123,"zoneName":"123","eventType":"xx"}}'''
def decode_all(data):
    decoder = JSONDecoder()
    while data:
        try:
            obj, end_index = decoder.raw_decode(data)
            data = data[end_index:]
            yield obj
        except ValueError:
            # Decoding failed at the current position.
            end_index = None
        start = data.find('{')
        if start == -1:
            break  # no opening brace left, so no more objects
        elif end_index is None and start == 0:
            # This '{' did not start a valid object; step past it.
            start = 1
        data = data[start:]

for o in decode_all(data):
    print(o)
Solution 5:
I used exifread. In short, use .printable to get a string of the value. Here is the code to get the value of some select tags and pack them in a dictionary called context:
import exifread

photo_file = '/absolute/path/photo.jpg'
# Open the photo in binary mode and read its EXIF tags.
with open(photo_file, 'rb') as f:
    tags = exifread.process_file(f)
select_tags = ['Image Make', 'Image Model', 'EXIF DateTimeOriginal', 'EXIF ExposureTime']
context = dict()
for tag in select_tags:
    # .printable gives the tag's value as a plain string.
    context[tag] = tags[tag].printable
print(context)