Feedparser Fails During Script Run, But Can't Reproduce In Interactive Python Console
Solution 1:
It looks like the URL that is giving you problems serves text in some encoding (such as latin-1, where 0xe2 would be "lowercase a with a circumflex", aka â) without a proper Content-Type header (it should have a charset= parameter in Content-Type: but doesn't). If that is the case, feedparser cannot guess the encoding, tries the default (ascii), and fails.
The character encoding section of feedparser's docs explains the issues in more detail.
Unfortunately there are no "magic bullets" to solve this general issue (due to bozos that break the XML rules). You could try catching this exception and, in the handler, reading the URL's contents separately (using urllib2) and decoding them with various possible encodings. When you finally get a usable unicode object this way, feed that to feedparser.parse (whose first arg can be a URL, a file stream, or a unicode string with the data).
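The fallback described above might be sketched roughly as follows. This is only an illustration under assumptions: it is written for Python 3 (where urllib.request stands in for urllib2), the helper names decode_with_fallbacks and parse_feed_with_fallbacks are hypothetical, and the list of candidate encodings is a guess that should be adapted to the feeds you actually see. Note that latin-1 accepts any byte sequence, so it only makes sense as the last resort.

```python
def decode_with_fallbacks(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly.

    The candidate list is an assumption; latin-1 never fails, so keep it last.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def parse_feed_with_fallbacks(url):
    """Hypothetical helper: fetch the bytes ourselves, decode, then parse."""
    # Imported here so the decoding helper above works without these modules.
    import urllib.request
    import feedparser

    with urllib.request.urlopen(url) as response:
        raw = response.read()
    text, _encoding = decode_with_fallbacks(raw)
    # feedparser.parse also accepts a string containing the feed data.
    return feedparser.parse(text)
```

For example, decode_with_fallbacks(b"caf\xe2") fails as UTF-8 and falls through to cp1252, which yields the â mentioned above.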
Solution 2:
With reference to the OP's comment: "Try any url literal, such as u'myfeed.blah/xml'. It should reproduce."
>>> from pprint import pprint as pp
>>> import feedparser
>>> d = feedparser.parse(u'myfeed.blah/xml')
>>> pp(d)
{'bozo': 1,
 'bozo_exception': SAXParseException('not well-formed (invalid token)',),
 'encoding': 'utf-8',
 'entries': [],
 'feed': {},
 'namespaces': {},
 'version': ''}
>>> d = feedparser.parse(u'http://myfeed.blah/xml')
>>> pp(d)
{'bozo': 1,
 'bozo_exception': URLError(gaierror(11001, 'getaddrinfo failed'),),
 'encoding': 'utf-8',
 'entries': [],
 'feed': {},
 'version': None}
>>> d = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
>>> d['bozo']
0
>>> d['feed']['title']
u'Sample Feed'
>>> d = feedparser.parse(u"http://feedparser.org/docs/examples/atom10.xml")
>>> d['bozo']
0
>>> d['feed']['title']
u'Sample Feed'
>>>
Please stop thrashing about; provide a URL that actually causes the problem.
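To find out which URL actually misbehaves inside the script, it may help to log the bozo state of every result, since the sessions above show feedparser reporting these problems through 'bozo' and 'bozo_exception' rather than raising. A minimal sketch, where describe_parse_result is a hypothetical helper that only relies on the result behaving like a dict:

```python
def describe_parse_result(result):
    """Summarise a feedparser-style result dict in one line."""
    if result.get("bozo"):
        exc = result.get("bozo_exception")
        return "bozo: %s: %s" % (type(exc).__name__, exc)
    return "ok: encoding=%s, entries=%d" % (
        result.get("encoding"), len(result.get("entries", [])))


# Intended use inside the failing script (urls is assumed to be your own list):
#     import feedparser
#     for url in urls:
#         print(url, describe_parse_result(feedparser.parse(url)))
```

Running this over the same URLs the script uses should point at a concrete feed that reproduces the failure.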