TypeError: expected string or buffer while using regular expression in Python
I wrote this code to remove the tags that match a pattern like this:
<p><b>See also:</b><a href="(.*?)">(.*)</a>(.*)</p>
CODE: import mechanize import
Solution 1:
You get that error because the type of the variable i is <class 'bs4.element.Tag'>, and match needs a buffer or string. Secondly, if the pattern doesn't match, then the .match call will return None, so your .group call will raise an AttributeError ('NoneType' object has no attribute 'group').
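Both failure modes can be reproduced without BeautifulSoup. The sketch below is only an illustration: FakeTag is a hypothetical stand-in for bs4.element.Tag (an object that is not a string but can be converted to one), and first_href is a made-up helper name, not something from the question.

```python
import re


class FakeTag(object):
    """Hypothetical stand-in for bs4.element.Tag: not a string, but str() works on it."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


pattern = re.compile(r'<p><b>See also:</b><a href="(.*?)">(.*)</a>(.*)</p>')


def first_href(item):
    """Return the href captured by the pattern, or None if the item does not match."""
    # The regex engine only accepts strings, so convert explicitly.
    m = pattern.match(str(item))
    # match() returns None when the pattern does not match; check before calling .group().
    if m is None:
        return None
    return m.group(1)


tag = FakeTag('<p><b>See also:</b><a href="http://example.com">Example</a> more</p>')

# Passing the non-string object straight to match() raises TypeError.
try:
    pattern.match(tag)
    raised = False
except TypeError:
    raised = True

print(raised)
print(first_href(tag))
print(first_href(FakeTag('<p>plain</p>')))
```

Converting with str() avoids the TypeError, and the explicit None check avoids the AttributeError on .group.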
Here's a quick and dirty "solution" I don't recommend:
m = re.match("<p><b>See also:</b><a href=\"(.*?)\">(.*)</a>(.*)</p>", str(i))
if not m:
    print(i)
A better solution would be to rewrite the code without trying to parse HTML yourself, letting BeautifulSoup do its job. For example, instead of your regex pattern, exclude the items that contain the text See also: and an anchor tag:
if i.find(text='See also:') and i.find('a'):
    continue
print(i)
Solution 2:
.find_all(['h2', 'p']) returns Tag objects but re.match expects a string. Don't call re.match on a Tag directly. BeautifulSoup allows you to pass regexes to .find*() methods.
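As a rough sketch of passing a regex to a .find*() method: the HTML snippet, the variable names, and the choice of the "html.parser" backend below are assumptions made for illustration, since the original page and code are not shown. It also assumes a bs4 version that accepts the string= keyword (older releases use text= instead).

```python
import re

from bs4 import BeautifulSoup

# Made-up sample markup; the real page from the question is not available.
html = """
<h2>Title</h2>
<p>Some regular paragraph.</p>
<p><b>See also:</b><a href="http://example.com/x">Other page</a> for details</p>
<p>Another paragraph.</p>
"""

soup = BeautifulSoup(html, "html.parser")
see_also = re.compile(r"See also:")

kept = []
for tag in soup.find_all(["h2", "p"]):
    # find(string=<compiled regex>) searches the text nodes inside this tag,
    # so re.match is never called on the Tag object itself.
    if tag.find(string=see_also) and tag.find("a"):
        continue
    kept.append(tag.get_text())

print(kept)
```

Here the regex is handed to BeautifulSoup, which applies it to text nodes, so no Tag is ever converted to a string by hand.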