
TypeError: Expected String or Buffer While Using Regular Expression in Python

I wrote this code to remove the tags that match a pattern like this:

<p><b>See also:</b><a href="(.*?)">(.*)</a>(.*)</p>

CODE: import mechanize import

Solution 1:

You get that error because the type of the variable i is <class 'bs4.element.Tag'>, and re.match needs a string or buffer. Secondly, if the pattern doesn't match, the .match call returns None, so calling .group on the result raises AttributeError: 'NoneType' object has no attribute 'group'.
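To make both failure modes concrete, here is a minimal sketch that uses only the standard library. FakeTag is a hypothetical stand-in for a bs4 Tag (an object that is not a string but whose str() yields markup), and see_also_href is an illustrative helper name, not something from the question's code.

```python
import re


class FakeTag(object):
    """Hypothetical stand-in for a bs4 Tag: not a string, but str() gives markup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


pattern = re.compile(r'<p><b>See also:</b><a href="(.*?)">(.*)</a>(.*)</p>')


def see_also_href(tag):
    # Convert to a string first; re refuses arbitrary objects.
    m = pattern.match(str(tag))
    # match() returns None when nothing matches, so guard before .group().
    return m.group(1) if m else None


hit = FakeTag('<p><b>See also:</b><a href="/x">X</a> more</p>')
miss = FakeTag('<p>Plain paragraph</p>')

# Passing the object itself reproduces the TypeError from the question.
try:
    pattern.match(hit)
    raised = False
except TypeError:
    raised = True
```

The exact wording of the TypeError message differs between Python versions, which is why the sketch only checks the exception type.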

Here's a quick and dirty "solution" I don't recommend:

m = re.match("<p><b>See also:</b><a href=\"(.*?)\">(.*)</a>(.*)</p>", str(i))
if not m:
    print i

A better solution would be to rewrite without trying to parse HTML yourself, letting BeautifulSoup do its job. For example, instead of your regex pattern, exclude the items that contain the text See also and an anchor tag:

if i.find(text='See also:') and i.find('a'):
    continue
print i

Solution 2:

.find_all(['h2', 'p']) returns Tag objects, but re.match expects a string. Don't call re.match on a Tag directly. BeautifulSoup lets you pass compiled regexes to the .find*() methods instead.
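As a rough sketch of that approach: the sample HTML, the see_also pattern, and the kept list below are illustrative assumptions, not the asker's data. It also assumes a BeautifulSoup 4 release recent enough to accept the string= keyword (older releases spell it text=).

```python
import re

from bs4 import BeautifulSoup

# Made-up sample markup standing in for the scraped page.
html = """
<h2>Title</h2>
<p>First paragraph.</p>
<p><b>See also:</b><a href="/other">Other page</a> for details</p>
<p>Last paragraph.</p>
"""

soup = BeautifulSoup(html, "html.parser")
see_also = re.compile(r"See also:")

kept = []
for tag in soup.find_all(["h2", "p"]):
    # find(string=<regex>) searches the text nodes inside the tag,
    # so no Tag object is ever handed to re.match.
    if tag.find(string=see_also) and tag.find("a"):
        continue
    kept.append(tag.get_text())
```

Here the paragraph containing both the "See also:" text and a link is skipped, and the remaining heading and paragraphs end up in kept.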
