
Python Beautifulsoup4 Website Parsing

I'm trying to scrape some sports data from a website using BeautifulSoup4, but I'm having trouble figuring out how to proceed. I'm not that great with HTML, and can't seem to...

Solution 1:

Each score unit is located inside a <td class='match-details'> element; loop over those elements to extract the match details.

From there, you can extract the text from child elements using the .stripped_strings generator; pass it to ''.join() to get all the strings contained in a tag as a single string. Pick out team-home, score and team-away separately for ease of parsing:

for match in soup.find_all('td', class_='match-details'):
    home_tag = match.find('span', class_='team-home')
    home = home_tag and ''.join(home_tag.stripped_strings)
    score_tag = match.find('span', class_='score')
    score = score_tag and ''.join(score_tag.stripped_strings)
    away_tag = match.find('span', class_='team-away')
    away = away_tag and ''.join(away_tag.stripped_strings)

With an additional print this gives:

>>> for match in soup.find_all('td', class_='match-details'):
...     home_tag = match.find('span', class_='team-home')
...     home = home_tag and ''.join(home_tag.stripped_strings)
...     score_tag = match.find('span', class_='score')
...     score = score_tag and ''.join(score_tag.stripped_strings)
...     away_tag = match.find('span', class_='team-away')
...     away = away_tag and ''.join(away_tag.stripped_strings)
...     if home and score and away:
...         print(home, score, away)
...
Newcastle 0-3 Sunderland
West Ham 2-0 Swansea
Cardiff 2-1 Norwich
Everton 2-1 Aston Villa
Fulham 0-3 Southampton
Hull 1-1 Tottenham
Stoke 2-1 Man Utd
Aston Villa 4-3 West Brom
Chelsea 0-0 West Ham
Sunderland 1-0 Stoke
Tottenham 1-5 Man City
Man Utd 2-0 Cardiff
# etc. etc. etc.
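
To try the loop without fetching the live page, here is a minimal self-contained sketch. The HTML fragment is made up for illustration: only the class names (match-details, team-home, score, team-away) come from the answer, while the nesting and the helper names text_of and extract_matches are assumptions. It uses the built-in 'html.parser' so no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names are taken from the answer above.
SAMPLE_HTML = """
<table>
  <tr><td class="match-details">
    <span class="team-home"><a href="#">Newcastle</a></span>
    <span class="score"><abbr>0-3</abbr></span>
    <span class="team-away"><a href="#">Sunderland</a></span>
  </td></tr>
  <tr><td class="match-details">
    <span class="team-home"><a href="#">West Ham</a></span>
    <span class="team-away"><a href="#">Swansea</a></span>
  </td></tr>
</table>
"""


def text_of(tag):
    """Join the tag's stripped strings, or return None if the tag is missing."""
    return ''.join(tag.stripped_strings) if tag is not None else None


def extract_matches(html):
    """Return (home, score, away) tuples for every complete match cell."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for match in soup.find_all('td', class_='match-details'):
        home = text_of(match.find('span', class_='team-home'))
        score = text_of(match.find('span', class_='score'))
        away = text_of(match.find('span', class_='team-away'))
        # Skip cells that lack one of the three parts (e.g. unplayed fixtures).
        if home and score and away:
            results.append((home, score, away))
    return results


matches = extract_matches(SAMPLE_HTML)
for home, score, away in matches:
    print(home, score, away)
```

The second cell in the sample has no score span, so it is skipped rather than raising an error.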

Solution 2:

You can use the tag.string property to get the text of a tag.

Refer to the documentation for more details. http://www.crummy.com/software/BeautifulSoup/bs4/doc/
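
One caveat worth knowing: .string only returns text when the tag has a single child; if the tag contains several children it returns None. The sketch below illustrates the difference on a made-up fragment (the markup is an assumption, not the real page) and shows .stripped_strings as a fallback.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only.
soup = BeautifulSoup(
    '<span class="score">2-1</span>'
    '<span class="team-home"><a>West</a> <a>Ham</a></span>',
    'html.parser',
)

score_tag = soup.find('span', class_='score')
home_tag = soup.find('span', class_='team-home')

single = score_tag.string      # the tag has one text child, so this is its text
multiple = home_tag.string     # several children, so .string is None
joined = ' '.join(home_tag.stripped_strings)  # collects text from all descendants

print(single, multiple, joined)
```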

Solution 3:

The original page now redirects to https://www.bbc.com/sport/football/premier-league/scores-fixtures, so the class names have changed.

This is an update to the accepted answer, which is still correct. Ping me if you edit your answer and I will delete this one.

for match in soup.find_all('article', class_='sp-c-fixture'):
    home_tag = match.find('span', class_='sp-c-fixture__team sp-c-fixture__team--time sp-c-fixture__team--time-home').find('span').find('span')
    home = home_tag and ''.join(home_tag.stripped_strings)
    score_tag = match.find('span', class_='sp-c-fixture__number sp-c-fixture__number--time')
    score = score_tag and ''.join(score_tag.stripped_strings)
    away_tag = match.find('span', class_='sp-c-fixture__team sp-c-fixture__team--time sp-c-fixture__team--time-away').find('span').find('span')
    away = away_tag and ''.join(away_tag.stripped_strings)
    if home and score and away:
        print(home, score, away)
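
Note that the chained .find('span').find('span') calls raise AttributeError if the outer span is missing, before the "home_tag and ..." guard can help. One possible alternative is a CSS selector via select_one(), which returns None when nothing matches. This is only a sketch: the class names come from the answer, but the sample HTML, its nesting, and the helper text_of are assumptions, and the live page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the nesting is inferred from the chained find() calls.
SAMPLE_HTML = """
<article class="sp-c-fixture">
  <span class="sp-c-fixture__team sp-c-fixture__team--time sp-c-fixture__team--time-home">
    <span><span>Stoke</span></span>
  </span>
  <span class="sp-c-fixture__number sp-c-fixture__number--time">2-1</span>
  <span class="sp-c-fixture__team sp-c-fixture__team--time sp-c-fixture__team--time-away">
    <span><span>Man Utd</span></span>
  </span>
</article>
<article class="sp-c-fixture">
  <span class="sp-c-fixture__number sp-c-fixture__number--time">0-0</span>
</article>
"""


def text_of(tag):
    """Join the tag's stripped strings, or return None if the tag is missing."""
    return ''.join(tag.stripped_strings) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
fixtures = []
for match in soup.select('article.sp-c-fixture'):
    # select_one() returns None instead of raising when a level is absent.
    home = text_of(match.select_one('span.sp-c-fixture__team--time-home span span'))
    score = text_of(match.select_one('span.sp-c-fixture__number--time'))
    away = text_of(match.select_one('span.sp-c-fixture__team--time-away span span'))
    if home and score and away:
        fixtures.append((home, score, away))

for home, score, away in fixtures:
    print(home, score, away)
```

The second article in the sample has no team spans, so it is skipped quietly instead of crashing the loop.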
