Parsing HTML using Python

I would like to be able to ask it for the content/text of the div tag with class='container' contained within the body tag, or something similar.

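# Beautiful Soup 4 is installed as "beautifulsoup4" and imported as bs4
from bs4 import BeautifulSoup

html = "..."  # placeholder: replace with the HTML code you've written above
# Name the parser explicitly so the result does not depend on which parsers are installed
parsed_html = BeautifulSoup(html, "html.parser")
# find() returns the first matching <div> inside <body>; .text is its text content
print(parsed_html.body.find('div', attrs={'class': 'container'}).text)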

I guess you don't need a performance comparison here. Just read up on how BeautifulSoup works in its official documentation.
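
For a version that can be run as-is, here is a minimal sketch of the same lookup. The sample HTML and the helper name container_text are made up for illustration and are not part of the original question. It assumes Beautiful Soup 4 is installed, and it returns None instead of raising an error when the body or the div is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for this illustration.
SAMPLE_HTML = """
<html>
  <head><title>Example</title></head>
  <body>
    <div class="header">Site header</div>
    <div class="container">
      <p>Hello, world!</p>
    </div>
  </body>
</html>
"""


def container_text(html, css_class="container"):
    """Return the text of the first <div> with the given class inside <body>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.body
    if body is None:
        # The input has no <body> tag at all (for example, a bare fragment).
        return None
    div = body.find("div", class_=css_class)
    if div is None:
        # No matching <div> anywhere inside <body>.
        return None
    # get_text() collects all nested text; strip=True trims surrounding whitespace.
    return div.get_text(strip=True)


print(container_text(SAMPLE_HTML))
```

With the sample document above this prints "Hello, world!". Checking for None before reading the text avoids the AttributeError that a chained lookup raises when the element is not found.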
