Parsing HTML using Python

So that I can ask it to get me the content/text in the div tag with class="container" contained within the body tag, or something similar. You don’t need performance descriptions, I guess; just read how BeautifulSoup works. Look at its official documentation.
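As a rough illustration of that request, here is a minimal sketch using BeautifulSoup (third-party, installed as beautifulsoup4). The sample markup and variable names are made up for the example, and it assumes the first matching div is the one you want.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample page, invented purely for illustration.
html = """
<html>
  <body>
    <div class="header">Site title</div>
    <div class="container">
      <p>Hello, world!</p>
    </div>
  </body>
</html>
"""

# "html.parser" selects Python's built-in parser, so no extra backend is needed.
soup = BeautifulSoup(html, "html.parser")

# First <div class="container"> found anywhere under <body>; None if there is none.
container = soup.body.find("div", class_="container")

# get_text() concatenates the text nodes inside the div;
# strip=True trims each piece and drops whitespace-only ones.
container_text = container.get_text(" ", strip=True) if container else None
print(container_text)
```

If you prefer CSS selectors, `soup.select_one("body div.container")` should find the same element.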

Which HTML Parser is the best?

Self plug: I have just released a new Java HTML parser: jsoup. I mention it here because I think it will do what you are after. Its party trick is a CSS selector syntax to find elements. See the Selector javadoc for more info. This is a new project, so any ideas for improvement are very welcome!

What is parsing?

Parsing usually applies to text – the act of reading text and converting it into a more useful in-memory format, “understanding” what it means to some extent. So for example, an XML parser will take the sequence of characters (or bytes) and convert them into elements, attributes etc. In some cases (particularly compilers) there’s a …
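To make "characters in, elements and attributes out" concrete, here is a small sketch built on Python's standard-library html.parser. The ElementCollector class and its record layout are hypothetical, chosen only for this example, and it is not how any particular XML or HTML library is implemented.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Turns a string of markup into a list of simple element records."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per start tag, in document order
        self._open = []     # stack of elements that are currently open

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        node = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(node)
        self._open.append(node)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        # Void elements such as <br> are not handled specially in this sketch.
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Attach text to the innermost open element, if any.
        if self._open:
            self._open[-1]["text"] += data


parser = ElementCollector()
parser.feed('<p id="greeting">Hello <b>world</b></p>')
parser.close()
elements = parser.elements
for element in elements:
    print(element)
```

The flat list is the "more useful in-memory format" here; a real parser would usually build a tree so that child elements stay attached to their parents.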