WordPress WXR Specification

I posed this question to John O’Nolan, a WordPress core developer, who forwarded my question to core committer Aaron Jorbin. O’Nolan responded by saying (I’m paraphrasing) that there isn’t any official documentation on the WXR standard, but that reverse engineering a WXR export should give me all the information I need. O’Nolan also noted that …

What is <?xml version="1.0"?>?

XML Declaration: <?xml version="1.0"?> is an XML declaration. It is an optional indication of the XML version, the character encoding, and the standalone document declaration. If present, it can only appear at the very top of an XML file, and it may not be repeated. <xml version="1.0"> is an open tag (that will require a closing tag) to an …
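The placement rules above can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree parser; the sample documents and the parses helper are made up for illustration and are not from the original post.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The declaration is optional: both forms yield the same root element.
with_decl = '<?xml version="1.0"?><note>hi</note>'
without_decl = '<note>hi</note>'
root_a = ET.fromstring(with_decl)
root_b = ET.fromstring(without_decl)

# Hypothetical broken inputs: the declaration is not the very first
# thing in the document, or it appears twice.
after_root = '<note>hi</note><?xml version="1.0"?>'
leading_space = ' <?xml version="1.0"?><note>hi</note>'
repeated = '<?xml version="1.0"?><?xml version="1.0"?><note>hi</note>'

for label, doc in [("after root", after_root),
                   ("leading space", leading_space),
                   ("repeated", repeated)]:
    print(label, "->", "accepted" if parses(doc) else "rejected")
```

Even a single space before the declaration is enough for this parser to reject the document, which is a common cause of the problem when a file is generated by a template.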

How to fix error: The markup in the document following the root element must be well-formed

General case: The error "The markup in the document following the root element must be well-formed" indicates that your XML has markup following the root element. In order to be well-formed, XML must have exactly one root element, and there can be no further markup following that single root element. One root element example (GOOD): The most common sources …
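As a minimal sketch of the single-root rule, the snippet below feeds a two-root document to Python's standard xml.etree.ElementTree parser and then repairs it by wrapping both elements in one root. The element names are hypothetical, and Python's parser words the error differently from the message quoted in the title.

```python
import xml.etree.ElementTree as ET

# BAD: two top-level elements, so there is markup after the root element.
bad = "<a>one</a><b>two</b>"

try:
    ET.fromstring(bad)
    error = None
except ET.ParseError as exc:
    # The parser stops at the start of the second top-level element.
    error = str(exc)

# GOOD: wrap everything in exactly one root element.
good = "<root>" + bad + "</root>"
root = ET.fromstring(good)
children = [child.tag for child in root]

print("two roots:", error)
print("one root, children:", children)
```

Wrapping is only one possible repair; if the second element was appended by mistake, removing it is the better fix.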

How to read and write XML files?

Here is a quick DOM example that shows how to read and write a simple XML file with its DTD: and the DTD: First import these: Here are a few variables you will need: Here is a reader (String xml is the name of your XML file): And here is a writer: getTextValue is here: Add …
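The code from the original post is not reproduced in this excerpt. As a rough stand-in, here is a DOM read/write sketch using Python's standard xml.dom.minidom rather than the post's own code. The roles/role document, the get_text_values helper, and build_document are all assumptions made for illustration, and the sketch does not handle a DTD.

```python
from xml.dom import minidom

# Hypothetical sample document; the original post's file is not shown.
SAMPLE = ('<?xml version="1.0"?>'
          '<roles><role>Developer</role><role>Tester</role></roles>')


def get_text_values(doc, tag):
    """Reader helper: collect the text content of every <tag> element."""
    values = []
    for node in doc.getElementsByTagName(tag):
        text = "".join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)
        values.append(text)
    return values


def build_document(roles):
    """Writer helper: build a DOM tree with one <role> element per entry."""
    doc = minidom.Document()
    root = doc.createElement("roles")
    doc.appendChild(root)
    for role in roles:
        element = doc.createElement("role")
        element.appendChild(doc.createTextNode(role))
        root.appendChild(element)
    return doc


# Read the sample, add one entry, serialize, and read the result back.
doc = minidom.parseString(SAMPLE)
roles = get_text_values(doc, "role")
xml_out = build_document(roles + ["Designer"]).toxml()
round_trip = get_text_values(minidom.parseString(xml_out), "role")
print(round_trip)
```

To work with files instead of strings, minidom.parse(path) reads a file and the string returned by toxml() can be written out with an ordinary file handle.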

Parsing HTML using Python

So that I can ask it to get me the content/text in the div tag with class='container' contained within the body tag, or something similar. You don’t need performance descriptions I guess – just read how BeautifulSoup works. Look at its official documentation.
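The answer recommends BeautifulSoup. To stay dependency-free, the sketch below swaps in the standard library's html.parser instead, so it is not BeautifulSoup's API. It pulls the text out of a div with class container inside body; the ContainerText class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ContainerText(HTMLParser):
    """Collect text inside <div class="container"> within <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.depth = 0      # div nesting depth inside the target; 0 = outside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "div":
            if self.depth:
                # A nested div inside the target div.
                self.depth += 1
            else:
                classes = (dict(attrs).get("class") or "").split()
                if self.in_body and "container" in classes:
                    self.depth = 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


HTML = """<html><body>
<div class="header">skip me</div>
<div class="container">Hello <div>nested</div> world</div>
</body></html>"""

parser = ContainerText()
parser.feed(HTML)
parser.close()

# Normalize whitespace in the collected text.
text = " ".join("".join(parser.chunks).split())
print(text)
```

If the page has several matching divs, this sketch concatenates the text of all of them. A dedicated library such as BeautifulSoup handles malformed markup and selection more conveniently, so treat this only as an illustration of the idea.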

Parsing XML with Ruby

As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library. If you wanted to print out the Id and Name values in your example, here is how you would do it: A few notes: at_xpath is for matching one thing. If you know you have multiple items, you want to use xpath instead. Depending on your document, namespaces can be problematic, …
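The distinction between matching one node and matching many, and the namespace pitfall, are not specific to Nokogiri. The sketch below shows the same ideas in Python's standard xml.etree.ElementTree as an analogue, not as Nokogiri code: find plays the role of at_xpath and findall the role of xpath. The document structure and the namespace URI are assumptions; only the Id and Name element names come from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a default namespace.
XML = """<Items xmlns="http://example.com/items">
  <Item><Id>1</Id><Name>First</Name></Item>
  <Item><Id>2</Id><Name>Second</Name></Item>
</Items>"""

# Prefix-to-URI map used only in the queries below.
NS = {"i": "http://example.com/items"}

root = ET.fromstring(XML)

# One match: find returns the first matching element, or None.
first = root.find("i:Item", NS)
first_pair = (first.findtext("i:Id", namespaces=NS),
              first.findtext("i:Name", namespaces=NS))

# Many matches: findall returns a list of every matching element.
all_pairs = [(item.findtext("i:Id", namespaces=NS),
              item.findtext("i:Name", namespaces=NS))
             for item in root.findall("i:Item", NS)]

# Namespace pitfall: an unqualified name matches nothing here,
# because every element lives in the default namespace.
unqualified = root.findall("Item")

print(first_pair)
print(all_pairs)
print(len(unqualified))
```

An empty result from a query that looks correct is often a sign that the document declares a default namespace.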