Web scraping a Redoc web API

Redoc is a React app, which means the actual HTML is built at runtime:

  • first the skeleton of the page loads, which in turn loads the Redoc JavaScript
  • then Redoc downloads the OpenAPI JSON (or YAML) file and renders the actual HTML dynamically from it
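To make the two steps concrete, here is a minimal sketch of what a plain HTTP fetch of such a page would see. The `SKELETON_HTML` string is a made-up example, and it assumes the page embeds Redoc through the standalone `<redoc spec-url="...">` element; a given site may wire Redoc up differently (for example from an inline script), in which case the spec URL would not be found this way.

```python
from html.parser import HTMLParser

# Hypothetical skeleton page, standing in for what the server returns
# before any JavaScript has run. Real pages will differ.
SKELETON_HTML = """<!DOCTYPE html>
<html>
  <head><title>API docs</title></head>
  <body>
    <redoc spec-url="/openapi.json"></redoc>
    <script src="redoc.standalone.js"></script>
  </body>
</html>"""


class SkeletonInspector(HTMLParser):
    """Collects the spec URL and script sources found in the static HTML."""

    def __init__(self):
        super().__init__()
        self.spec_url = None
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "redoc":
            # Assumption: the spec location is given via the spec-url attribute.
            self.spec_url = attrs.get("spec-url")
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(attrs["src"])


inspector = SkeletonInspector()
inspector.feed(SKELETON_HTML)
print(inspector.spec_url)  # where Redoc would fetch the OpenAPI file from
print(inspector.scripts)   # the JavaScript that would do the rendering
```

Note that the static HTML contains no endpoint descriptions at all: only a pointer to the spec and the script that will render it.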

The same is true of many apps built with modern JS frameworks (Vue.js, React, Angular). To scrape them you have to actually load the page in a browser so that all the JavaScript runs.

I believe the most common way to do this nowadays is to use Puppeteer, which drives a headless browser (there is a Python port: https://github.com/pyppeteer/pyppeteer/)
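A rough sketch of that approach with pyppeteer might look like the following. The function name `fetch_rendered_html`, the `wait_selector` parameter, and the URL in the usage comment are placeholders of mine, not anything from Redoc or pyppeteer; the right selector to wait for depends on the page you are scraping.

```python
import asyncio


async def fetch_rendered_html(url: str, wait_selector: str = "body") -> str:
    """Load `url` in headless Chromium and return the HTML after JS has run."""
    # Imported here so this module can be loaded without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        # Wait until the network has gone quiet, i.e. the spec has been fetched.
        await page.goto(url, {"waitUntil": "networkidle0"})
        # Then wait for an element that only exists once rendering is done.
        await page.waitForSelector(wait_selector)
        return await page.content()
    finally:
        await browser.close()


# Example usage (placeholder URL; needs pyppeteer and a Chromium it can launch):
# html = asyncio.run(fetch_rendered_html("https://example.com/docs"))
```

The returned string is the fully rendered DOM, so it can be handed to whatever HTML parser you already use for static pages.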
