xLibris Search Results Scraper
Web scraping is a solved problem. Five years ago, scraping a web page required substantial experience and technical know-how, even for pages that were well suited to scraping. Nowadays, if you can see it on a website, you can scrape it.
Today's project focused on a system that generates search results on the backend and then presents them through a JavaScript-heavy front end. In discussions with colleagues, the idea of systematically getting these results was considered out of the question. I objected.
Using just a single URL, I was able to identify the underlying API and write a script to get the full results from it. Not earth-shattering, but pretty fantastic for a 30-minute project! And as always, be polite when scraping web pages...
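The general approach can be sketched in Python roughly as follows. This is not the actual scraper, just an illustration under stated assumptions: the endpoint URL, the query parameters (`q`, `offset`, `limit`), and the response fields (`docs`, `total`) are all hypothetical placeholders. The real names would come from watching the browser's network tab while the search page loads.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Hypothetical endpoint and page size, for illustration only.
API_URL = "https://example.org/api/search"
PAGE_SIZE = 50


def fetch_json(url: str) -> dict:
    """Fetch one URL and decode its JSON body."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "polite-scraper-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_results(
    query: str,
    fetch: Callable[[str], dict] = fetch_json,
    delay: float = 1.0,
) -> Iterator[dict]:
    """Yield every result for a query, one page at a time."""
    offset = 0
    while True:
        # Parameter names are assumptions; adjust to the real API.
        params = urllib.parse.urlencode(
            {"q": query, "offset": offset, "limit": PAGE_SIZE}
        )
        page = fetch(f"{API_URL}?{params}")
        docs = page.get("docs", [])
        if not docs:
            break
        yield from docs
        offset += len(docs)
        if offset >= page.get("total", 0):
            break
        # Be polite: pause between requests.
        time.sleep(delay)
```

Passing the fetch function in as an argument keeps the paging logic separate from the network call, so it can be tried against canned responses before it ever touches the real server. The pause between requests is the "be polite" part.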
See the scraper on GitHub