/me wants it: Scraping Sites to get Data


Abstract


Life would be so much easier if the data contained in websites were available raw via APIs. Alas, until that mythical day comes, we need to either deal with unhelpful people via email and phone, or just get it ourselves.

Python has some great tools available to help with building scrapers and with parsing and formatting the data we get. We start off with the basics: tracking what needs to be done, making web requests, parsing HTML, following links, and extracting data from Excel and PDF documents. We then link to a number of advanced tools for dealing with logins, forms, and other complexities.
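As a rough illustration of the basics listed above (tracking what needs to be done, making web requests, parsing HTML, and following links), here is a minimal sketch that uses only the Python standard library. It is not taken from the talk itself: the names `LinkParser`, `extract_links`, `fetch`, and `crawl`, the `max_pages` limit, and the rule of only following links under the start URL are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs for all links found in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch(url):
    """Download a page and decode it as text (assumes UTF-8 content)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl of pages under start_url; returns {url: html}."""
    todo = deque([start_url])  # work queue: pages still to visit
    seen = {start_url}         # every URL ever queued, to avoid repeats
    pages = {}
    while todo and len(pages) < max_pages:
        url = todo.popleft()
        html = fetch_page(url)
        pages[url] = html
        for link in extract_links(html, url):
            # Hypothetical policy: stay on the site we started from.
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                todo.append(link)
    return pages
```

Passing the fetch function in as a parameter keeps the crawl logic separate from the network, so it can be tried against canned HTML before being pointed at a real site. A real scraper would also want error handling, polite delays between requests, and respect for the site's robots.txt.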

