Sometimes data we find on the internet isn’t formatted for downloading and easy importing into our statistical program of choice. It’s simply displayed on a static web page as a table (if we’re lucky) or scattered about the page in various locations. Getting this data requires “web scraping”: pulling out the specific parts of a web page that we want to keep and wrangling them into a structure suitable for further analysis. The general-purpose programming language Python has a number of libraries that work together to make this process relatively painless. We’ll talk about the process involved in web scraping, some of the things to keep in mind, and how to use these tools in concert to get the data you need in the format you need. Some knowledge of Python would be helpful.
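To give a rough sense of what "pulling out specific parts of a web page" can look like, here is a minimal sketch that uses only Python's standard library. The description above does not name the libraries the workshop covers, so this is not necessarily the approach taught in the session. The sample HTML, the `TableScraper` class, and the `scrape_table` function are all hypothetical, made up for illustration. A real scraper would first download the page rather than read it from a string.

```python
from html.parser import HTMLParser

# Hypothetical page content (invented for this example); a real scraper
# would download the page first instead of using an inline string.
SAMPLE_HTML = """
<html><body>
<h1>City populations</h1>
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30,720</td></tr>
  <tr><td>Shelbyville</td><td>28,150</td></tr>
</table>
</body></html>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while we are inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)

# "Wrangling": turn the population strings into integers for analysis.
for record in records:
    record["Population"] = int(record["Population"].replace(",", ""))

print(records)
```

The result is a list of dictionaries, one per table row, which is a convenient shape for writing out to CSV or loading into an analysis tool. Dedicated third-party parsing libraries typically make the extraction step shorter than this hand-written parser.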
Instructor: Eric Rochester
Co-sponsored with the Scholars' Lab!