Course Abstract
Training duration: 2 hours
Learning Objectives
- How to scrape nearly any website
- How to automate some browser tasks (like clicking and scrolling)
- How to schedule and repeat scraping jobs
Instructor(s)
Max Humber
Instructor Bio: Distinguished Faculty Member | General Assembly
Course Outline
Introduction ( 5 | 5 minutes )
Who am I, and who are you?
HTML/CSS Basics
Learning Agenda
Basic Web Scraping ( 15 | 20 minutes )
A quick review of how to fetch and parse HTML
How to target HTML element tags and attributes
Exercise: Scrape a “simple” website
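To give a flavor of what targeting tags and attributes looks like, here is a minimal sketch. It uses Python's standard-library `html.parser` rather than BeautifulSoup so that it runs without any installs; the course itself may use different tooling. The sample HTML, the `LinkCollector` class, the `collect_links` helper, and the `product` class name are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page; in practice the HTML would come from an HTTP response.
SAMPLE_HTML = """
<html><body>
  <a class="product" href="/items/1">Widget</a>
  <a class="nav" href="/about">About</a>
  <a class="product" href="/items/2">Gadget</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags that carry a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and self.wanted_class in classes:
            self.links.append(attr_map.get("href"))


def collect_links(html, wanted_class):
    """Return the hrefs of all matching links, in document order."""
    parser = LinkCollector(wanted_class)
    parser.feed(html)
    return parser.links


product_links = collect_links(SAMPLE_HTML, "product")
print(product_links)
```

The same idea (filter by tag name, then by attribute value) carries over to whichever parsing library is used.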
HTML Parsing ( 15 | 35 minutes )
String manipulation techniques and list comprehensions for scraping
Looping, sleeping, and monitoring
Working with HTML tables
Exercise: Scrape a Wikipedia table
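The table-parsing and list-comprehension topics above can be sketched roughly as follows. This is only an illustration using the standard-library `html.parser` (not necessarily the library used in the course); the sample table, its values, and the names `TableParser` and `parse_table` are hypothetical.

```python
from html.parser import HTMLParser

# Made-up table; a real page such as a Wikipedia article would be messier.
SAMPLE_TABLE = """
<table>
  <tr><th>Name</th><th>Count</th></tr>
  <tr><td>Alpha</td><td> 1,200 </td></tr>
  <tr><td>Beta</td><td> 34,500 </td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Turn <tr>/<th>/<td> markup into a list of rows of stripped cell text."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    parser = TableParser()
    parser.feed(html)
    return parser.rows


rows = parse_table(SAMPLE_TABLE)
header, body = rows[0], rows[1:]

# String cleanup plus a list comprehension: drop thousands separators, cast to int.
records = [
    {header[0]: name, header[1]: int(count.replace(",", ""))}
    for name, count in body
]
print(records)
```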
Scraping JavaScript ( 15 | 50 minutes )
How to scrape data locked behind a login page
How to scrape data rendered with JavaScript
Exercise: Bypass a login page with credentials
Browser Automation ( 20 | 70 minutes )
Replicate scrolling and browser clicks to get at hard-to-parse data
How to scrape and download images
How to scrape and download video and audio
Scheduling ( 20 | 90 minutes )
How to put a scraper on a schedule
How to send results to a Slack channel
Exercise: Schedule a scraper locally
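One very simple way to repeat a job locally is a loop that sleeps between runs. The sketch below assumes that approach purely for illustration; the course may instead rely on cron or a scheduling library. The names `run_on_interval` and `fake_scrape` are hypothetical, and `fake_scrape` stands in for a real scraper.

```python
import time


def run_on_interval(job, interval_seconds, repeats, sleep=time.sleep):
    """Call job() `repeats` times, pausing between runs, and collect the results.

    `sleep` is injectable so the pause can be replaced in tests.
    """
    results = []
    for attempt in range(repeats):
        results.append(job())
        # No need to wait after the final run.
        if attempt < repeats - 1:
            sleep(interval_seconds)
    return results


def fake_scrape():
    # Placeholder for a real scraping function.
    return "scraped"


results = run_on_interval(fake_scrape, interval_seconds=0.01, repeats=3)
print(results)
```

A loop like this only runs while the process is alive, which is one reason to hand longer-lived schedules to the operating system or a hosted service.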
Serverless ( 20 | 110 minutes )
How to schedule scrapers with AWS Lambda
How to save results to a database
Exercise: Use AWS to scrape a website
Conclusion ( 5 | 115 minutes )
Background knowledge
- Required: Experience with Python
- Nice to have: Some familiarity with BeautifulSoup