Will screen-scraper notify me if the site I'm scraping changes?

Once you've set up screen-scraper to extract data from a web site, there's a good chance the site will change at some point. Cosmetic changes, such as the addition of a font tag or a switch from bold to italic text, often won't affect anything, but if the site makes more dramatic changes, such as altering its navigation system, your scraping session will likely break. This generally causes screen-scraper either to fail to extract records from the site entirely or to scrape significantly fewer records than it had previously. It also usually means that you'll need to update your scraping session to account for the changes in the web site.

There are two approaches we generally take to this issue. The first (and best) approach is to track the number of records screen-scraper extracts each time the scraping session is run. Let's suppose you're extracting records from a site that, on average, yields about 100 records. If you run the scrape one day and it suddenly extracts only 10 records, then something has likely changed on the site, and you'll probably need to adjust your scraping session to account for it. The second approach is to have a special extractor pattern or two that checks for a specific piece of text that you know should be present every time you scrape. This approach is most useful in cases where a site doesn't yield a consistent number of records. If your special extractor pattern doesn't match the text it's looking for, then something has likely changed on the site.
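Both checks can be sketched outside of screen-scraper in a few lines of Python. This is only an illustration of the logic, not screen-scraper's own API: the function names, the `min_ratio` threshold of 0.5, and the idea of passing in the page text as a string are all assumptions made for the example.

```python
from statistics import mean


def looks_broken(previous_counts, current_count, min_ratio=0.5):
    """Return True if this run extracted far fewer records than usual.

    previous_counts: record counts from earlier runs (hypothetical history).
    min_ratio: assumed threshold; a run below this fraction of the
               historical average is treated as suspicious.
    """
    if not previous_counts:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(previous_counts)
    return current_count < baseline * min_ratio


def sentinel_missing(page_text, sentinel):
    """Return True if text that should always be on the page is absent."""
    return sentinel not in page_text


# Example: a site that normally yields about 100 records.
history = [100, 95, 105]
print(looks_broken(history, 10))   # a sudden drop to 10 records
print(sentinel_missing("<h1>Product Catalog</h1>", "Product Catalog"))
```

The record-count check needs a history of earlier runs, so the sentinel check is the one to lean on when the number of records varies widely from day to day.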

Along with all of this, you'll likely want some kind of notification system so that you're made aware when the site changes. For this you might consider something like screen-scraper's sendMail function. Even better would be to set up an external application that monitors the number of records scraped each time, then logs an error in a database or log file if the count looks wrong.
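As a rough sketch of the external-application idea, the Python below writes one line per run to a log file and records an error when the count falls below an expected minimum. Everything here is hypothetical: the logger name, the log format, the `expected_min` parameter, and the assumption that your scraping session reports its record count to this code in some way (for example, by writing it to a file or database that the monitor reads).

```python
import logging


def make_logger(log_path):
    """Create a logger that appends to log_path (hypothetical location)."""
    logger = logging.getLogger("scrape_monitor")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def record_run(logger, session_name, record_count, expected_min):
    """Log the result of one run; return False if the count looks wrong."""
    if record_count < expected_min:
        logger.error(
            "%s extracted %d records, expected at least %d",
            session_name, record_count, expected_min,
        )
        return False
    logger.info("%s extracted %d records", session_name, record_count)
    return True


if __name__ == "__main__":
    log = make_logger("scrape_monitor.log")
    record_run(log, "example session", 10, 50)  # logged as an error
```

The same `record_run` function could send an email or insert a database row instead of writing to a file; a log file is simply the smallest thing that works.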