Pause Scrape at Specific Points

The following script is only one line of code. You may be thinking, "Why would this script deserve a place in the repository?" I'll show you.

This code is called the breakpoint. When a script is being developed, it is common to run it from inside screen-scraper. In fact, it is a best practice to run scraping sessions often and check the log to ensure that you are getting the results you want. It is during development that you might want to consider using this script.

First, create a new script and label it breakpoint.

Then add this single line of code to it:

 session.breakpoint();

Now, when you want to check which variables are in scope, you can set this script to run after a pattern is matched. This comes in very handy when you want to see what is in a dataRecord and what is saved as a session variable.

Then, when your testing is done, simply disable the script by removing the check mark in the Enabled box wherever you have placed it.
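As an alternative to unchecking the box in every place the script is used, the breakpoint call could be guarded by a flag. The sketch below is only an illustration under stated assumptions: the session variable name "DEBUG" is made up for this example, and the SessionLike interface is a hypothetical stand-in so the logic can run outside screen-scraper. Inside a real script, the built-in session object would be used directly.

```java
import java.util.HashMap;
import java.util.Map;

class BreakpointGuardSketch {
    // Hypothetical stand-in for the two session methods this sketch relies on.
    interface SessionLike {
        Object getVariable(String name);
        void breakpoint();
    }

    // Calls breakpoint() only when the assumed "DEBUG" variable holds "true".
    // Returns whether the breakpoint was triggered.
    static boolean breakIfDebugging(SessionLike session) {
        Object flag = session.getVariable("DEBUG");
        if (flag != null && "true".equalsIgnoreCase(flag.toString())) {
            session.breakpoint();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        final Map<String, Object> variables = new HashMap<>();
        SessionLike fakeSession = new SessionLike() {
            public Object getVariable(String name) { return variables.get(name); }
            public void breakpoint() { System.out.println("breakpoint reached"); }
        };

        breakIfDebugging(fakeSession);   // flag not set: nothing happens
        variables.put("DEBUG", "true");
        breakIfDebugging(fakeSession);   // flag set: the breakpoint is called
    }
}
```

With a guard like this, setting the flag once at the start of a scraping session would switch every breakpoint on or off together.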

Comments

Random pause

What's better than a generic pause? How about a random pause?

The following is courtesy of Jason Bellows.

import java.util.Random;

// Pauses scraping session each time script is run.
// Random interval of 4 to 12 seconds, inclusive.
Random generator = new Random();
// nextInt(9) returns 0 through 8, so adding 4 gives 4 through 12.
seconds = generator.nextInt(9) + 4;
milli = seconds * 1000;
session.log("+++Pausing for " + seconds + " seconds");

sutil.pause(milli);
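Because Random.nextInt(n) returns a value from 0 to n - 1, an inclusive range of min to max needs nextInt(max - min + 1) + min. The sketch below wraps that arithmetic in a small helper so the bounds are easy to change. The class and method names are made up for this example, and it only prints the interval rather than pausing; inside screen-scraper the result would be passed to sutil.pause as in the script above.

```java
import java.util.Random;

class RandomPauseSketch {
    // Returns a whole number of seconds between minSeconds and maxSeconds, inclusive.
    static int pickSeconds(Random generator, int minSeconds, int maxSeconds) {
        if (minSeconds < 0 || maxSeconds < minSeconds) {
            throw new IllegalArgumentException("Expected 0 <= minSeconds <= maxSeconds");
        }
        // nextInt(bound) excludes the bound itself, hence the + 1.
        return generator.nextInt(maxSeconds - minSeconds + 1) + minSeconds;
    }

    public static void main(String[] args) {
        int seconds = pickSeconds(new Random(), 4, 12);
        long milli = seconds * 1000L;
        System.out.println("+++Pausing for " + seconds + " seconds (" + milli + " ms)");
    }
}
```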

Breakpoint Augmentation: Trace

What we now need is another feature that works very much like the breakpoint, but does not stop code processing... just a popup trace window that shows progress on, say, session variables.

Maybe such a thing already exists? The log window is too "busy" to really see much progress on the actual success/failure indicators in a scrape.

I like your suggestion, Coastal Data. I, too, find the log hard to follow at times. I'll add your suggestion to our internal feature request list. Perhaps you could customize the trace window to track only the session variables you're interested in? As a possible short-term compromise, would it be helpful to have the breakpoint window include the option to /not/ stop the scrape? Instead, each time the breakpoint is called, the window simply gets updated?

One suggestion I have when naming your breakpoint script is to call it "00--breakpoint". The double zeros keep it at the top of the list for ready access.

-Scott