Scrape Statistics

We were just having a technical discussion and someone pointed out that we broke 10,000 captchas on Monday. Figuring it was a gross error in calculations, we started talking about our traffic and realized that number was likely accurate. [To be honest, it was an easy captcha. :) ]

But it raised the greater question: Is there any kind of BUILT-IN internal or exposed "counter" that will let you know statistics about how much work screen-scraper has done? Like scrapes executed since service start (or EVER?), average speeds, stuff like that? Even just a simple count would be interesting.

How do you understand your own numbers without relying on external sources (which may be counting poorly or incorrectly)?

fnirt,

Interesting questions. There is no mechanism built into screen-scraper to track such things. The stats that most closely match what you're asking for can be found in your resources/conf/screen-scraper.properties file.

Open up that file in your favorite text editor and you'll see these:

CommandLine.NumTimesRun=
Server.NumTimesRun=
Workbench.NumTimesRun=
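
If you want to pull those three values out programmatically, here is a minimal sketch in plain Java. It assumes the file is an ordinary Java properties file and that the values are whole numbers; the class name RunCounters and the -1 "not available" marker are made up for illustration and are not part of screen-scraper.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of screen-scraper.
class RunCounters {
    // The three keys quoted in the post above.
    static final String[] KEYS = {
        "CommandLine.NumTimesRun", "Server.NumTimesRun", "Workbench.NumTimesRun"
    };

    /** Reads the counters from any Reader; a missing or non-numeric value becomes -1. */
    static Map<String, Long> read(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String key : KEYS) {
            String raw = props.getProperty(key, "").trim();
            long value;
            try {
                value = Long.parseLong(raw);
            } catch (NumberFormatException e) {
                value = -1L; // assumption: treat blank or odd values as "not available"
            }
            counts.put(key, value);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Path is relative to the screen-scraper install directory unless one is passed in.
        Path file = Paths.get(args.length > 0 ? args[0] : "resources/conf/screen-scraper.properties");
        if (!Files.exists(file)) {
            System.out.println("Properties file not found: " + file);
            return;
        }
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            read(in).forEach((key, value) -> System.out.println(key + " = " + value));
        }
    }
}
```

Keep in mind these only count how many times each mode has been started, not individual scrapes.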

Otherwise, tracking such things as the number of scrapes run, average speeds, etc. would require you to add counter code in scripts that get fired at strategic moments during your scrape.

By making use of session variables that contain timestamps you could quite accurately monitor the time it takes to perform certain tasks within your scrape.
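
As a rough sketch of that idea, the plain Java class below keeps a start timestamp, a run count, and a running total per task. The Map is only a stand-in for session variables, and the names ScrapeStats, start, and finish are made up for illustration; in a real scrape you would call something like start from a script fired before the task and finish from one fired after it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the map stands in for session variables.
class ScrapeStats {
    private final Map<String, Long> vars = new HashMap<>();

    /** Records the start timestamp for a task. */
    void start(String task, long nowMillis) {
        vars.put(task + ".start", nowMillis);
    }

    /** Records the end of a task, updates its count and total time, and returns the elapsed millis. */
    long finish(String task, long nowMillis) {
        Long started = vars.remove(task + ".start");
        if (started == null) {
            throw new IllegalStateException("Task was never started: " + task);
        }
        long elapsed = nowMillis - started;
        vars.merge(task + ".count", 1L, Long::sum);
        vars.merge(task + ".totalMillis", elapsed, Long::sum);
        return elapsed;
    }

    /** Number of completed runs of the task. */
    long count(String task) {
        return vars.getOrDefault(task + ".count", 0L);
    }

    /** Average duration in milliseconds, or 0 if the task has not completed yet. */
    double averageMillis(String task) {
        long n = count(task);
        return n == 0 ? 0.0 : (double) vars.getOrDefault(task + ".totalMillis", 0L) / n;
    }

    public static void main(String[] args) throws InterruptedException {
        ScrapeStats stats = new ScrapeStats();
        stats.start("captcha", System.currentTimeMillis());
        Thread.sleep(50); // placeholder for the real work
        stats.finish("captcha", System.currentTimeMillis());
        System.out.println("captcha runs: " + stats.count("captcha")
                + ", average ms: " + stats.averageMillis("captcha"));
    }
}
```

The timestamps are passed in rather than read inside the methods so the arithmetic is easy to check. Note that counters kept this way would only last as long as whatever holds them, so a count "since EVER" would need to be written out to a file or database.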

Only on rare occasions, when we need to track down performance issues in more elaborate scrapes, do we hook them up to JProfiler, which lets us observe where memory is being used within screen-scraper.

Please post any ideas you come up with to track the goings-on inside your scrapes for others out there who love to munch on stats.

-Scott