Scheduled Scraping Sessions Halting
I have a group of 18 scrapes that are scheduled to run every morning at the same time. 14 of them are produced from the same runnable scrape, each with unique parameters. I have set the maximum number of concurrent scrapes to 10. At the scheduled start time, ten of the scrapes begin, as they should. Several of them, anywhere from 3 to 6, will finish successfully. However, at different points in time the remaining scrapes halt and do not finish processing.
Which scrapes finish and which halt is not consistent from day to day, i.e. a scrape will halt one day and complete successfully the next. There are also no apparent similarities between the final log lines before the halt occurs. I've reviewed the standard Screen-Scraper error log files, and no errors have been logged.
The process has 300+ MB of available memory while the scrapes are taking place, and the system itself has ~1.5GB of free memory during processing.
Some extra information on my setup:
System Configuration
- Screen Scraper Enterprise 4.0.21a
- Mac OS 10.4.11
Screen-Scraper Settings
- Connection timeout: 30 seconds
- Data extractor timeout: 20 seconds
- Maximum number of concurrent running scraping sessions: 10
- Maximum memory allocation in megabytes: 512MB
I recently upgraded from an older version of Screen-Scraper Pro to Screen-Scraper Enterprise 4.0.*. The scrapes were previously managed by a cron job. The issue first appeared after the upgrade, while I was still using that cron-based setup. In an effort to solve it, I've switched to the web-based scheduler; however, the problem has persisted through this change.
I would greatly appreciate it if anyone has any idea why this may be happening. Any suggestions are welcome, as I've exhausted my abilities.
Thanks in advance.
Same Problem
On a PC, Windows XP.
When I run it from the Screen Scraper software the session works fine, but when it is run from the Scheduler it runs for a page or two of scraping (it varies) and then fails.
Have you any idea what is likely to cause inconsistencies between the Scraper and the Scheduler?
As scraper's comment below
As scraper's comment below suggests, does the last line of your log say "Processing script: xxxxxx"?
We've noticed some odd behavior when an occasional scrape runs outside of the workbench. Can you verify if this is the case or not?
If so, you could also try allowing our alpha updates (which, despite being "unstable", usually aren't all that unstable :) ).
Other than that, do you notice any odd behavior on the computer, like Java taking up a lot of memory or processing power? You'd have to run it and watch and wait to see if anything happens, though... (maybe not my idea of fun, but hey).
last line of the logs
Tyler,
You mentioned that the last lines of the log files don't seem to have any similarities. I have noticed behavior similar to this on occasion, but I've noticed that my last line in the log generally starts with "Processing script:". It doesn't matter which script it is processing, but that's always the last line for me. Is it the same with yours?
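If it helps to check this across many log files at once, here is a minimal Python sketch that prints the logs whose last non-blank line starts with "Processing script:". The folder name "log" and the "*.log" pattern are assumptions on my part, so point it at wherever your install actually writes its session logs. It also assumes the line literally starts with that text, with no timestamp in front of it.

```python
from pathlib import Path

# Prefix reported in this thread as the last log line before a halt.
MARKER = "Processing script:"


def last_nonblank_line(path):
    """Return the last non-blank line of a text file, or '' if there is none."""
    last = ""
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            if line.strip():
                last = line.strip()
    return last


def find_stalled_logs(log_dir, pattern="*.log"):
    """Return (file name, last line) for each log whose last line starts with MARKER."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        last = last_nonblank_line(path)
        if last.startswith(MARKER):
            hits.append((path.name, last))
    return hits


if __name__ == "__main__":
    import sys

    # "log" is a hypothetical default; pass the real folder as the first argument.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "log"
    for name, last in find_stalled_logs(log_dir):
        print(f"{name}: {last}")
```

Running it after the morning batch should show whether the halted sessions all stopped while handing off to a script, and which script each one was on.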