Looping a Scraping Session
I am trying to loop my scraping session 24/7. I am looping through the same scrapeable file over and over until I get this message:
ABC123: ERROR--halting the scraping session because the maximum number of scripts allowed on the stack was reached. Current number on the stack is: 50.
I tried to set up this code to run "After the scraping session ends", but it doesn't kick off the scraping session again.
// This particular example will only work with the Professional and Enterprise editions of screen-scraper;
// RunnableScrapingSession is reserved for these editions.
// Import the RunnableScrapingSession class.
import com.screenscraper.scraper.*;
// Generate a new runnable scraping session for "ABC123".
runnableScrapingSession = new RunnableScrapingSession("ABC123");
// Tell the scraping session to scrape.
runnableScrapingSession.scrape();
I don't want to increase the maximum number of scripts allowed on the stack from 50 to some crazy number, because I think that will just chew up memory since I am running my scrape 24/7 and feeding it into a database.
I tried setting all this stuff up on my laptop and I am getting this:
Deprecated: Methods with the same name as their class will not be constructors in a future version of PHP; RemoteScrapingSession has a deprecated constructor in C:\Apache24\htdocs\remote_scraping_session.php on line 7
Notice: Undefined variable: scrapeName in C:\Apache24\htdocs\scrape.php on line 36
An error occurred: Scraping session was either invalid or has not been set.
I have one like this I use:
if (!session.shouldStopScraping())
{
    ss = new com.screenscraper.scraper.RunnableScrapingSession(session.getName());
    ss.setDoLazyScrape(true);
    ss.scrape();
}
The lazy scrape means the new session is not connected to this scrape; it will start in its own thread.
I put this code to "Always
I put this code to "Always Run at the end". I saw in the log that it said it was kicked off but didn't see that my scraped text file was not updated so didn't this it was kicked off.
ERROR--ABC123: An error occurred while processing the script: Kickoff
ABC123: The error message was: class bsh.EvalError (line 1): session .shouldStopScraping --Sourced file: inline evaluation of: ``if (!session.shouldStopScraping) { ss = new com.screenscraper.scraper.Ru . . . '' : Cannot access field: shouldStopScraping, on object: ABC123
I am getting this error now.
I made a little edit to the script above. Try it with the change.
Thanks Jason, I really appreciate your help. I work in IT but I am not a developer; I just started a Python course last night. What did you change? I've got the code running right now.
I tried it again but it didn't work. After 50 scrapes the session errored out and did not restart. Just to make sure, I checked the CSV file that I am building, and it has not been updated in 42 minutes. I have that script set to run on "Always run at the end".
As long as setDoLazyScrape is true, you shouldn't get that.
When it's true, a scrape will spawn the new scrape and then finish up. When it's not explicitly set to true, the parent stays open while the child scrape runs, and each child in turn stays open for its own child, so once you have 50 of them you'll hit the maximum allowed on the stack.
I have set setDoLazyScrape to true. I am running this from my laptop in workstation mode, not in server mode. I turned the logging level down to only catch errors. I know none of the logging will appear after the first run, since it will be running in a different thread every time it restarts. I am just not seeing any updates to my CSV file.
The workbench is mostly just for developing scrapes. Could you run it in server or batch mode? Then you would have some logs to send.
I can try that. In the meantime, I got an error when the scrape tried to kick off again.
Processing scripts after scraping session has ended.
Processing scripts always to be run at the end.
ERROR--Binance: An error occurred while processing the script: Kickoff
Binance: The error message was: class bsh.EvalError (line 9): session .shouldStopScraping --Sourced file: inline evaluation of: ``/* if (session.shouldStopScraping()) { ss = new com.screenscraper.scrape . . . '' : Cannot access field: shouldStopScraping, on object: ABC123
Could you post your script?
/*
if (session.shouldStopScraping())
{
ss = new com.screenscraper.scraper.RunnableScrapingSession("abc123");
ss.setDoLazyScrape(true);
ss.scrape();
}
*/
if (!session.shouldStopScraping)
{
ss = new com.screenscraper.scraper.RunnableScrapingSession(session.getName());
ss.setDoLazyScrape(true);
ss.scrape();
}
/*
runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "abc123" );
// Turn off LazyScrape
runnableScrapingSession.setDoLazyScrape( true );
// Tells the session to start scraping.
runnableScrapingSession.scrape();
// Script halts execution until the scrape is finished
for( int i = 0; i < 50; i++ )
{
runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "abc123" );
runnableScrapingSession.scrape();
}
*/
Error in your "if" condition
Error in your "if" condition missing parentheses.
I made the change. I am going to try to put the scraper in server mode. If I use server mode, how can I kick off my scrape? Do I have to use an external script to kick off the scrape, since I am not supposed to be in the workbench while in server mode?
You do need a script. Most of the time Linux comes with PHP, so you could use this one. I run it from the command line.
See the tutorial
<?php
$options = getopt("n:t:v:");
// Scrape name
if (isset($options['n']) && $options['n']!="")
{
$scrapeName = $options['n'];
echo "\nStarting scrape: $scrapeName";
}
// Number of threads
$threads = 1;
if (isset($options['t']))
{
$threads = $options['t'];
if (is_numeric($threads))
{
$threads = $options['t'];
echo "\nMultithread set: $threads";
}
}
// Variables to set
if (isset($options['v']) && $options['v']!="")
{
$vars = explode('|', $options['v']);
print_r($vars);
}
require('misc/php/remote_scraping_session.php');
// Start scrape
for ($i=0; $i<$threads; $i++)
{
$session = new RemoteScrapingSession;
$session->initialize($scrapeName, "127.0.0.1", 8778);
if (isset($vars))
{
echo "\n===Setting vars===";
foreach ($vars as $var)
{
$blk = explode("=", $var);
$n = stripslashes($blk[0]);
$v = stripslashes($blk[1]);
echo "\n $n :: $v";
$session->setVariable($n, $v);
}
}
$session->setDoLazyScrape(true);
$session->scrape();
// Check for errors.
if($session->isError())
{
echo "An error occurred: " . $session->getErrorMessage();
exit();
}
// Pause
if ($i<$threads-1)
{
sleep(2);
}
}
?>
I tried putting this together and now I am getting this error
Deprecated: Methods with the same name as their class will not be constructors in a future version of PHP; RemoteScrapingSession has a deprecated constructor in C:\Apache24\htdocs\remote_scraping_session.php on line 7
Notice: Undefined variable: scrapeName in C:\Apache24\htdocs\scrape.php on line 36
An error occurred: Scraping session was either invalid or has not been set.
How did you run it? You don't need to run it in htdocs; it's set up to run from the command line.
You need to make sure the PHP is set to connect to the correct serverPort (if you've changed it), and that it calls the remote_scraping_session.php correctly. I generally leave the script in the screen-scraper directory and run it like this:
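(a rough example; general_run.php is just whatever you named your copy of the script above, and "ABC123" stands in for your scraping session name)
php general_run.php -n "ABC123"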
If you need to start the scrape and set session variables at the same time, pass them with -v as pipe-separated name=value pairs:
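(another sketch; START_PAGE and MAX_PAGES are just placeholder variable names, which the script above splits on "|" and then "=")
php general_run.php -n "ABC123" -v "START_PAGE=1|MAX_PAGES=10"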
Here is what I got tonight.
C:\Program Files\screen-scraper Professional Edition>php scrape.php -n "ABC123"
Starting scrape: BinancePHP Warning: require(remote_scraping_session.php): failed to open stream: No such file or directory in C:\Program Files\screen-scraper Professional Edition\scrape.php on line 30
Warning: require(remote_scraping_session.php): failed to open stream: No such file or directory in C:\Program Files\screen-scraper Professional Edition\scrape.php on line 30
PHP Fatal error: require(): Failed opening required 'remote_scraping_session.php' (include_path='.;C:\php\pear') in C:\Program Files\screen-scraper Professional Edition\scrape.php on line 30
Fatal error: require(): Failed opening required 'remote_scraping_session.php' (include_path='.;C:\php\pear') in C:\Program Files\screen-scraper Professional Edition\scrape.php on line 30
On general_run.php (your scrape.php), line 30 is a require that needs to point to a file that comes with screen-scraper. It lives in screen-scraper/misc/php, and you need to edit the script to reference it correctly.
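For example, if your scrape.php sits in the screen-scraper install directory, line 30 could look roughly like this (adjust the path to wherever your copy of remote_scraping_session.php actually is):
require('misc/php/remote_scraping_session.php'); // this file ships in screen-scraper/misc/php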
Now I am getting this
C:\Program Files\screen-scraper Professional Edition>php scrape.php -n "ABC123"
Starting scrape: ABC123PHP Deprecated: Methods with the same name as their class will not be constructors in a future version of PHP; RemoteScrapingSession has a deprecated constructor in C:\Program Files\screen-scraper Professional Edition\misc\php\remote_scraping_session.php on line 7
Deprecated: Methods with the same name as their class will not be constructors in a future version of PHP; RemoteScrapingSession has a deprecated constructor in C:\Program Files\screen-scraper Professional Edition\misc\php\remote_scraping_session.php on line 7
What version of PHP do you have?
Those look like warnings. Is your screen-scraper running? Are you getting anything in the log subdirectory?
7.2.3
I got this working tonight! Thanks so much for your help!
Didn't loop
I spoke too soon. ABC123 is not getting kicked off again.
ABC123: ERROR--halting the scraping session because the maximum number of scripts allowed on the stack was reached. Current number on the stack is: 1200.
Stopping the scraping session.
Processing scripts always to be run at the end.
Processing script: "Kickoff"
Scraping session "Binance" finished.
Why does the log say "ABC123"
Why does the log say "ABC123" and the beginning, and "Binance" at the end?
The fact that you have 1200 scripts on the stack tells me that you're either not setting the lazy scrape, or not running the same script.
I attached a session to this thread. It is named "Rerun test". It's set to run repeatedly for 5 minutes, but it could be 5 hours, and at the end of it the log will say:
Running scraping session: Rerun test
Processing scripts before scraping session begins.
Processing script: "Rerun - init"
>>>Still running
Processing scripts after scraping session has ended.
Processing script: "Rerun"
++++There is 1 scripts on stack
Processing scripts always to be run at the end.
Scraping session "Rerun test" finished.
With the lazy scrape set to true, the new run won't carry anything over from previous runs.
Here is the session. Paste it into a text file, save it as "test run.sss", and import it into your screen-scraper.
<scraping-session use-strict-mode="true" use-only-sslv3="false"><script-instances><script-instances when-to-run="10" sequence="1" enabled="true"><script><script-text>import java.sql.Timestamp;
Long tStart = System.currentTimeMillis();
if (session.getv("TIME_TO_END")==null)
{
Long tEnd = tStart + (5 * 60000);
session.setv("TIME_TO_END", tEnd);
}
else
{
if (tStart>session.getv("TIME_TO_END"))
{
log.logWarn(">>>Time elapsed. Stopping.");
session.stopScraping();
}
else
{
log.logInfo(">>>Still running");
}
}
// session.breakpoint();</script-text><name>Rerun - init</name><language>Interpreted Java</language></script></script-instances><script-instances when-to-run="20" sequence="2" enabled="true"><script><script-text>log.log("++++There is " + session.getNumScriptsOnStack() + " scripts on stack");
if (!session.shouldStopScraping())
{
sutil.pause(10000);
ss = new com.screenscraper.scraper.RunnableScrapingSession(session.getName());
ss.setDoLazyScrape(true);
ss.setVariable("TIME_TO_END", session.getv("TIME_TO_END"));
ss.scrape();
}
else
{
log.log("Ended");
}</script-text><name>Rerun</name><language>Interpreted Java</language></script></script-instances><owner-type>ScrapingSession</owner-type><owner-name>Rerun test</owner-name></script-instances><name>Rerun test</name><notes></notes><cookiePolicy>0</cookiePolicy><HTTPClientType>5</HTTPClientType><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>2</originator_edition><logging_level>1</logging_level><date_exported>March 09, 2018 07:55:37</date_exported><character_set>UTF-8</character_set><created_by_version>7.0.10a</created_by_version></scraping-session>