session.isRunning() problem
I am trying to use the session.isRunning() method, but cannot get it to work. When I terminate a session in the workbench and later print this:
s = "" + session.isRunning();
session.logError(s);
The program writes true, even though I terminated the session and it should be false.
Is this a bug, or an error on my side?
Hans
Tim, is there any way to reliably force a session to stop from inside the workbench?
If you happen to get stuck in a while loop, it seems the only way out is to restart the workbench. I've tried using isRunning for this as well, but it doesn't work.
Yeah, I know what you mean about the while loop problem. That's a flaw of Beanshell :)
I've set up something to stop that before, but I can't remember exactly how it went... It had to do with always setting your 'while' condition to false, and then using an 'if' statement to turn it back to true each time. That way, BeanShell would naturally always try to exit the loop, but the 'if' statement prevents it. When the 'stop' button is pressed, BeanShell stops executing lines inside the while statement, so the 'if' statement is ignored, the condition stays false, and the loop exits cleanly.
I dunno... I think it was specific to my situation.
You could always develop with the last line of your 'while' statement being a 'break;' command :) Not very professional, but for debugging/development, it's great. Alternatively (and this was my slickest solution), I've put a manual counter into the mix, and then done this at the end of the loop:
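while (condition)
{
    // manipulate condition variable, blah blah
    if (condition)
        continue; // normal run: skip the 'break' and test the loop condition again
    break; // reached when the 'continue' above was not taken
}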
That way, the while loop always breaks if you've hit the stop-scrape button, but when the scrape is running for real, it'll encounter that 'continue;' statement, thus skipping the 'break'.
*shrug*
At least that's the way I remember it.
Hi,
If you're going through the whole process in the workbench, then I believe it probably won't report "false" until it finishes trying to get out of the scripts it is currently in. It may very well be impossible to use the workbench as a test environment to see your desired boolean value. This method is designed more for use through the server/remoteSession API. For instance, if you were using external PHP/Java/etc. to monitor a scrape that you started in server mode, then this method could be used to find out whether the scraping session in question has stopped for some reason.
In short, the flag being returned to you probably doesn't flip to 'false' until after the session has gotten out of everything it is currently doing, including your test script.
Hope that helps-- Is there some larger scheme that you are trying to accomplish?
Tim
Re
Hi
What I am trying to do is run through a website with some products. This will be done on a daily basis. When the session is done, all new products will be saved to a database, and all products that are no longer present will be disabled in the database. If the session stops for some reason, for instance because of problems with the connection, the script that saves the data will be run anyway. The products that were not scraped because of the error will then be disabled in the database. I am trying to check whether the session is still running before I save the data.
Is there another way to accomplish this?
Hans
My first thought is to have some point in the scrape, such as a details page, where you set the value of a true/false flag variable. If you don't receive what you expect to see on the details page, you'll want to change the flag to false. Then all your various scripts that iterate pages, go to details pages, etc. can be guarded by something like this:
// Interpreted Java
if (session.getVariable("NO_ERRORS"))
{
// call a details page, or whatever your scripts do
}
This way, as soon as the "NO_ERRORS" variable has a 'bad' value in it (which you detected yourself), nothing further is allowed to happen in the scrape, and it will fizzle out all by itself, having run out of things it's allowed to do.
There might be some interesting intricacies in your own scrape, but I've used this approach before, and it seems to work out alright, despite the fact that it requires an alteration to several scripts.
The nice part about it is that you can make a single "Check if errors" script which tests for the error condition (bad connection, a certain exception, etc) and sets the flag variable accordingly. With a script like that, you can call it anywhere you like, as many times as might be needed.
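As a rough sketch of what such a "Check if errors" script could look like, here is a version in plain Java that runs outside screen-scraper. Everything in it is an assumption for illustration: the `variables` map only stands in for the session's variable store (in a real script you would use session.setVariable and session.getVariable), and the names `ErrorFlagSketch`, `checkForErrors`, and `noErrors`, as well as the rule that an empty page counts as an error, are made up.

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorFlagSketch {
    // Hypothetical stand-in for the session variable store.
    static final Map<String, Object> variables = new HashMap<>();

    // Assumed error test: a null or blank page body means something went wrong.
    // Once the flag has been set to false it stays false.
    static void checkForErrors(String pageBody) {
        boolean pageOk = pageBody != null && !pageBody.trim().isEmpty();
        boolean okSoFar = !Boolean.FALSE.equals(variables.get("NO_ERRORS"));
        variables.put("NO_ERRORS", pageOk && okSoFar);
    }

    // Guard used by the other scripts before they do any further work.
    static boolean noErrors() {
        return Boolean.TRUE.equals(variables.get("NO_ERRORS"));
    }

    public static void main(String[] args) {
        variables.put("NO_ERRORS", Boolean.TRUE);

        checkForErrors("<html>product list</html>");
        System.out.println("after good page: " + noErrors());

        checkForErrors(""); // simulated failed request
        System.out.println("after empty page: " + noErrors());

        checkForErrors("<html>later page</html>");
        System.out.println("after later good page: " + noErrors());
    }
}
```

The flag is deliberately "sticky": a good page after a bad one does not turn it back to true, so the script that saves the data can still see that something went wrong earlier in the run.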
Tim
How to check for errors?
Thanks for the replies.
I think I understand what to do. I need to make a check for every iteration. The problem is that I do not scrape a details page, but a list with links. For every page I scrape all the links. When I am done, I load all the old links from the database, create all the new ones, and disable all links that are no longer present. If an error occurs, links that are still present will be deactivated.
What do you usually check for? In the SS web interface, I can see that there is some function that is able to check if there are big differences from the last to the current scraping session. Is it possible to access this function within a script?
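Whether or not that function can be reached from a script, a comparison like it can be done by hand in the script that saves the data. The following is only a sketch in plain Java under stated assumptions: the class and method names are made up, `oldLinks` is assumed to hold the links currently active in the database, `noErrors` is an error flag maintained during the scrape, and `maxDropFraction` is an arbitrary threshold for how much smaller than the old list the scraped list may be before deactivation is skipped.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class LinkSyncSketch {
    // Links that were scraped but are not in the database yet.
    static Set<String> toCreate(Set<String> oldLinks, Set<String> scraped) {
        Set<String> result = new TreeSet<>(scraped);
        result.removeAll(oldLinks);
        return result;
    }

    // Links to disable. Returns an empty set when the scrape reported an
    // error or when the scraped list shrank by more than maxDropFraction.
    static Set<String> toDisable(Set<String> oldLinks, Set<String> scraped,
                                 boolean noErrors, double maxDropFraction) {
        Set<String> result = new TreeSet<>();
        if (!noErrors) {
            return result;
        }
        if (scraped.size() < oldLinks.size() * (1.0 - maxDropFraction)) {
            return result; // suspiciously few links: do not trust this run
        }
        result.addAll(oldLinks);
        result.removeAll(scraped);
        return result;
    }

    public static void main(String[] args) {
        Set<String> oldLinks = new TreeSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> goodRun = new TreeSet<>(Arrays.asList("a", "b", "c", "e"));
        Set<String> brokenRun = new TreeSet<>(Arrays.asList("a"));

        System.out.println("create:  " + toCreate(oldLinks, goodRun));
        System.out.println("disable: " + toDisable(oldLinks, goodRun, true, 0.5));
        System.out.println("disable after broken run: "
                + toDisable(oldLinks, brokenRun, true, 0.5));
    }
}
```

New links are always created, since adding a link that really exists does no harm; only the deactivation step is guarded, because that is the step that causes damage after a partial scrape.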
One way would be to create all the links, and then check all of them and see what status code the server returns. That way I am sure that none of the active links will be deactivated, but it is a time-consuming task and it creates a lot of traffic on the pages that I scrape, which I do not want.
Hans