Pausing in VBScript
Hi All,
I've been having real issues getting a random pause (between 4 and 12 seconds) in place with Screen Scraper Pro. I decided to start with a fixed length of time (12 seconds) first, but have fallen at the first post!
Here's the code:
'Note: Set 'Next Page' Extractor Pattern to 'After Each Pattern Application' to avoid loops
If session.getVariable( "PAGE" ) <= 91 Then
Call session.pause( 12000 )
' WScript.Sleep ( 12000 )
Call session.ScrapeFile( "Search Results" )
Else
Call session.setVariable ( "PAGE",NULL )
End If
My notes:
- I've tried all the 'Sleep' variations for VBScript I've found online: WScript.Sleep (12000) , WScript.Sleep 12000 etc, upper & lower cased.
- I know I've commented out this line in the above code.
- I know that the Pro version has a sleep function as part of the API, but it's in Interpreted Java: if anyone can replicate the whole of the script to make this work then great, but I do ultimately need a random time, so this would no doubt require a little trickery.
In essence, what I'm trying to do is read a keyword, grab the search result page, strip all the link data, then call the next page after a random pause, with a maximum of 10 pages followed. Everything except the random pause is working beautifully. Can anyone help?
Ignore that
I've found a solution by using the API's Interpreted Java pause in a different part of the scrape.
I'm sure the issue is with my interpretation of VBScript rather than a program issue. All is well now!
Hmm.. I'm not too familiar
Hmm.. I'm not too familiar with VBscript (in fact, it pwns me every time I try to code in it)
Our API is supposed to be usable in all the languages we support, so I'd imagine that you'd be able to call the same "session.pause(123412341234)" line in VBScript. If that's not true, then I'm not sure what else to do other than make a separate script out of it, like you've said you have.
If you still want to make use of the script, but in Java instead, here's what I'd write:
// Interpreted Java
// Note: Set 'Next Page' Extractor Pattern to 'After Each Pattern Application' to avoid loops
if (session.getVariable("PAGE") <= 91)
{
session.pause(12000)
session.scrapeFile("Search Results")
}
else
session.setVariable ("PAGE", null)
As you can see, it's pretty similar. The differences are small: braces replace "Then" and "End If", the "Call" keyword is dropped, and "NULL" becomes lowercase "null".
Anyway, you should be able to just copy and paste that into a script and mark it as "Interpreted Java" and you'll be good to go.
Moving to Java
Hi Tim, thanks for the advice. As I'm planning on running multiple sessions, I'll need to move away from VBScript at some point, so I've tried your solution and I have two problems I'm hoping you can help with:
Using your code above, I get the following error: "Encountered "session" at line 5, column 5"
As I've used your code, without the first commented line, that makes the source of the error the line:
session.scrapeFile("Search Results")
I'm new to Java so troubleshooting's pretty tricky, but I'm wondering if the lack of ; at the end of lines or the lack of "" marks around the previous pause value is the problem?
The second issue is once I have the Interpreted Java solution in place, I'd need to extend it by having a random pause between 4 and 12 seconds long - if you have a solution to achieve that as well I'd be really appreciative.
Thanks for the help so far!
Too much python for me
haha.. so terribly sorry. I've been programming in Python recently, which doesn't require semicolons.
Yes, change the script to add semi-colons:
// Interpreted Java
// Note: Set 'Next Page' Extractor Pattern to 'After Each Pattern Application' to avoid loops
if (session.getVariable("PAGE") <= 91)
{
session.pause(12000);
session.scrapeFile("Search Results");
}
else
session.setVariable ("PAGE", null);
Don't put quotes around the number in the call to session.pause: it's expecting an integer, not a string.
One fundamental thing to understand about Java is that there are "primitive" data types: "int", "boolean", "float", and five others ("byte", "short", "long", "double", and "char"). Other than that, all of the data types in Java are of the general type "Object". Some Object types mirror a primitive, such as the Object-type "Integer", which is not the same as a primitive "int".
Point being, "methods" in Java ("functions" in C/C++ and other languages) need to receive the data type that they expect. If you ever have numbers that you need to turn into strings, the ".toString()" method is invaluable. For example, you can do the following on an Integer object:
String aStringVariable = someIntegerVariable.toString();
A primitive "int" has no methods of its own, so for one of those (or for an evaluation in parentheses) use the static "String.valueOf()" method instead:
String aStringVariable = String.valueOf(someIntVariable * 2);
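A minimal sketch of those conversions in plain Java, outside of screen-scraper. The class name IntToStringDemo and its method and variable names are made up for illustration. It assumes the standard rule that ".toString()" can be called on an Integer object, while a primitive "int" goes through a static helper such as "String.valueOf()".

```java
public class IntToStringDemo {
    // Converts a primitive int to a String with the static helper String.valueOf.
    static String fromPrimitive(int value) {
        return String.valueOf(value);
    }

    // Converts an Integer object to a String with its own toString() method.
    static String fromObject(Integer value) {
        return value.toString();
    }

    public static void main(String[] args) {
        int someIntVariable = 21;                          // primitive
        Integer someIntegerVariable = Integer.valueOf(21); // object
        System.out.println(fromPrimitive(someIntVariable * 2));
        System.out.println(fromObject(someIntegerVariable));
    }
}
```

Either route ends with an ordinary String that can be passed to any method expecting one.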
The secret to Java is that all "Objects" of the language (including String, ArrayList, HashMap, Hashtable, Integer, etc, etc, etc) have "methods" associated with them, so that when you have a variable of type "String", you can use that dot-methodname syntax to call a method from the String class.
I constantly make use of the Java API online. I google-search the object type with the word "java" in there somewhere and I always get a result for the java api covering that class, which lists all the methods you can call on a variable of that type.
</babbling>
To address your question about generating the random number, here's what you have to do, and why:
Import the "Random" class from the "java.util" package that comes with Java, so that you can get a random number generator.
Make an instance of the "Random" class.
Tell it to grab you an integer in the range you specify (the result of "nextInt" is a primitive "int").
So, your final code would look like this:
// Interpreted Java
import java.util.Random; // Gives you access to the "Random" class in the java.util package
// The next line follows the general pattern for instantiating a new object in Java:
// ClassName variableName = new ClassName();
// The "new" keyword is always there. It's just a thing that Java requires. (Personally, I think it's silly :P )
// The constructor always has the same name as the class. Be aware that some constructors can also take parameters. To know whether you want to pass one, check Java's API documentation for the class.
Random generator = new Random();
// Get a random int out of there. nextInt(8) already returns a primitive "int" (0 through 7), so no conversion is needed. I prefer "int" variables if I can; it's shorter to write out than "Integer pauseDuration = new Integer(generator.nextInt(8) + 4);"
// Note that ".nextInt()" is a method of the "Random" class, which takes zero or one parameter, depending on what you want it to do.
int pauseDuration = generator.nextInt(8) + 4;
if (session.getVariable("PAGE") <= 91)
{
session.pause(pauseDuration);
session.scrapeFile("Search Results");
}
else
session.setVariable ("PAGE", null);
so close
Tim this is really excellent support - thanks very much!
I changed the values used in the random number generator to 12000 and 4000 as the session.pause() function accepts input in milliseconds.
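For reference, the range arithmetic can be sketched in plain Java as below. The class RandomPauseDemo and the helper randomPauseMillis are hypothetical names, not part of the screen-scraper API. The one fact it leans on is that "nextInt(n)" returns a value from 0 (inclusive) to n (exclusive), so the bound needs a "+ 1" if the maximum itself should be reachable.

```java
import java.util.Random;

public class RandomPauseDemo {
    // Returns a random duration in milliseconds from minMillis to maxMillis, both inclusive.
    static int randomPauseMillis(Random generator, int minMillis, int maxMillis) {
        // nextInt(n) yields 0 .. n-1, so widen the bound by one to include maxMillis.
        return generator.nextInt(maxMillis - minMillis + 1) + minMillis;
    }

    public static void main(String[] args) {
        Random generator = new Random();
        int pauseDuration = randomPauseMillis(generator, 4000, 12000);
        System.out.println("Pausing for " + pauseDuration + " ms");
    }
}
```

In an Interpreted Java script, the value returned here would presumably be what gets handed to session.pause().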
I think the script is at core functioning, but there is one critical issue left: the less than or equal to operator now throws an error:
The error message was: Operator: '"<="' inappropriate for objects : at Line: 15.
Which suggests that I need to look at a different way of doing the comparison or converting both values to be of the same type. As a note, the value 'PAGE' is in increments of 10.
Any idea how to jump this last hurdle?
I have to admit, I'm not sure
I have to admit, I'm not sure why you're getting that last error. The only "<=" is on the "session.getVariable("PAGE") <= 91" part.
... hmm...
The session variable "PAGE" must not be a proper "int" or something. Where are you setting the PAGE variable? Is it in a script, or from an extractor pattern?
Even given my explanation of the differences between a primitive "int" and the "Integer" Object, the comparison should still be okay.
If the "PAGE" variable is a String Object, then obviously it's hard for Java to determine if a String is less than an int/Integer. If you can verify that the "PAGE" variable is a String (as in, it was set with a call such as 'session.setVariable("PAGE", "2")'), then change that if statement to read as follows:
if (Integer.parseInt(session.getVariable("PAGE")) <= 91)
and then you should be good. Brief explanation of the above line:
As I explained before, "Integer" is an Object type (a class). In Java, there are "methods" that you call on a variable you already have, such as that ".toString()" method I talked about. "toString" is a standard method that any Integer object can have called on itself.
On the other hand, there is another kind of method, called a "static" method. The terminology comes from C/C++ and is used in other languages as well (not sure about VBScript, so I'm explaining it :D ). A "static" method belongs to the class itself rather than to any one variable, so you call it on the class name instead of calling it FROM a variable. Static methods are for general tasks, and you can often just think of them as utilities: everything they work with is passed in as a parameter. Now, hold the thought for just a second.
For instance, to call the normal method "toString", you have to call it ON a variable: myIntegerVariable.toString() . In the example, "myIntegerVariable" is an Integer object, and the Integer class has "toString()" built into it. Its purpose is to turn the integer into a String. There are similar methods such as "floatValue()" and "doubleValue()", too. The point of a normal method is that it always works on the object you call it from, while what it returns depends on the method. In this paragraph's example, the data returned is going to be a String.
On the flip side, a static method is called without a base variable, and you pass it the value to work with: ClassName.staticMethodName( passingThisVariable ) . What goes in and what comes out both depend on the particular static method. "Integer.parseInt" takes a String and returns an int.
With static methods, you specify the class that the static method is a part of, so that Java knows where to look for said static method. Hence, my new IF statement is going into the Integer class, getting the "parseInt" static method, and then we're passing it a String and getting an int out.
myIntVariable = Integer.parseInt( someString );
Quick summary with the "Integer" class example:
Normal Integer methods are called on an Integer variable.
in: the Integer you call them on (plus any parameters)
out: depends on the method
Static Integer methods are called on the class name, without a base variable, and everything they need is passed in as a parameter.
in: depends on the static method
out: depends on the static method ("parseInt" takes a String and returns an int)
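To make the two calling styles concrete, here is a small sketch in plain Java. The class StaticVsInstanceDemo and its helper names are invented for illustration. Only "Integer.toString()" (called on an object) and "Integer.parseInt()" (called on the class) come from Java itself.

```java
public class StaticVsInstanceDemo {
    // Normal (instance) method: called ON an Integer object.
    static String viaInstanceMethod(Integer number) {
        return number.toString();
    }

    // Static method: called on the class name, with the input passed as a parameter.
    static int viaStaticMethod(String text) {
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        Integer myIntegerVariable = Integer.valueOf(30);
        String asText = viaInstanceMethod(myIntegerVariable); // Integer in, String out
        int backToInt = viaStaticMethod(asText);              // String in, int out
        System.out.println(asText + " parsed back to " + backToInt);
    }
}
```

Once the String has been parsed back into an int, a comparison such as "<= 91" works on it without complaint.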
This is the first time that I've thought about this type of explanation; if more is needed, by all means, let me know.
And if it turns out that "PAGE" isn't a String.... then.... :) We'll figure it out.
Tim
Success
Persistence pays off!
All seems to be working fine, the final code is:
// Interpreted Java
import java.util.Random; // Gives you access to the "Random" class in the java.util package
// The next line follows the general pattern for instantiating a new object in Java:
// ClassName variableName = new ClassName();
// The "new" keyword is always there. It's just a thing that Java requires.
// The constructor always has the same name as the class. Be aware that some constructors can also take parameters. To know whether you want to pass one, check Java's API documentation for the class.
Random generator = new Random();
// Get a random int out of there. nextInt(8000) returns a primitive "int" from 0 to 7999, so the pause is 4000 to 11999 milliseconds.
// Note that ".nextInt()" is a method of the "Random" class, which takes zero or one parameter, depending on what you want it to do.
int pauseDuration = generator.nextInt(8000) + 4000;
if (Integer.parseInt(session.getVariable("PAGE")) <= 91)
{
session.pause(pauseDuration);
session.scrapeFile("Search Results");
}
else
session.setVariable ("PAGE", null);
Thanks for the support Tim!
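For anyone wanting to try the page check outside of screen-scraper, here is a rough stand-alone sketch. The class PageCheckDemo, the helper shouldScrapeNextPage, and the HashMap standing in for the session's variables are all hypothetical. It also assumes "PAGE" is stored as a String, as it turned out to be in this thread, and adds a null check because the script resets "PAGE" to null once the limit is passed.

```java
import java.util.HashMap;
import java.util.Map;

public class PageCheckDemo {
    // Returns true when the stored page value parses to a number no greater than maxPage.
    static boolean shouldScrapeNextPage(Object pageValue, int maxPage) {
        if (pageValue == null) {
            return false; // PAGE was cleared, so there is nothing left to follow
        }
        // Session variables may come back as Strings, so parse before comparing.
        return Integer.parseInt(pageValue.toString().trim()) <= maxPage;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the variables held by a scraping session.
        Map<String, Object> variables = new HashMap<>();
        variables.put("PAGE", "90");
        System.out.println(shouldScrapeNextPage(variables.get("PAGE"), 91));
        variables.put("PAGE", null);
        System.out.println(shouldScrapeNextPage(variables.get("PAGE"), 91));
    }
}
```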