screen-scraper public support
Using Session Variables
Hi,
I have two extractor patterns that pick different data from one scrapeable file. The first picks up a single user name; the second picks up a variable number of entries from a list (the minimum is one), all of which match the same extractor pattern.
My problem is that I have designated the user name as a session variable and want to write that to file using a script that runs after the second extractor pattern is applied, so that I can end up with
user name -> data 1
user name -> data 2
user name -> data 3
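A minimal sketch of the pairing described above, in plain Java. The class and method names are hypothetical. Inside screen-scraper the user name would presumably be read from the session variable and each entry from the data record of the second pattern, in a script set to run after each pattern match; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pairs one user name with every extracted entry.
class UserDataLines {
    // Builds one output line per entry, each prefixed with the same user name.
    static List<String> format(String userName, List<String> entries) {
        List<String> lines = new ArrayList<>();
        for (String entry : entries) {
            lines.add(userName + " -> " + entry);
        }
        return lines;
    }
}
```

Each returned line can then be appended to the output file as it is produced.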
Strange data being appended to session variable
Hi all
I'm getting a bizarre problem. Can anyone help?
In my script file for a scraping session I have:
objHttp.Send sData
sResponse = objHttp.ResponseText
session.setVariable "data",sResponse
session.log sResponse
Looking in the log, I find that sResponse is correct (it's a 116 KB chunk of XML).
However, in my ASP I find that retrieving the session variable:
dataFile=dumpToFile(objSS.GetVariable("data"))
results in the characters "___CR______NL___" being appended.
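The appended text looks like two placeholder tokens, "___CR___" and "___NL___", standing in for a carriage return and a newline; that reading is an assumption, not something confirmed here. As a possible workaround, the sketch below strips any such tokens from the end of a value. It is plain Java with hypothetical names, and the same idea would need translating to the ASP side.

```java
// Hypothetical helper: removes trailing CR/NL placeholder tokens from a value.
class PlaceholderCleaner {
    static final String CR = "___CR___"; // assumed carriage-return placeholder
    static final String NL = "___NL___"; // assumed newline placeholder

    // Repeatedly drops a trailing CR or NL token until none remains.
    static String stripTrailing(String value) {
        String result = value;
        while (result.endsWith(CR) || result.endsWith(NL)) {
            // Both tokens have the same length, so one subtraction covers either.
            result = result.substring(0, result.length() - CR.length());
        }
        return result;
    }
}
```

This only hides the symptom; it does not explain where the tokens come from.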
Calling a JAR from SS
Hi Y'all
I'm trying to use a self-created JAR with screen-scraper, but I'm having little success. The interpreted Java within screen-scraper appears to just die without any errors or messages when I try to create a new instance of the class. Any suggestions on how to debug this? None of the logs under SS highlight the cause of the issue. Details below...
The JAR has been placed in the screen-scraper lib/ext directory.
I also tried placing the class file in the fully qualified directory based on the Java package name /lib/ext/marksull/library.
repeated calls via php
I'm using a loop in PHP to hit SS in server mode.
There are around 500 search terms I am looking for on the site.
I'm wondering whether it is better to initialize the scraping session once before the loop, or to open, scrape, and close a session for each search term.
Thank you!!!
-Keith
Whoops. I broke something upgrading
Yep, it said it was alpha, and I understood what I was getting into.
But since upgrading to 3.0.5a I've been having trouble getting the SOAP service functioning.
I'll be rolling back a version or so, but wanted to let you know I was having this issue, in case it's something you haven't experienced. The only variable (as far as I know) was the update; I haven't changed the code that interfaces with the SOAP service.
Scraping large chunks of html including line breaks
I am having trouble scraping large chunks of HTML and preserving the line breaks. The extractor pattern works as I intend it to; however, when I print the scraped output to the screen, all line breaks have been removed. Is there a way I can maintain the integrity of the page?
If the HTML page looks like this:
blah
blah
blah
Scraped results:
blah
blah
blah
Desired results:
more help translating &amp;
I'm working on a scrape that extracts URLs. When I extract these links from a results page, the & characters are all in &amp; format, which results in a "page not found" problem when I feed them back to scrape the page behind the link.
I found a link to an htmlparser in the forums, which I am calling in a script after each extraction of a url. Unfortunately, it's not quite clear how to apply the parser. Currently, my code looks like this:
import org.htmlparser.util.Translate;
//get yourStr of text from screen-scraper
'&' = Translate.decode( &amp; );
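For what it's worth, htmlparser's `Translate.decode` takes the string to decode as its argument and returns the decoded string, so the result has to be assigned to a variable rather than to a literal. If only the ampersand entity is the problem, the library may not be needed at all. A minimal sketch in plain Java, with hypothetical class and method names, assuming `&amp;` is the only entity that appears in the extracted URLs:

```java
// Hypothetical helper: turns the HTML entity &amp; back into a plain ampersand.
class AmpersandDecoder {
    // Returns the URL with every "&amp;" replaced by "&".
    static String decode(String url) {
        return url.replace("&amp;", "&");
    }
}
```

The decoded string would then be saved back to whichever variable feeds the next scrapeable file.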
Most efficient way for scripts to get dynamic configuration
We've got a TON of scrapes, and we have sample test cases for them all. The information about what goes into a test case is stored in another system.
I can think of a lot of different ways to get this info when a script runs (read flat file from filesystem, read xml from filesystem, talk to a database, hit a webservice, etc.).
But what is the BEST and MOST EFFICIENT method for doing something like this? I'm not familiar enough with Java to know whether, say, the filesystem is more expensive to hit than a database.
Anyone else doing something like this?
Repeated Crashing of workbench...bug?
The following interpreted Java script causes repeated crashing of the workbench, leaving java.exe running and thus the "can't bind to port, db" error. I have to go into Task Manager and quit java.exe. Is this a bug?
If I comment out the getVariable line, and replace with a String file = "foo.csv", then it works fine. Any ideas? How should I be fetching a session variable if not like this?
Thanks!!
last match
There is probably an easy way to do the following; any suggestions would be great!
Often, when searching for a pattern, I find too much. I end up taking a data record and searching for the same thing again inside the results. Is there an easier way? For example, I need to check for the existence of 'Next', which is at the bottom of the page. If it is found, I want to extract the value '2', which is the next page. If I use the pattern
page=~@STARTFROM@~" class="pagerNotCurrent">Next
Then I get way too much. Is this clear?
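One possible reading of the problem is that the token is allowed to match anything, so it starts at the first "page=" on the page and runs all the way to "Next". A sketch of the idea in plain Java regex, restricting the captured value to digits so that only the link immediately before "Next" can match. The class name and sample markup are hypothetical; in screen-scraper the comparable step would presumably be giving the STARTFROM token a regular expression such as \d+, which is an assumption about the setup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the page number carried by the "Next" link.
class NextPageFinder {
    // The group accepts digits only, so it cannot swallow markup between links.
    private static final Pattern NEXT_LINK =
        Pattern.compile("page=(\\d+)\" class=\"pagerNotCurrent\">Next");

    // Returns the page number from the "Next" link, or null if there is none.
    static String nextPage(String html) {
        Matcher m = NEXT_LINK.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

A null result doubles as the check for whether a 'Next' link exists at all.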