screen-scraper public support

Using Session Variables

Hi,

I have two extractor patterns that pick different data from one scrapeable file. The first picks up a single user name; the second picks up a variable number of entries from a list (minimum of one), all matching the same extractor pattern.

My problem is that I have designated the user name as a session variable and want to write it to a file using a script that runs after the second extractor pattern is applied, so that I end up with:

user name - > data 1
user name - > data 2
user name - > data 3
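
A minimal sketch of that output step, in plain Java rather than screen-scraper's own script API. The class name `UserRows`, the method `appendRows`, and the output path are hypothetical; in a real script the user name would come from the session variable and each entry from a match of the second extractor pattern.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pairs one user name with every extracted entry.
class UserRows {
    // Appends one "userName - > entry" line per entry, creating the file if needed.
    static void appendRows(Path out, String userName, List<String> entries) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String entry : entries) {
            lines.add(userName + " - > " + entry);
        }
        Files.write(out, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Appending (rather than overwriting) means the same file can collect rows from several pages or several users.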

GD on 03/13/2007 at 9:00 pm

Strange data being appended to session variable

Hi all

I'm getting a bizarre problem. Can anyone help?

In my script file for a scraping session I have:

objHttp.Send sData
sResponse = objHttp.ResponseText
session.setVariable "data",sResponse
session.log sResponse

Looking in the log, I find that sResponse is correct (it's a 116 KB chunk of XML).

However, in my ASP I find that retrieving the session variable:

dataFile=dumpToFile(objSS.GetVariable("data"))

results in the characters "___CR______NL___" being appended.
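
One possible workaround, sketched in plain Java: treat the tokens as placeholders and convert them back on the receiving side. That `___CR___` and `___NL___` stand for a carriage return and a newline is an assumption based only on their names, and `LineBreakTokens` is a hypothetical helper, not part of screen-scraper.

```java
// Hypothetical helper: turns placeholder tokens back into real line endings.
class LineBreakTokens {
    static final String CR_TOKEN = "___CR___"; // assumed to mean carriage return
    static final String NL_TOKEN = "___NL___"; // assumed to mean newline

    // Replaces every token with the character it is assumed to represent.
    static String decode(String value) {
        if (value == null) {
            return null;
        }
        return value.replace(CR_TOKEN, "\r").replace(NL_TOKEN, "\n");
    }

    // Decodes the tokens, then drops leading and trailing whitespace.
    static String decodeAndTrim(String value) {
        String decoded = decode(value);
        return decoded == null ? null : decoded.trim();
    }
}
```

The same two replacements could be done in the ASP page before the value is dumped to file.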

JasonLoCascio on 03/13/2007 at 10:01 am

Calling a JAR from SS

Hi Y'all

I'm trying to use a self-created JAR with screen-scraper, but I'm having little success. The Interpreted Java within screen-scraper appears to just die, without any errors or messages, when I try to create a new instance of the class. Any suggestions on how to debug this? None of the logs under SS highlight the cause of the issue. Details below...

The JAR has been placed in the Screenscraper lib/ext directory.

I also tried placing the class file in the fully qualified directory based on the Java package name /lib/ext/marksull/library.
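
A debugging sketch, assuming the failure is a class-loading problem: load and instantiate the class inside a try block that catches `Throwable`, because `NoClassDefFoundError` and similar problems are `Error`s and slip past `catch (Exception e)`. `LoadProbe` is a hypothetical helper, and it is written as ordinary Java, so it may need adapting for the script interpreter.

```java
// Hypothetical helper: tries to load and instantiate a class and reports what happened.
class LoadProbe {
    static String probe(String className) {
        try {
            Class<?> type = Class.forName(className);
            // Requires an accessible no-argument constructor.
            Object instance = type.getDeclaredConstructor().newInstance();
            return "loaded " + instance.getClass().getName();
        } catch (Throwable t) {
            // Throwable, not Exception: linkage failures are Errors.
            return "failed: " + t;
        }
    }
}
```

Writing the returned string to the log should show whether the class was never found or was found but failed to initialize.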

marksull on 03/10/2007 at 11:48 pm

repeated calls via php

I'm using a loop in PHP to hit SS in server mode.
There are around 500 search terms I am looking for on the site.
I'm wondering whether it is better to initialize the scraping session before the loop, or to open/scrape/close a session for each search term.

Thank you!!!

-Keith

keithmgould on 03/02/2007 at 1:15 pm

1 comment

Whoops. I broke something upgrading

Yep, it said it was Alpha and I understood what I was getting into.

But since upgrading to 3.0.5a I've been having trouble getting the SOAP service functioning.

I'll be rolling back a version or so, but wanted to let you know I was having this issue, in case it's something you haven't experienced. The only variable (as far as I know) was the update; I haven't changed the code that interfaces with the SOAP stuff.

fnirt on 02/28/2007 at 10:20 am

Scraping large chunks of html including line breaks

I am having trouble scraping large chunks of HTML code and preserving the line breaks. The extractor process works as I intend it to; however, when I print the newly scraped output to the screen, all line breaks have been removed. Is there a way I can maintain the integrity of the page?

If the html page looks like this:

blah

Scraped results:

blah
blah
blah

Desired results:

foobaz2112 on 02/23/2007 at 2:15 pm

more help translating &amp;

I'm working on a scrape that extracts URLs. When I extract these links from a results page, the & characters are all in &amp; format, which results in a "page not found" problem when I feed them back to scrape the page behind the link.

I found a link to an htmlparser in the forums, which I am calling in a script after each extraction of a url. Unfortunately, it's not quite clear how to apply the parser. Currently, my code looks like this:

import org.htmlparser.util.Translate;
//get yourStr of text from screen-scraper
'&' = Translate.decode( &amp; );
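
For comparison, a minimal sketch that avoids the parser library altogether and uses a plain string replacement. It handles only the `&amp;` entity, which is the one causing the broken links here; `AmpDecoder` and `decodeAmpersands` are hypothetical names. With htmlparser, the equivalent step would pass the extracted string variable to the decode call and assign the result to a variable, rather than assigning to a literal.

```java
// Hypothetical helper: undoes the &amp; encoding in an extracted URL.
class AmpDecoder {
    // Replaces every "&amp;" with a bare "&"; other entities are left alone.
    static String decodeAmpersands(String url) {
        return url.replace("&amp;", "&");
    }
}
```

The decoded value is what would be stored back for the next scrapeable file to use.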

rs388 on 02/22/2007 at 3:55 pm

Most efficient way for scripts to get dynamic configuration

We've got a TON of scrapes, and we have sample test cases for them all. The information about what goes into a test case is stored in another system.

I can think of a lot of different ways to get this info when a script runs (read flat file from filesystem, read xml from filesystem, talk to a database, hit a webservice, etc.).

But what is the BEST and MOST EFFICIENT method for doing something like this? I'm not familiar enough with Java to know whether, say, the filesystem is more expensive to hit than a database.

Anyone else doing something like this?
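
Whatever the source turns out to be, one common way to keep the cost down is to read it once and reuse the result. The sketch below assumes a flat properties file and caches it in memory after the first read; `TestCaseConfig` is a hypothetical name, and nothing here has been measured against a database or web service.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: loads a properties file once and serves later calls from memory.
class TestCaseConfig {
    private static final Map<Path, Properties> CACHE = new ConcurrentHashMap<>();

    static Properties load(Path file) throws IOException {
        Path key = file.toAbsolutePath().normalize();
        Properties cached = CACHE.get(key);
        if (cached != null) {
            return cached; // no filesystem access after the first read
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(key)) {
            props.load(in);
        }
        CACHE.put(key, props);
        return props;
    }
}
```

The trade-off is that edits to the file are not seen until the process restarts or the cache is cleared.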

fnirt on 02/21/2007 at 11:25 am

3 comments

Repeated Crashing of workbench...bug?

The following Interpreted Java script causes repeated crashing of the workbench, leaving java.exe running and thus the "can't bind to port, db" error. I have to go into Task Manager and quit java.exe. Is this a bug?
If I comment out the getVariable line and replace it with String file = "foo.csv", then it works fine. Any ideas? How should I be fetching a session variable, if not like this?

Thanks!!

keithmgould on 02/21/2007 at 11:05 am

last match

There is probably an easy way to do the following; any suggestions would be great!

Often, when searching for a pattern, I find too much. I end up taking a data record and searching for the same thing again inside the results. Is there an easier way? For example, I need to check for the existence of 'Next', which is at the bottom. If found, I want to extract the value '2', which is the next page. If I use the pattern

page=~@STARTFROM@~" class="pagerNotCurrent">Next

Then I get way too much. Is this clear?
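
The usual cure for an over-wide match is to restrict what the token may contain. The sketch below shows the idea with plain `java.util.regex` rather than extractor-pattern syntax: limiting the captured value to digits means it cannot run across quotes or tags, so only the link directly in front of "Next" matches. `NextPage` is a hypothetical name and the sample markup in the usage note is invented; if the token in the extractor pattern can be given a regular expression such as `\d+`, the same restriction should apply there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the page number on the "Next" pager link.
class NextPage {
    // (\d+) accepts digits only, so the capture cannot span other links.
    private static final Pattern NEXT =
            Pattern.compile("page=(\\d+)\" class=\"pagerNotCurrent\">Next");

    // Returns the page number as text, or null when there is no "Next" link.
    static String find(String html) {
        Matcher m = NEXT.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

Given a pager with links for pages 1 and 2 followed by a "Next" link pointing at `page=2`, `find` returns "2"; on the last page, where "Next" is absent, it returns null.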

keithmgould on 02/20/2007 at 2:29 pm
