screen-scraper public support

Questions and answers regarding the use of screen-scraper. Anyone can post. Monitored occasionally by screen-scraper staff.

Errors: conditional operators in interpreted java

Hello again, I've been running into what may be a simple and stupid error on my part in my most recent scraping.

I have three session variables: low, high, and max, all of which need to be passed in the URL for my scrapes. I increase low and high by 50 every time I scrape the page again, and I am trying to write a portion of my script so that when 'high' exceeds 'max', 'high' is assigned the value of 'max' instead.
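A minimal sketch of the capping logic described above, written as plain Java rather than as a screen-scraper script. The class name RangeWindow and the method next are hypothetical, and reading or writing the session variables is left out. If the values are held as strings, they would need converting (for example with Integer.parseInt) before being compared.

```java
// Hypothetical helper for illustration only; not part of screen-scraper.
class RangeWindow {
    // Step size taken from the post: low and high grow by 50 per page.
    static final int STEP = 50;

    // Returns {low, high} for the next request, with high capped at max.
    static int[] next(int low, int high, int max) {
        int newLow = low + STEP;
        int newHigh = Math.min(high + STEP, max); // never exceed max
        return new int[] { newLow, newHigh };
    }

    public static void main(String[] args) {
        int[] w = next(1, 50, 120);   // 51, 100
        w = next(w[0], w[1], 120);    // 101, 120 (high capped at max)
        System.out.println(w[0] + "-" + w[1]); // prints 101-120
    }
}
```

Math.min avoids an explicit if/else, which sidesteps the conditional operator altogether.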

This is what I have so far in interpreted java:

urls for java servlets are all identical

I am trying to scrape a site where all of the salient HTML is posted via Java servlets. The problem I am encountering is that each page has the same URL, e.g. "http://www.website.com/DisplayServlet".

I am able to isolate the appropriate HTTP transaction from the "Progress" tab in the Proxy session, generate a scrapeable file, and successfully extract the desired data from the scrapeable file.

Regex Character Escapes

I have been using more and more regular expressions in scripts to alter scraped data. I have noticed that escaped characters are processed correctly in tokens in extractor patterns; however, they do not seem to be processed when used in Interpreted Java scripts. Here is an example of what I mean:

If you double-click a token in a pattern, go to the Regular Expression tab, and enter "\d*", it will look for a set of digits in the pattern.

However if you use the same RegEx in a script like this:
value = value.replaceAll("\d*", "0");
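One likely explanation, sketched here in plain Java: inside a Java string literal the backslash is itself an escape character, so the regex \d has to be written as "\\d" in source code. The class name EscapeDemo and the method maskDigits are hypothetical. The sketch also uses + instead of *, because * can match the empty string and would then insert replacements between every character.

```java
// Hypothetical demo class; only String.replaceAll is a real API here.
class EscapeDemo {
    // Replaces each run of one or more digits with a single "0".
    static String maskDigits(String value) {
        // "\\d+" in Java source is the regular expression \d+
        return value.replaceAll("\\d+", "0");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("abc123def45")); // prints abc0def0
    }
}
```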

how to clear POST values

I'm writing my scraped data to a MySQL database using a PHP script (as described in Tutorial 5 (http://www.screen-scraper.com/support/tutorials/tutorial5/setting_up_the...)), and am encountering a strange problem.

Won't write output - but makes file

Hi,

I've just downloaded and started to use this program on Ubuntu Gutsy. After following tutorials 1-3 I can pull the data result sets, and the Java example script (also a slightly modified version of it) creates the file, but the files always come up empty and my log file shows an error after creating the file:

Here is the output from the Hello World example in Tutorial 1:

class name changes to designate attribute

I'm new to screen-scraper and working on my first scrape.

The site I am scraping indicates that an action is confirmed as completed by changing the class name itself to indicate the item should be displayed bold. If I use a standard extractor pattern I can extract either bold data or non-bold data, but not both. It would seem I can do this with a sub-extractor pattern, but I'm wondering if there is a way to use a wildcard in the literal class name.

The html looks like:

OR
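A rough sketch of the wildcard idea in plain Java regex. The markup and the class names "item" and "itemBold" are made up for illustration, since the actual HTML is not shown here. The pattern allows any trailing word characters after the common prefix of the class name, so both variants match. Inside screen-scraper the same effect might be had by replacing the literal class name with a token whose regular expression is item\w*, though that is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo; the markup and class names are assumptions.
class ClassWildcardDemo {
    // Matches <td class="item">...</td> and <td class="itemBold">...</td>
    static final Pattern CELL =
        Pattern.compile("<td class=\"item\\w*\">([^<]*)</td>");

    // Collects the cell text of every matching cell, bold or not.
    static List<String> extract(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<td class=\"item\">Pending</td>"
                    + "<td class=\"itemBold\">Confirmed</td>";
        System.out.println(extract(html)); // prints [Pending, Confirmed]
    }
}
```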

Scraping Sessions lost every restart

Hello,

I have upgraded from the latest stable 3.x Professional to 4.0 Professional. These are the steps I followed:

differences prof edition 3.x and enterprise edition

Hi,

Is it possible to give an overview of the differences in behaviour between Professional Edition 3.x and the new Enterprise Edition? For example, concerning the standard log files: tidy.log and wrapper.log don't exist anymore. What information goes into the stdout.log file? What does the error in stderr.log mean: "// Error: Attempt to access method on primitive... allowing bsh.Primitive to peek through for debugging"?

funny characters

When I scrape a site, I sometimes get a funny character, like: Ã'’ replacing: '

which is the character that appears on the website. The only way I can remove it is by doing a find and replace in my text editor. This is not such a big deal, but I'd rather not get these, since PHP doesn't seem to want to find and replace these characters.
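Characters like these often appear when UTF-8 bytes are decoded with a single-byte character set. The sketch below, in plain Java, reproduces that effect and reverses it. It assumes the page is UTF-8 and was read as ISO-8859-1, which may not be what is happening here. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of a UTF-8 / ISO-8859-1 mix-up (an assumed cause).
class MojibakeDemo {
    // Simulates the bug: UTF-8 bytes read as if they were ISO-8859-1.
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    // Reverses it: recover the raw bytes, then decode them as UTF-8.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "It\u2019s";        // contains a curly apostrophe
        String garbled = garble(original);    // one character becomes three
        System.out.println(garbled.length()); // prints 6
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

If this is the cause, decoding the response with the correct character set at the source avoids the find and replace step entirely.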

Mac command line usage...?

Hello!

I've set up my scraping session such that it works great from inside the workbench.

So, from a Unix shell, I cd into that directory, and type:
java -jar screen-scraper.jar -s "SessionName"

But it tells me:
The script "SessionName" was not found. Please check the name and try again.

I tried many variations on that command line (including specifying a script instead of a session; leaving the "-s" flag off), but nothing did what it was supposed to. What am I doing wrong?