screen-scraper public support

Questions and answers regarding the use of screen-scraper. Anyone can post. Monitored occasionally by screen-scraper staff.

Scraping multiple items from a part of a page

I would like to scrape multiple items from part of a page.

Simplified example:

Drinks:
Beer
Water
Juice

Food:
Hamburger
Fries
Chicken

Beginner help.

Hi,
I am learning to use screen-scraper and have encountered a problem.

Some of the results in my table are links and some are just plain text. For example:

1. John
2. Tom
3. Greg
4. Brian

How do I get rid of the HTML elements?
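One way to picture the question above is a small helper that removes anything shaped like a tag and keeps the text between tags. This is only a minimal sketch in plain Java: the class name `TagStripper`, the method `stripTags`, and the sample link markup are all hypothetical, and a simple regular expression like this is not a full HTML parser (it will not cope with a `>` inside an attribute value, for instance).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of screen-scraper.
class TagStripper {
    // Matches anything that looks like an HTML tag, e.g. <a href="..."> or </a>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the input with tag-like spans removed and outer whitespace trimmed.
    static String stripTags(String html) {
        if (html == null) {
            return "";
        }
        return TAG.matcher(html).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Assumed sample values: one cell is a link, one is plain text.
        System.out.println(stripTags("<a href=\"/people/john\">John</a>"));
        System.out.println(stripTags("Tom"));
    }
}
```

With a helper like this, a cell that is a link and a cell that is plain text both reduce to the bare name.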

Inconsistent characters in XML output

My XML parser says I have a bad character but it doesn't know me that well.

Let me see if I can explain:

I am having the same issue on every version of screen-scraper I have tried (3.0.67a, 3.0.70a, and currently 4.0), running on Windows XP Pro. I have several scrapes that worked fine until a few days ago. They still work in production but not on my development box.

screen-scraper server model goes wrong on linux pc

We cannot find the original source of an exception.
When screen-scraper is invoked from the command line to add a task,
an exception is caught by an Interpreted Java script that we are calling.
The exception is not very descriptive:

java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

It would at first glance appear that CLASSPATH is not set, but it is:

# CLASSPATH=$HOME/ss/mysql-connector-java-5.1.5-bin.jar:$CLASSPATH java -jar screen-scraper.jar -s "generic - call-script" -p "scrapeFile=foo.quote"
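A point that may be relevant here: the `java` launcher documents that when `-jar` is used, the named JAR is the source of all user classes and other class path settings are ignored, which would include the `CLASSPATH` variable set above. The sketch below shows one way to check what the running JVM can actually see. The class name `ClasspathCheck` and the method `isLoadable` are hypothetical, and running it as a standalone program only approximates what happens inside screen-scraper.

```java
// Hypothetical diagnostic, not part of screen-scraper.
class ClasspathCheck {
    // Returns true if the named class can be loaded by this JVM.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The effective class path as the JVM sees it.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // The driver class named in the exception above.
        System.out.println("driver loadable: " + isLoadable("com.mysql.jdbc.Driver"));
    }
}
```

If the driver JAR does not show up in `java.class.path`, the variable on the command line is not reaching the JVM.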

incremental URL scraping

I know this relates somewhat to the first two tutorials, but I've been having a fairly difficult time setting it up, so I thought I would ask here and see if anyone had any input.

Basically, I would like to scrape some info from a site based on incremental URLs, e.g. http://www.ccc.com/item.php?kasi=00001. I want the scraper to get a few pieces of information (book name, author, price, etc.) from each page, from kasi=00001 to kasi=00200.
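The counting part of this can be sketched in plain Java. The example below builds the zero-padded URLs for the range in the question. The class name `IncrementalUrls` and the method `buildUrls` are hypothetical, and how each value is handed to a scrapeable file inside screen-scraper is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of screen-scraper.
class IncrementalUrls {
    // Builds baseUrl + zero-padded counter for every value in [first, last].
    static List<String> buildUrls(String baseUrl, int first, int last, int width) {
        List<String> urls = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            // e.g. width 5 turns 1 into "00001"
            urls.add(baseUrl + String.format("%0" + width + "d", i));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = buildUrls("http://www.ccc.com/item.php?kasi=", 1, 200, 5);
        System.out.println(urls.size() + " URLs");
        System.out.println("first: " + urls.get(0));
        System.out.println("last:  " + urls.get(urls.size() - 1));
    }
}
```

The padding width is a parameter because the leading zeros are part of the URL: a plain counter would produce kasi=1 rather than kasi=00001.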

Force multi-part?

I have a form with POST parameters and a funky URL like:

http://foo.com/bar?12345

Which, when requested just like that, ends up being a POST. But the form needs to post as multi-part. If I add a GET parameter to the list of params it does multi-part, but the site doesn't like that extra parameter. I tried adding a file upload as a parameter and screen-scraper just spins and spins and spins.

like condition in Javascript

Hi,

I would like to extract data that includes a certain word in a variable. Although when I do this

unfortunately the productname variable includes all the HTML until it finds the word Roundup (which could be quite a way down the page).

Is there a way to test in my script whether productname has the word Roundup in it? For example:

String pagestart = session.getVariable( "productname" );
if( pagestart.equals( "Roundup" ) )
{
    // DO THIS
}

Then I would be able to capture the whole product name and only go forward with the scrape if the test is true.
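Note that `equals` is true only when the whole value is exactly "Roundup", so it would not match a longer product name. A containment test might look like the sketch below, written in plain Java so that it runs on its own. The class name `ProductNameFilter` and the method `mentions` are hypothetical, and the sample product names are made up. It accepts an `Object` on the assumption that the session variable may not be typed as a string.

```java
import java.util.Locale;

// Hypothetical helper, not part of screen-scraper.
class ProductNameFilter {
    // True if the text form of value contains word, ignoring case.
    static boolean mentions(Object value, String word) {
        if (value == null || word == null) {
            return false;
        }
        String text = value.toString().toLowerCase(Locale.ROOT);
        return text.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Made-up product names for illustration.
        System.out.println(mentions("Roundup Weed Killer 1L", "roundup"));
        System.out.println(mentions("Garden Hose 15m", "roundup"));
    }
}
```

This only decides whether to continue. It does not by itself stop an extractor pattern from capturing too much HTML.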

setting dynamic parameters

Hello,

I am setting some dynamic parameters through a script using the scrapeableFile.addHTTPParameter() function. However, when I then run another script that uses the scrapeableFile.getCurrentPOSTData() function to get the POST parameters (that's what I assume it does) and print them to the session's log, the function returns null.

Scrape file parameters in basic version 4.0

I just upgraded from v3 to v4 of screen-scraper basic and now my scraping session doesn't work because there are no "Parameters" and I can't re-add them.

Clicking "Add Parameter" changes the width of the columns but doesn't actually add anything.

I need to submit some values by POST to log into the site. Why have parameters stopped working and is there an alternative way to do it in the basic version?

Thanks.

Where are proxy & scraping session data stored on compu

I had to migrate to a new computer and thought that I had backed up everything. After installing and starting screen-scraper, none of my proxy or scraping sessions appear in the new installation, even though I copied the entire program folder from "Program Files".

Where do I find that old data so I can bring back my hours and hours of work setting up these sessions?

Robert