screen-scraper public support

Questions and answers regarding the use of screen-scraper. Anyone can post. Monitored occasionally by screen-scraper staff.

Mapping product numbers

Hi

I have been looking at screen-scraper for just a few days and am planning to use it for scraping e-commerce sites. I have a bit of experience with Java, and after trying the tutorials I think programming the scraper won't be a problem for me.

As one of the posts on the blog says, this is just one part of it. Another is finding the right pages and getting the data mapped together. I think these two problems will take most of my time. I thought of one way to do it.

ssv45 on linux

testing JVM in /usr ...
Warning: Cannot convert string "-b&h-luxi sans-medium-r-normal--*-140-*-*-p-*-iso8859-1" to type FontStruct
Warning: Cannot convert string "-arphic-ar pl shanheisun uni-medium-r-normal--*-*-*-*-p-*-iso10646-1" to type FontStruct
Warning: Cannot convert string "-arphic-ar pl uming uni-medium-r-normal--*-*-*-*-p-*-iso10646-1" to type FontStruct
Warning: Cannot convert string "-sazanami-gothic-medium-r-normal--*-140-*-*-c-*-jisx0208.1983-0" to type FontStruct

remote scraping sessions called from beanshell

I seem to be able to call a remote session ok and pass session variables back and forth as long as it's not in lazy mode.

couple of questions

My session stores each search result in its own .htm file. It also creates an Overview.htm with each search result listed and linked.

I need the date (dd-mm-yyyy-hh:mm) in the title of the Overview.htm, so I create a Date() in the Initialize Session script.
Noelle showed us this:
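
The snippet Noelle posted is cut off in this excerpt. Independent of it, here is a minimal sketch of formatting a date for the Overview title in plain Java, assuming `java.text.SimpleDateFormat` is acceptable. The class and method names are hypothetical and this is not screen-scraper API. One thing to watch: in `SimpleDateFormat` patterns `mm` means minutes and `hh` means 12-hour clock, so the dd-mm-yyyy-hh:mm layout is most likely wanted as `dd-MM-yyyy-HH:mm`.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of screen-scraper.
class OverviewTitle {

    // Formats a date as day-month-year-hour:minute (24-hour clock).
    // MM = month and HH = hour 0-23; lowercase mm would mean minutes.
    static String formatTimestamp(Date when, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd-MM-yyyy-HH:mm");
        fmt.setTimeZone(zone);
        return fmt.format(when);
    }

    public static void main(String[] args) {
        // Assumption: the title is simply "Overview " plus the timestamp.
        String stamp = formatTimestamp(new Date(), TimeZone.getDefault());
        System.out.println("Overview " + stamp);
    }
}
```

Note that a colon is not allowed in Windows file names, so this format is fine inside the page title but would need changing if it were ever used in the file name itself.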

session's import question in v4.5

hi guys,

When I import a session by copying it into another screen-scraper install's import directory and then run it from the command line, I sometimes find that screen-scraper doesn't completely parse the scripts included in the session.

I haven't met a similar problem in v4.0, where the import parsing works very well.

Do I need to change some properties in v4.5, or can sessions only be imported through the GUI?

Thanks in advance

will

multi-threading questions

I'm writing a project to scrape some very large forums.

I have one scraping session which collects all the config data and pretty much sets everything up for the main scraping exercise which is the posts.

I want to scrape the posts in several threads at once. The number of threads will be set by a session var in the init script (along with lots of other parameters) and basically I'm just planning to use an iterative loop to check if each thread is finished then spawn another one if it is...
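
One alternative to hand-rolling the "check if a thread is finished, then spawn another" loop is a fixed-size thread pool, which does that bookkeeping itself. Below is a minimal sketch in plain Java, assuming the thread count arrives as an ordinary int. `scrapeTopic` is a hypothetical placeholder for whatever actually scrapes one forum topic, and nothing here is screen-scraper API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not part of screen-scraper.
class ThreadedPostScrape {

    // Placeholder for the real work of scraping one forum topic.
    static String scrapeTopic(int topicId) {
        return "topic-" + topicId + " done";
    }

    // Runs at most maxThreads scrapes at once; the pool starts the next
    // queued task as soon as a worker becomes free.
    static List<String> scrapeAll(List<Integer> topicIds, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> pending = new ArrayList<Future<String>>();
            for (final int id : topicIds) {
                Callable<String> task = () -> scrapeTopic(id);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order, blocking until each is ready.
            List<String> results = new ArrayList<String>();
            for (Future<String> f : pending) {
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Assumption: three topics, two worker threads.
        System.out.println(scrapeAll(Arrays.asList(1, 2, 3), 2));
    }
}
```

The pool size plays the role of the session variable set in the init script: changing it changes how many topics are scraped concurrently without touching the loop.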

godaddy - am I dreaming?

I realise my chances of being able to make this happen aren't real good but I'm trying it anyway.

I'm trying to install on a GoDaddy Linux Deluxe shared hosting account... I can run the install successfully in an SSH session.

If I try to run it using the link I get the following:

'font' gets replaced with 'span'

I'm getting some strange results when I examine my last scraped data.

I am scraping two pages from a site, which are identical apart from being for two different products. One gives:

 <h2><b>Price:</b> <span style="color: 990000">£1349.00</span></h2>

The other gives:

 <h2><b>Price: </b><font color="990000">&pound;239.00</font></h2>
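
Whatever the cause of the difference, a pattern that tolerates both forms would still pull the price out of either page. Here is a minimal sketch using plain `java.util.regex`, which is not screen-scraper's extractor pattern syntax. The class name is hypothetical, and the pattern assumes only the two variants shown above (`font` or `span`, `£` or `&pound;`, optional whitespace around `</b>`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of screen-scraper.
class PriceExtractor {

    // Group 1: the tag name (font or span), reused via \1 for the close tag.
    // Group 2: the numeric price. \u00A3 is the pound sign.
    private static final Pattern PRICE = Pattern.compile(
        "<h2><b>Price:\\s*</b>\\s*<(font|span)[^>]*>"
        + "(?:\u00A3|&pound;)([0-9]+(?:\\.[0-9]{2})?)</\\1>");

    // Returns the price digits, or null when the markup does not match.
    static String extractPrice(String html) {
        Matcher m = PRICE.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String a = "<h2><b>Price:</b> <span style=\"color: 990000\">\u00A31349.00</span></h2>";
        String b = "<h2><b>Price: </b><font color=\"990000\">&pound;239.00</font></h2>";
        System.out.println(extractPrice(a));
        System.out.println(extractPrice(b));
    }
}
```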

The response exceeded the maximum length and was truncated. If you'd like to view the full response........

hi guys

"The response exceeded the maximum length and was truncated. If you'd like to view the full response, click the "Display Response in Browser" button, then view the source in your web browser."

What's going on with version ssV45? I got that message in "Last Response". Is this a bug or ...?

//Max