screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users please contact support with your registered email address for access. This forum is monitored closely by screen-scraper staff. Posts are generally responded to in one business day.

Writing value from parent file after scraping child file

Excellent product. I am just having trouble with a particular pair of linked extraction patterns. There is a value in the first file that is scraped (the parent) that I wish written to the file created when scraping the second file (the child), because otherwise I can find no way to associate the resulting records in my database later as parent-and-child. (Normally I use the parent URL for this, but that does not work in this particular case.) I am saving the value from the parent file in a Session Variable, but it is always written as NULL, as though it has been "forgotten." Any advice?

AjaxControlToolkit -- NoBot : Have you ever encountered this?

I am trying to scrape a site that uses the NoBot anti-scraping method. Here is the documentation for NoBot:

http://www.asp.net/ajaxLibrary/AjaxControlToolkitSampleSite/NoBot/NoBot.aspx

It's a .NET thing that does a few fun little things to defeat anyone who's not using a proper browser. At the very least you need to execute a JavaScript snippet and provide the response they're looking for. There are other elements to NoBot, but the snippet is the only one I'm (currently) wrestling with.

Thoughts?

Captcha

Hi guys,
As usual, you're doing great work, thanks for that. But
is there any new idea for how to solve the CAPTCHA problem?
The last time I came across a CAPTCHA was in 2010, and since then I have simply had to suspend every project that required me to get past one.

Cheers

Radek

Is there more info/documentation on the Advanced setting for "Max retries per file"?

A common situation I'm seeing in our scraper scripts is that we hit the "Max retries per file" limit while trying to go through a set of proxy entries.

When the limit is reached, the page is dropped, the next search term is picked up, and the cycle starts over. This causes us to skip an entire search term.

I'm looking for more documentation on this setting so I can figure out how to detect the max-retries event and restart that search term.

5 series vs 6 series exports

Do you see any reason why exports from screen-scraper 6.0 would or would not work properly with 5.5xa installs? I haven't even tried it yet, but wanted a semi-official (unofficial, even) verification that the export formats and data should be compatible.

Ajax not captured by proxy server

I'm trying to read the following page:

http://www.flipkey.com/tofte-vacation-rentals/p265481/

During a request, the proxy server isn't capturing the calendar request (see Availability). It looks like the page is using JavaScript and Ajax to retrieve this and dynamically update the page.

Should the proxy server capture the Ajax request?

Content is not allowed in prolog

I have exported an sss file from my development machine and am now trying to import it into my lab machine, but I am getting the error "Content is not allowed in prolog". The export contains a scraping session, scrapeable files and some scripts, all of which work perfectly. Has anyone come across this before, and is there a fix?
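For reference, "Content is not allowed in prolog" is the message Java's XML parser reports when it finds unexpected characters before the document's first `<`, commonly a byte-order mark added by an editor or a file that was altered in transit. Below is a minimal sketch, assuming the export is an XML file, for checking what sits in front of the markup. The class name `PrologCheck` and its helper methods are hypothetical and are not part of screen-scraper.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class PrologCheck {

    /** Returns how many bytes precede the first '<', or -1 if there is no '<' at all. */
    static int bytesBeforeMarkup(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '<') {
                return i;
            }
        }
        return -1;
    }

    /** True if the data starts with the UTF-8 byte-order mark (EF BB BF). */
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of the exported file as the first argument.
        // With no argument, a small in-memory sample with a BOM is used instead.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};

        System.out.println("UTF-8 BOM present: " + startsWithUtf8Bom(data));
        System.out.println("Bytes before first '<': " + bytesBeforeMarkup(data));
    }
}
```

If the second number is anything other than 0, the file has leading bytes that an XML parser would likely reject. Re-exporting, or copying the file in binary mode, may be worth trying before editing it by hand.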

Connection timeout error when validating proxies - "The host did not accept the connection within timeout of 9000 ms"

We are running some proxy gathering scrapers in server mode. We usually see a list of 2k or more to sort through.
During the validation phase I see the first 200 or so come back just fine and then I start getting the rest failing with a connection timeout error.
"The host did not accept the connection within timeout of 9000 ms"

In the script we have "proxyServerPool.filter( 9 );"
So that explains the 9000 ms part.
But why are the first 200 fine and then the rest fail?

Using classes developed in Eclipse

SOLVED! It was a question of telling Eclipse to specifically export the package mylib.

best regards

Christian Pieler

Interpreted Java and Regular Expressions

I have finally made the leap from VBScript to interpreted Java but am having a problem with one of my scripts. Can anyone help?

I am applying a function to extracted data before saving it to a database, with the intention of removing unwanted (but not all) HTML tags and attributes. The regex works fine in VBScript but not in Java, and I am assuming that I am missing something basic (I used the help file here to work out what I needed to escape but have probably made a hash of it!).
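As a general point of comparison (this is not the poster's script), a regex written for VBScript usually needs every backslash doubled once it sits inside a Java string literal, so `\b` becomes `"\\b"`. Here is a rough sketch of the kind of filter described above, under the assumption that a small whitelist of tags is kept and everything else is dropped. The class name `TagFilter` and the whitelist `b|i|p|br` are hypothetical.

```java
import java.util.regex.Pattern;

public class TagFilter {

    // Hypothetical whitelist of tag names to keep.
    private static final String KEEP = "b|i|p|br";

    // Any opening or closing tag whose name is NOT on the whitelist.
    // Note the doubled backslash: the regex \b is written "\\b" in Java source.
    private static final Pattern UNWANTED_TAG =
            Pattern.compile("(?i)</?(?!(?:" + KEEP + ")\\b)[a-z][^>]*>");

    // An opening whitelisted tag together with whatever attributes it carries.
    private static final Pattern KEPT_TAG_WITH_ATTRS =
            Pattern.compile("(?i)<(" + KEEP + ")\\b[^>]*>");

    /** Drops non-whitelisted tags, then strips attributes from the tags that remain. */
    static String clean(String html) {
        String withoutUnwanted = UNWANTED_TAG.matcher(html).replaceAll("");
        return KEPT_TAG_WITH_ATTRS.matcher(withoutUnwanted).replaceAll("<$1>");
    }

    public static void main(String[] args) {
        String sample = "<p class=\"x\">Hello <span style=\"c\">big</span> <b>world</b></p>";
        System.out.println(clean(sample));
    }
}
```

This is a sketch rather than a full HTML sanitizer: an attribute value that contains `>` would confuse both patterns, and the text inside removed tags is kept.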

Here's my code: