Tried everything, couldn't scrape SSL site.

Hello all,
I wanted to know if someone has been able to successfully scrape an SSL site.

I've tried with the following browsers (using Ubuntu and a Professional trial of screen-scraper for Linux):
Firefox 2.0
IE 6
Opera 9.5b

And I still couldn't successfully get content. It works perfectly for non-secure sites, but with SSL it was impossible to make it work.

The site I was trying to work with is:

www.merca**.es (a Spanish supermarket site)

After connecting through the proxy on port 8777, an alert appears saying the certificate was issued for www.merca**.es but belongs to www.screen-scraper.com (I accept it).

The log then shows that SSL negotiates successfully, but the browser displays only the page title and the rest of the page stays blank. Is there anything I'm missing?

Thanks very much in advance.

I am not sure what the

I am not sure what the problem is. I also can't get that site proxied with Ubuntu/Opera 9.51; I get an error in the status field of screen-scraper's proxy. I also had a problem with IE 8 under Windows, though Opera 9.5 under Mac OS X worked fine. If you generate the scrapeable file from the proxy session and then run your scraping session, screen-scraper seems to be able to resolve the page and download its content; you can then use that to make and test your extractor patterns.

To clarify, if you generate

To clarify, if you generate the scrapeableFile (ignoring the error you're getting) from the Proxy's Progress tab, you can run your scraping session and it will still populate the scrapeableFile's "Last Response" tab.

From there, you can make your extractor patterns as usual.

Thanks both

I could work it out without the proxy.

The only thing that makes it difficult is entering the correct values to validate session info and forms without the proxy's help, but that's nothing that can't be worked out with a little patience.
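For anyone else doing this step by hand: most of those "correct values" are hidden form inputs (session tokens, viewstate-style fields) that a proxy would normally capture for you. A minimal sketch of collecting them yourself, using only the Python standard library; the HTML body and field names here are hypothetical stand-ins for what a real HTTPS response would contain:

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> fields,
    which sites often use for session and form-validation tokens."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        if a.get("type") == "hidden" and "name" in a:
            self.fields[a["name"]] = a.get("value", "")

# Hypothetical response body; in practice this would come from
# fetching the form page over HTTPS first.
html = """
<form action="/login" method="post">
  <input type="hidden" name="sessionid" value="abc123">
  <input type="hidden" name="__VIEWSTATE" value="dDw0...">
  <input type="text" name="user">
</form>
"""
collector = HiddenInputCollector()
collector.feed(html)
print(collector.fields)
```

The collected dictionary can then be merged with the visible form fields when building the POST request for the next page.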

Oh patience, great treasure...

Even if the proxying fails to

Even if the proxying fails to go through, any entry that appears in the proxy progress tab (error or not) should have the GET/POST variables in it. screen-scraper can learn about the *request* easily enough; it just fails to get the response. So you can still generate a scrapeableFile from the failed proxy attempt.
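That's the useful part: even a failed proxy entry tells you how to rebuild the request yourself. A sketch, assuming Python's standard library and replaying the captured variables outside screen-scraper; the URL and variable names are hypothetical:

```python
import urllib.parse
import urllib.request

# Hypothetical GET/POST variables as they'd appear in a proxy log entry.
post_vars = {"user": "demo", "session": "abc123"}

# URL-encode the variables; supplying a data payload makes urllib
# issue a POST instead of a GET.
data = urllib.parse.urlencode(post_vars).encode()
req = urllib.request.Request(
    "https://www.example.com/login",
    data=data,
    headers={"User-Agent": "Mozilla/5.0"},
)
print(req.get_method())  # "POST", because a request body was supplied
```

Sending it with `urllib.request.urlopen(req)` would then fetch the response the proxy never managed to capture.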

It'd be better than doing it by hand... some web sites are horrible... oh so horrible...