Proxy

Hi,

I have been trying to anonymize my scraping sessions, so I installed Tor (The Onion Router) and Privoxy. I put localhost and port 8118 in the external proxy settings box to point screen-scraper at Privoxy, and everything seems to work fine when I run my existing scraping sessions. The problem comes when I try to create new sessions using screen-scraper's proxy. Before installing Tor I used Opera/Firefox to create my scrapeable files with no trouble, but now both browsers just hang (they don't give me any messages, just blank white pages, as if they were about to load, but they never do).

I searched the forums here and read in one reply about a proxy error that I should check the output of ipconfig /all for the words "Teredo Tunneling Pseudo-Interface". I did, and found that I DO have these words. However, there was no follow-up reply to that message, so I don't know what this means or what I should do now.

I have tried turning off Tor and trying again, with still no joy.

I am using Windows Vista, and have tried IE/Firefox/Opera, all with the proxy set to localhost:8777.

Cheers for any help

Thanks so much for the

Thanks so much for the detailed replies. I have no idea what the problem was, but in the end I just reinstalled everything (screen-scraper, Tor, Privoxy) and now everything works fine. Weird! Thanks for the detailed advice though; I'm sure it will help anyone else who stumbles across this.

A technicality-- are you

A technicality: are you trying to run the proxy through Tor as well? As in, would you like Tor to handle your anonymous requests, and then have SS proxy those results?

Or do you simply want to take Tor out of the picture during the proxy phase?

I was pointing screen-scraper at Privoxy, and Privoxy was set to go through Tor automatically for my scraping sessions. This was working great, but the problem came when trying to create new sessions, i.e. open Firefox, point it at the SS proxy, and have that use no external proxy (just completely normal use, no external proxies involved). The web pages just hung completely.

I ended up just reinstalling Tor/Privoxy and screen-scraper and now everything is happily working again, so I'm not sure what went wrong.

PS: do you know if this is the best way of scraping anonymously? Going through Tor is slowing my scraping sessions to a crawl, so perhaps there's a better way?

Huh. I was experiencing the

Huh. I was experiencing the same thing a while back, and ultimately my problem was just that I had forgotten that I had set the entire SS program to try to go through Tor (via the SS settings). That created problems because Tor was no longer running, yet the global SS setting said that it was supposed to go through Privoxy to get to Tor. If you wanted it to stop using Tor in order to use the SS proxy against the page, then you would have to remove that global setting for an external proxy.

However, if you remove the global setting, then your scraping sessions won't have any proxy to use. There's an Advanced tab on each scraping session, where you can set the external proxy Host and Port (ignore the username/password boxes since you're just using Privoxy/Tor) for that specific scrape. See, when you've got a global external proxy set in SS, it'll actually just go and fill in that Advanced tab for you with the global host and port (e.g. 'localhost' and '8118' for Privoxy's default port; 9050 is Tor's own SOCKS port). So if you had a global external proxy set, you could actually go to this Advanced tab and remove it from the specific session's settings, and that session would actually use no proxy, despite the global setting.

Your problem seems to be that you've got that global external proxy in SS's settings, which tries to chain the SS proxy to Privoxy. If Privoxy and/or Tor are not running, then you'll get those strange hanging pages. For me, I'd see the request get sent out in the SS proxy log, but it would never resolve.
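A quick way to rule out the "something in the chain isn't running" case is to check whether each port is actually accepting connections. This is only a rough Python sketch: the port numbers are the ones mentioned in this thread plus Tor's usual SOCKS port (9050), and they are assumptions that may not match your install. The helper names are made up for illustration.

```python
import socket

# Assumed ports: 8777 and 8118 come from this thread, 9050 is Tor's usual
# SOCKS port. Adjust them to match your own setup.
SERVICES = {
    "screen-scraper proxy": 8777,
    "Privoxy": 8118,
    "Tor (SOCKS)": 9050,
}

def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False

def report(host="localhost"):
    """Map each service name to whether its port is accepting connections."""
    return {name: is_listening(host, port) for name, port in SERVICES.items()}

if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name}: {'listening' if up else 'NOT listening'}")
```

If Privoxy or Tor shows as not listening while SS still has the global external proxy set, that would match the hanging pages described above. A port that is listening only tells you the program is running, not that the whole chain works.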

*sigh* As for a faster way to do things... not really. Short of paying for a third-party anonymization service (which we offer to people, via the Amazon EC2 service), you won't have much success in speeding Tor up. SS is decently fast as it is, since it won't request any external JavaScript, images, CSS, etc.; just the raw HTML page. We use Tor here in our offices when clients (for whom we scrape data) don't want to pay for that external anonymization, since the EC2-based service costs us money by the hour.
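If you want to put a number on how much Tor is slowing things down, you could time the same request with and without the proxy. The following is a minimal Python sketch using only the standard library; it is not part of screen-scraper, and the function names and the Privoxy address (http://localhost:8118) are assumptions for illustration.

```python
import time
import urllib.request

def make_opener(proxy_url=None):
    """Build an opener that uses proxy_url, or no proxy at all if None."""
    # An empty dict tells ProxyHandler to ignore any system-wide proxy settings.
    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

def timed_fetch(url, proxy_url=None, timeout=60):
    """Fetch url and return (seconds_elapsed, bytes_received)."""
    opener = make_opener(proxy_url)
    start = time.perf_counter()
    with opener.open(url, timeout=timeout) as response:
        body = response.read()
    return time.perf_counter() - start, len(body)
```

For example, calling timed_fetch(url) and then timed_fetch(url, "http://localhost:8118") on a page you are allowed to hit gives a rough direct-versus-Tor comparison. Tor circuits vary a lot, so a single measurement won't mean much; repeat it a few times.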

If you're using a Unix-based machine, you could pretty easily set up your own SSH tunnel, so that you actually look like a different computer to the outside world. But that wouldn't help you if that 'other computer' is on your same network. This isn't nearly as 'anonymous' as Tor, but it could get you around blocked pages.
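As a rough illustration of the SSH tunnel idea, here is a small Python sketch that builds the command line for OpenSSH's dynamic port forwarding (-D opens a local SOCKS proxy, -N skips running a remote command). The user name, host name, and local port are placeholders, and the function name is made up. Note that this gives you a SOCKS proxy rather than an HTTP one, so whether SS can use it directly is an assumption you'd need to check; something like Privoxy might be needed in between.

```python
import shlex

def ssh_tunnel_command(user, host, local_port=1080):
    """Return the argv for an SSH dynamic (SOCKS) tunnel via user@host."""
    # -N: do not run a remote command, just forward traffic.
    # -D: listen on local_port and act as a SOCKS proxy.
    return ["ssh", "-N", "-D", str(local_port), f"{user}@{host}"]

if __name__ == "__main__":
    # Placeholder account and host; substitute a machine you can log in to.
    cmd = ssh_tunnel_command("me", "remote.example.com")
    print(shlex.join(cmd))  # prints: ssh -N -D 1080 me@remote.example.com
```

You would run the printed command in a terminal (or pass the list to subprocess) and leave it open for as long as you want the tunnel up.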

Some websites may never actually know that you're scraping them, since you're not triggering any javascript with SS. They could figure it out if they went looking, I'm sure, so unless you're hitting them really hard with multiple scrapes at the same time, you might be fine to just leave anonymization out of the picture.

Sorry for the extra long response, but details were in order, I thought.
Tim
