Login issues
I'm trying to scrape data from behind a login page. I have valid credentials, but I cannot get past the login using Screen Scraper, even though I can log in by entering the same details directly in either Firefox or Safari.
I've tried adding the login details in the advanced tab, and I've also tried adding them as parameters and setting the type to POST. Neither works.
Any suggestions?
Login issues
Alex,
It's hard to know with the captcha involved, but it sounds like the session cookie is not being passed. If you compare the last request from your proxy session to the last request in your scraping session, do they both show a session cookie (or cookies) being passed?
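If it helps, here is a rough way to make that comparison outside the tool, assuming you can copy the raw request text out of each log. This is only a sketch: the function names, the example requests, and the JSESSIONID cookie name are made up for illustration and are not taken from your site.

```python
def cookie_names(raw_request):
    """Return the set of cookie names sent in a raw HTTP request."""
    names = set()
    for line in raw_request.splitlines()[1:]:  # skip the request line
        if not line.strip():
            break  # a blank line marks the end of the headers
        header, _, value = line.partition(":")
        if header.strip().lower() == "cookie":
            for pair in value.split(";"):
                name = pair.partition("=")[0].strip()
                if name:
                    names.add(name)
    return names


def missing_cookies(proxy_request, scrape_request):
    """Cookies the browser/proxy request sent that the scraping request did not."""
    return cookie_names(proxy_request) - cookie_names(scrape_request)


# Hypothetical requests, for illustration only.
proxy_request = (
    "GET /details?id=1 HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Cookie: JSESSIONID=abc123; lang=en\r\n"
    "\r\n"
)
scrape_request = (
    "GET /details?id=1 HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Cookie: lang=en\r\n"
    "\r\n"
)

print(missing_cookies(proxy_request, scrape_request))  # {'JSESSIONID'}
```

Anything that shows up in the result is a cookie the browser sent and the scraping session did not, which is the first thing to chase.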
-Scott
Logging in
Hi,
The issue I had was that I had an initial page (a form with a captcha) which then provided a results page. From the results page I collected some product IDs, then submitted these to get at the details pages. However, the first visit to a details page required a web-based login.
This was captured well enough using a proxy session, but when I came to run the scraping session I first had to deal with the captcha (see another post), and then I hit the problem that none of the details pages would be displayed. I just got the login form at each request.
However!
Reading through one of your excellent (and I mean it) tutorials, I discovered that you could capture the login page and then run the POST request before running the details request. I've tried this and it works (after a fashion).
The issue I have now is how to trigger this in a sensible fashion.
At the moment I call my main (list/results) scraping session, then, using a script, call the scrape details session for each match of the pattern (product ID) on that page. To get around the login, I call another script which calls the login scrape session "before file is scraped" from the details session. As I said, this works, but it does mean that I'm calling the login for each details page scrape (as I can see from the logs), which seems inefficient.
So, I think the product and the tutorials are great. I can scrape my site, so no more help is really needed, but it would be good to learn a more efficient way of doing this.
rgds/alex
Login issues
Alex,
In order to offer ideas for solutions beyond something generic, we would need to see the relevant parts of the code you're working with. It's also helpful if you can give examples of things you've tried that failed, wholly or in part, so we can see how you may need to correct your approach.
Please just try to be more specific so whoever chimes in has more to work with. One of our folks will be following up on one of your earlier posts soon.
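In the meantime, the generic idea is to log in once per run instead of before every details page: keep a flag that records whether the login has already happened, and only repeat the login if a details request comes back as the login form again. The sketch below shows that logic in plain Python rather than as a Screen Scraper script. The class and the three callables (login, fetch_details, is_login_form) are hypothetical stand-ins for your own login and details requests. In your setup the flag would presumably live in a session variable checked by the script that runs before the details file.

```python
class DetailsScraper:
    """Runs the login once, then reuses the session for every details page."""

    def __init__(self, login, fetch_details, is_login_form):
        self._login = login                  # performs the login POST
        self._fetch_details = fetch_details  # requests one details page
        self._is_login_form = is_login_form  # True if a page is the login form
        self.logged_in = False

    def scrape(self, product_id):
        if not self.logged_in:
            self._login()
            self.logged_in = True
        page = self._fetch_details(product_id)
        if self._is_login_form(page):
            # The session probably expired: log in again once and retry.
            self._login()
            page = self._fetch_details(product_id)
        return page


# Fake callables so the sketch runs on its own; replace with real requests.
login_calls = []

def fake_login():
    login_calls.append("login")

def fake_fetch(product_id):
    if not login_calls:
        return "<form id='login'>"
    return "details for " + product_id

def looks_like_login(page):
    return "id='login'" in page


scraper = DetailsScraper(fake_login, fake_fetch, looks_like_login)
pages = [scraper.scrape(pid) for pid in ["A1", "B2", "C3"]]
print(len(login_calls))  # 1
```

With three product IDs the login runs once rather than three times, which is the saving you were after.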
Thanks,
Scott