Scrape: Login - script/extract - Logoff

Forgive me if this is a total newbie question, but, well, I'm a total newbie with screen-scraper. I have gone through most of the tutorials, which helped, but am still in the dark about something.

I need to use SS to log onto a site, request a file for download (session.downloadFile), and then log off the site (by navigating to another URL). If I don't complete the last step the site will be locked for an unspecified amount of time, perhaps preventing a follow-up scrape.
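For readers following along, the order of operations described here (log in, download mid-session, always log off) can be sketched outside screen-scraper in plain Python. This is only an illustration of the flow, not screen-scraper's API: the function names and all URLs and form fields passed to them are hypothetical, and it assumes the site tracks the login with ordinary cookies.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Build an opener that stores cookies from responses and resends them."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


def fetch_with_logout(opener, login_url, credentials, file_url, logout_url):
    """Log in, download one file, then request the logout URL no matter what."""
    form = urllib.parse.urlencode(credentials).encode("ascii")
    # Passing data makes urllib send a POST, as a login form would.
    opener.open(login_url, data=form).read()
    try:
        # The download happens mid-session, while the login cookies are still held.
        return opener.open(file_url).read()
    finally:
        # Runs even if the download raises, so the login is not left locked.
        opener.open(logout_url).read()
```

Calling `fetch_with_logout(make_opener(), ...)` with the site's real login, file, and logoff URLs would return the file's bytes. The `try`/`finally` is the point of the sketch: the logoff request is sent whether or not the download succeeds.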

What I've seen from the docs and tutorials is that the download script can only be invoked on the last step (http transaction) of the scrape. In my case I need the download script to run halfway through the session, between the login and logout transactions. I have tried clicking on one of the in-between transactions, but it fired after the session logged out.

I probably don't need (don't expect) any step-by-step advice, just a point in the right direction. Thanks for any help!

-Steve

sschafer on 01/14/2008 at 4:02 pm

screen-scraper public support

Scrape: Login - script/extract - Logoff

You are right. Thanks for your continued support, Scott.

-Steve

sschafer on 01/15/2008 at 5:44 pm

Scrape: Login - script/extract - Logoff

Steve,

I think the underlying issue is that your session never successfully logs in. So, I recommend that you slow the ship down and focus on correcting this first step.

Logging in can be tricky with some sites. In most cases screen-scraper will handle things automatically, but sometimes you'll need to manually set a cookie or, perhaps, a referrer. Investigating the request and response of each HTTP transaction will show you what you may need to do.

Take the request and response from your proxy session and compare them to the same in your scraping session. Keep it simple and do this only for the login authentication process and no other steps until the last response in your scraping session shows you've successfully logged in.
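If comparing the two requests by eye gets tedious, a small helper can do it. The following is a minimal Python sketch, not part of screen-scraper: `diff_headers` is a hypothetical name, and the header values are made-up placeholders standing in for what you would copy from the proxy session and scraping session logs.

```python
def diff_headers(proxy_headers, scrape_headers):
    """Return the headers whose values differ between the two requests."""
    # HTTP header names are case-insensitive, so normalise before comparing.
    proxy = {name.lower(): value for name, value in proxy_headers.items()}
    scrape = {name.lower(): value for name, value in scrape_headers.items()}
    report = {}
    for name in sorted(set(proxy) | set(scrape)):
        if proxy.get(name) != scrape.get(name):
            # None means the header was absent from that request.
            report[name] = (proxy.get(name), scrape.get(name))
    return report


# Hypothetical header sets, copied by hand from the two request logs.
proxy_request = {
    "Cookie": "JSESSIONID=abc123",
    "Referer": "https://example.com/login",
    "User-Agent": "Mozilla/5.0",
}
scrape_request = {
    "user-agent": "Mozilla/5.0",
}

for name, (in_proxy, in_scrape) in diff_headers(proxy_request, scrape_request).items():
    print(f"{name}: proxy={in_proxy!r} scrape={in_scrape!r}")
```

With these placeholder values the report lists only the cookie and referer headers, which are the two things most likely to need setting by hand.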

If you need to manually set a cookie...
http://www.screen-scraper.com/support/docs/api_documentation.php#setCookie

If you need to manually set a referrer...
http://www.screen-scraper.com/support/docs/api_documentation.php#setReferer

HTTP Primer, if needed...
http://www.screen-scraper.com/support/docs/how_http_works.php

Also, you'll use "run after pattern/extract match" when executing a script from an extractor pattern.

I hope this helps and feel free to post any specific issue you may have.

-Scott

swilsonmc on 01/15/2008 at 4:53 pm

Scrape: Login - script/extract - Logoff

I have tried saveFileOnRequest, that is, tried implementing it. I think one problem with this method is that the site I'm trying to get the file from requires cookies for each request, cookies set during the login process.

I think the bigger problem is that I am having trouble with the paradigms of screen-scraper. On the surface it looks very capable and perfect for our application (grabbing several spreadsheets from several sites, several times an hour), but I can't quite grasp how to do this one.

For this one application in particular:

* We navigate to the login page and pass POST data to the login handler/form (a Java app behind the scenes) - all captured by the proxy.

* We then navigate to the "download files here" link, or pass the absolute URL to get the desired file directly - proxied navigation and I'd think a downloadFile or saveFileOnRequest method.

* Lastly we navigate to the "logoff" page (which includes an "are you sure" JavaScript prompt) and log off, ensuring that the app doesn't lock our login for an unspecified amount of time.

The proxy handles all of this beautifully, and shows the appropriate pages captured during the session when I examine its results. The problem is that I don't understand how to insert the logic into the scraping session to grab the required file, or can't get it to work. I have tried several methods, but always end up with a file containing the HTML of the logon page--I have no idea how that is happening.

From the examples and tutorials, it seems that the best method is to apply scripts and such to the last page scraped (the end of the session). So I've tried shortening the session to end at the download page (omitting the logout navigation) and even at the logged-in page (omitting all other navigation after achieving login). I get the same results. (Maybe screen-scraper is failing the login on scraping runs?)

One particular problem I've had is finding the "run after pattern/extract match" option for scripts. The only two options I seem to have are before/after scraping. I'm obviously not looking in the right place.

With the abundant documentation--documentation on using the Dashboard, API docs, examples and tutorials, and the forum--I certainly don't expect the script(s) to be written for me. I present this amount of detail only in the hopes that someone can point me in the right direction and get me over the comprehension hump.

In any case, thanks for listening!

-Steve

sschafer on 01/15/2008 at 12:27 pm

Scrape: Login - script/extract - Logoff

Steve,

You might try the saveFileOnRequest() method as an alternative. With this method you're able to pass POST parameters. It requires the professional edition of screen-scraper, which is available as a free trial for 30 days.

http://screen-scraper.com/support/docs/api_documentation.php#saveFileOnRequest

Please let us know if this helps.

Thanks,
Scott

swilsonmc on 01/15/2008 at 10:53 am
