resolved url and 404 error
I'm new to screen-scraper and to programming, but I'm trying to make this work. I'm trying to apply Tutorial 2 to a different website, but I'm having trouble: the URL is the same for the first and second results pages, and I can't find a variable to start my extraction pattern from.
The URL is this: http://www.swoopo.com/brw/vouchers_58.html?pge=10&ast=3
And this is the log output:
Scraping file: "File from swoopo"
File from swoopo: Preliminary URL: http://www.swoopo.com/ajax/auction-results.html
File from swoopo: Using strict mode.
File from swoopo: Resolved URL: http://www.swoopo.com/ajax/auction-results.html?ast=3&cid=58&pgn=2&pge=10
File from swoopo: Sending request.
File from swoopo: Warning! Received a status code of: 404.
File from swoopo: Processing scripts before all pattern applications.
The resolved URL doesn't exist? What am I doing wrong?
Thanks in advance for any info. Please let me know if you need any more information.
James
OK, I downloaded the Professional version to handle the cookies, but now, when I set the proxy in IE, I get "This page cannot be displayed." Why does Basic work when Pro doesn't?
Thanks for the continued help,
James
Hmmm... despite what it looks like, there's nothing about Pro or Enterprise that is different at the core of the program.
As for those cookies... I'm not sure that you have to do that yourself, in this case. All the screen-scraper versions will handle standard cookies by themselves. You may end up with problems, though, if a page is using javascript (which Swoopo uses very liberally) to set cookies.
Any javascript crucial to page navigation will need to be emulated by you, since screen-scraper doesn't actually execute any dynamic page content. I'd make sure you follow everything that the "change_browse_page" javascript function does and reproduce each step manually, or else you may very well end up with errors like that one.
Hope that's worth something for you. If not, continue to ask away, and especially once our in-office workload starts to calm down, I can take a more thorough look at it.
Tim
OK, still working on it.
I am completely lost on manually working out the dynamic page content.
This is the area: a href="javascript:change_browse_page('auction_browse_58_3', '/ajax/auction-results.html?ast=3&cid=58&pgn=5&pge=10');">5
I know pgn=5 is the value that changes the page. I save that value in a session variable, but it doesn't get substituted into the URL.
Furthermore, I can't get the page to load when I enter pgn=5 in the IE browser. Is there a URL that takes me to the dynamic content? Is that what I should be looking for?
Thanks for steering me in the right direction,
James
Well, you know that
/ajax/auction-results.html?ast=3&cid=58&pgn=5&pge=10
is the URL through which it requests the dynamic content, but I can't seem to successfully request the ajax content either; I get a 404 as well. I was looking at the javascript a little bit, and that's all there should be to it, but it doesn't seem to work. You've picked a real doozy of a site :) You're on exactly the right track, but I'm not sure why it's not working yet.
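For what it's worth, here is a rough Python sketch (outside screen-scraper, just to illustrate the idea) of pulling that URL out of the link and swapping in a different page number. The helper name `ajax_url` and the `BASE` and `LINK` constants are made up for this example; the link text is copied from the post above. It only builds the URL, so it does nothing about the 404 itself.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit, parse_qsl, urlencode

# Page the link was found on (from the first post); used to resolve the relative path.
BASE = "http://www.swoopo.com/brw/vouchers_58.html?pge=10&ast=3"

# The pager link as quoted earlier in the thread.
LINK = ("""a href="javascript:change_browse_page('auction_browse_58_3', """
        """'/ajax/auction-results.html?ast=3&cid=58&pgn=5&pge=10');">5""")

# Capture the two quoted arguments of change_browse_page(...).
CALL = re.compile(r"change_browse_page\('([^']*)',\s*'([^']*)'\)")


def ajax_url(link_html, page, base=BASE):
    """Return (target element id, absolute ajax URL with pgn set to `page`)."""
    match = CALL.search(link_html)
    if match is None:
        raise ValueError("no change_browse_page call found")
    target_id, relative = match.groups()
    parts = urlsplit(urljoin(base, relative))
    # Keep the parameter order, replacing only the pgn value.
    query = [(k, str(page) if k == "pgn" else v) for k, v in parse_qsl(parts.query)]
    return target_id, urlunsplit(parts._replace(query=urlencode(query)))


target, url = ajax_url(LINK, 2)
print(target)
print(url)
```

With page 2 this produces the same address as the "Resolved URL" line in the original log, which suggests the URL itself is being built correctly and the 404 comes from something else in the request.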
OK, I've been looking over the page. Among the parameters is the cookie: 1; __utma=82882452.2125261529.1238407967.1238557751.1238564577.3; __utmz=82882452.1238407967.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utmb=82882452; __utmc=82882452
And my "Next Page Link"
a href="javascript:change_browse_page('auction_browse_58_3', '/ajax/auction-results.html?ast=3&cid=58&pgn=2&pge=10');">2
I'm setting the pgn parameter to pgn=~@PAGE@~ (GET), and capturing the 2 as >~@NUM@~
Am I on the right track? And how do I handle the cookie parameter? Is that a POST?
Thanks again for the help
James
On the right track
James,
This all looks really good! I'd say that you're working in the right direction.
For the cookie issue please see this page:
http://community.screen-scraper.com/API/setCookie
You'll have to make the call on which variables in the javascript need to be scraped and stored, but I think that you should soon be able to move from page to page.
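If it helps to see the idea outside of screen-scraper, here is a minimal Python sketch of setting cookies by hand on a request. Inside screen-scraper you would use the setCookie call described on the page linked above instead; the helper names `parse_cookies` and `build_request` are invented for this example, and the cookie string is the one quoted earlier in the thread. Nothing is actually sent here.

```python
import urllib.request

# Cookie string as quoted earlier in the thread.
RAW = ("__utma=82882452.2125261529.1238407967.1238557751.1238564577.3; "
       "__utmz=82882452.1238407967.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); "
       "__utmb=82882452; __utmc=82882452")


def parse_cookies(raw):
    """Split 'name=value; name=value' into a dict, splitting on the first '=' only."""
    cookies = {}
    for pair in raw.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def build_request(url, cookies):
    """Build (but do not send) a request carrying the cookies in a Cookie header."""
    header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": header})


cookies = parse_cookies(RAW)
request = build_request(
    "http://www.swoopo.com/ajax/auction-results.html?ast=3&cid=58&pgn=2&pge=10",
    cookies,
)
print(request.get_header("Cookie"))
```

The `__utm*` names are the ones Google Analytics sets from javascript, so they may turn out not to matter to the server. That is only a guess about this site; comparing requests with and without them through the proxy would settle it.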
Thanks
scraper
ajax
First, check for cookies. See if there is a cookie set by ajax/javascript somewhere, because if there is, screen-scraper won't pick it up. Screen-scraper handles conventional cookies but not the kind set by ajax/javascript. This site leans heavily on both, so I'd bet there is such a cookie too.
Next, when proxying this site, pay very close attention to what requests are sent out, what variables they contain, and what responses are brought back. I think that if you can find a recurring pattern in the responses, you'll be able to solve this.
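One way to do that comparison is to diff the query parameters of two requests. Here is a small Python sketch, separate from screen-scraper; the function name `param_diff` is invented for this example and the URLs are the ones already quoted in this thread.

```python
from urllib.parse import urlsplit, parse_qsl


def param_diff(url_a, url_b):
    """Report which query parameters differ between two URLs."""
    a = dict(parse_qsl(urlsplit(url_a).query))
    b = dict(parse_qsl(urlsplit(url_b).query))
    return {
        "only_in_a": {k: a[k] for k in a.keys() - b.keys()},
        "only_in_b": {k: b[k] for k in b.keys() - a.keys()},
        "changed": {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]},
    }


browser_page = "http://www.swoopo.com/brw/vouchers_58.html?pge=10&ast=3"
ajax_page_2 = "http://www.swoopo.com/ajax/auction-results.html?ast=3&cid=58&pgn=2&pge=10"
ajax_page_5 = "http://www.swoopo.com/ajax/auction-results.html?ast=3&cid=58&pgn=5&pge=10"

print(param_diff(browser_page, ajax_page_2))
print(param_diff(ajax_page_2, ajax_page_5))
```

The first comparison shows that the ajax request adds `cid` and `pgn` on top of what the browser URL carries; the second shows that only `pgn` changes from page to page. The same kind of comparison can be applied to headers and cookies copied out of the proxy log.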
Best of luck.
scraper
resolved url and 404 error
Are you possibly missing a redirect? If so, you may need to extract a new query string.