Scrape search page but must visit home page
I have a site to scrape.
URL: http://www.parliament.uk/directories/hciolists/alms.cfm
I have to click on the "Biog" link and scrape all the details from the page that opens after clicking it.
In the scraper tool I created a session and provided the URL http://www.parliament.uk/directories/hciolists/alms.cfm
I extracted the ID and stored it in the session.
I created another session and provided the URL
http://biographies.parliament.uk/parliament/default.asp?id=~#memberid#~
but when I run the scraping session I get a blank page for the second URL.
The problem is that I cannot request the second URL directly.
Please suggest how I can scrape the second page. If possible, please provide a script in XML.
I run into this problem regularly but can't find a solution in the tutorials provided.
pervez,
You'll need to utilize a method that we don't often use. Take a look at the link below.
http://www.screen-scraper.com/support/docs/api_documentation.php#setReferer
This explains how to set the referrer for a given page. In your case you'll use this method on the biography page you're trying to end up on.
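To illustrate what setting the referrer amounts to at the HTTP level, here is a minimal sketch in plain Python rather than screen-scraper's own scripting. The function name build_biography_request, the member ID "12345", and the choice of the list page as the referrer value are all assumptions for illustration; the value the server actually expects has to come from a recorded browser session.

```python
from urllib.request import Request

# Base of the biography URL, taken from the thread.
BIOGRAPHY_URL = "http://biographies.parliament.uk/parliament/default.asp?id="


def build_biography_request(member_id: str, referer: str) -> Request:
    """Build (but do not send) a request that carries an explicit Referer header."""
    return Request(BIOGRAPHY_URL + member_id, headers={"Referer": referer})


# "12345" is a made-up member ID; the referer value is an assumption.
req = build_biography_request(
    "12345",
    "http://www.parliament.uk/directories/hciolists/alms.cfm",
)
print(req.full_url)
print(req.get_header("Referer"))
```

Nothing is sent over the network here; passing req to urllib.request.urlopen would issue the request with that header attached.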
It's not very common for a page to require a referrer that is not the actual referring page (the dodonline.co.uk page would be the actual referring page if their server weren't resetting it, as it is). Perhaps it was meant as a kind of quasi-security layer or a trap for would-be spammers. I hope you're not the latter.
:twisted:
Follow the directions closely and let us know how it turns out.
Thanks,
Scott
Hello Scott,
Thanks for your reply. I agree that without including the dodonline.co.uk page the biographies details page does not have the right info to know what to render, but can you please suggest the process or a script for scraping the details after clicking the "Biog" link? Can this kind of page be scraped with your tool? Other than these types of web pages I don't have any problem scraping, and your tool works perfectly.
I also need your suggestion for the ESPC website (http://www.espc.com).
pervez,
Before you build any scrapeable files from scratch, you should try the screen-scraper proxy and use the pages it records. I am still working on the issue you're having with the espc.com site (but I assure you the problems we're having with proxying that site are rare).
When you proxy the UK Parliament site you'll find that when you click on the member's "Biog" link you are taken to a site off of their domain (dodonline.co.uk) then redirected back on to their domain but at the subdomain biographies.
Without including the dodonline.co.uk page the biographies details page does not have the right info to know what to render. As you found out, the following link was not adequate.
http://biographies.parliament.uk/parliament/default.asp?id=IDHERE
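The sequence described above (list page first, then the "Biog" link, letting redirects carry you to the biographies subdomain) can be sketched outside screen-scraper roughly as follows. This is only an illustration under assumptions: biog_link stands for whatever href the proxy records for a member's "Biog" link (its real form is not given in this thread), plan_requests and fetch_biography are hypothetical helper names, and it only works if the hop through dodonline.co.uk is an ordinary HTTP redirect rather than a script or meta refresh.

```python
from http.cookiejar import CookieJar
from typing import List
from urllib.request import HTTPCookieProcessor, Request, build_opener

LIST_URL = "http://www.parliament.uk/directories/hciolists/alms.cfm"


def plan_requests(biog_link: str) -> List[Request]:
    """Requests in the order a browser makes them: list page, then the Biog link."""
    return [
        Request(LIST_URL),
        # The Biog link is requested with the list page as its referrer.
        Request(biog_link, headers={"Referer": LIST_URL}),
    ]


def fetch_biography(biog_link: str) -> bytes:
    """Run the planned requests in one cookie-keeping session; return the last body."""
    # The default opener follows HTTP redirects; the cookie jar keeps any
    # session cookies set along the way.
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    body = b""
    for request in plan_requests(biog_link):
        with opener.open(request, timeout=30) as response:
            body = response.read()
    return body
```

Calling fetch_biography with the recorded link would perform the real requests; plan_requests alone touches no network and is handy for checking the order and headers.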
If you're having this issue on other sites, it is likely that you're not seeing everything you would if you used the proxy rather than trying to build the pages based only on what you see in your browser.
I hope this helps.
-Scott