tidying HTML failed
Hello,
I requested the URL http://www.yell.com/ucs/HomePageAction.do from the scraper tool and got the following message:
Scraping file: "File from Shopping Site"
File from Shopping Site: Preliminary URL: http://www.yell.com/ucs/HomePageAction.do
File from Shopping Site: Resolved URL: http://www.yell.com/ucs/HomePageAction.do
File from Shopping Site: Sending request.
File from Shopping Site: Sorry, tidying HTML failed. Returning the original HTML.
Processing scripts after scraping session has ended.
Scraping session finished.
tidying HTML failed
pervez,
screen-scraper is limited to what the server is able to deliver. If the server returns a 500 error no matter how you search for data with that postal code, there is nothing that can be done in screen-scraper to fix the problem.
Please find an alternate way of searching for the data you need.
Thanks,
Scott
tidying HTML failed
Hello Scott,
Thanks for your reply, but I need to search for the postcode BT9.
On the home page http://www.yell.com/ucs/HomePageAction.do
I enter 'real state agent' in the product/service box and 'BT9' in the location box, then run the search. From the resulting page I have to scrape all the details.
Please suggest how to do this. I didn't find examples of this kind in the tutorial. Can you confirm whether pages like this can actually be scraped?
tidying HTML failed
pervez,
The 500 Internal Server Error I get when I try it seems to point to the use of BT9 as the location. When I change the location to "UNITED KINGDOM" it works just fine.
I'm sorry, I'm not familiar enough with UK postal codes to know whether or not BT9 should be considered legitimate.
Does this help?
Thanks,
Scott
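
For anyone who wants to repeat this comparison outside screen-scraper, here is a minimal Python sketch that builds the search URL for two different locations. The parameter names (keywords, companyName, location, search) are taken from the search URL quoted elsewhere in this thread; the scrambleSeed and M parameters are left out because their purpose is not known, and the helper name build_search_url is made up for this example. It only builds the URLs and sends nothing to the server.

```python
from urllib.parse import urlencode

# Search endpoint as it appears in the URL quoted in this thread.
BASE_URL = "http://www.yell.com/ucs/UcsSearchAction.do"


def build_search_url(keywords, location):
    """Build a search URL for the given keywords and location.

    Hypothetical helper: the parameter names are copied from the thread,
    and whether the server accepts a request without scrambleSeed and M
    is an assumption that has not been verified.
    """
    params = {
        "keywords": keywords,
        "companyName": "",
        "location": location,
        "search": "SEARCH",
    }
    # urlencode escapes spaces as '+', matching the form-style URL above.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Compare the location that fails with the one that works.
    for place in ("BT9", "UNITED KINGDOM"):
        print(build_search_url("real state agent", place))
```

Requesting each printed URL in a browser or in a scrapeable file should show whether the 500 error follows the location value.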
tidying HTML failed
Hello Scott,
As you suggested, I am writing extractor patterns from the untidied HTML, but I see garbage values after the session runs. How should I write the extractor pattern?
Please visit this URL: http://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=39578203&keywords=real+state+agent&companyName=&location=BT9&search=SEARCH&M=0.
When I run the scraping file I get unreadable values. Please suggest...
tidying HTML failed
pervez,
This message is meant only as a notice to you, the developer, not as an actual error. By default screen-scraper attempts to tidy the HTML. The main reason is to normalize the code from one page to the next and to force it into compliance for consistent rendering.
http://www.w3.org/People/Raggett/tidy/
If screen-scraper is unable to tidy the HTML on a given page, it simply means you'll need to write your extractor patterns against the untidied HTML. That shouldn't be a problem. Tidying just ensures better consistency; it isn't necessary.
-Scott
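
To illustrate what matching against untidied HTML involves, here is a rough sketch in plain Python rather than screen-scraper's own extractor pattern syntax. The HTML sample, the class names (name, phone) and the listing values are all invented for the example and are not the real markup of the site. The point is only that a pattern for untidied HTML has to tolerate mixed-case tags, unquoted or differently quoted attributes, and missing closing tags.

```python
import re

# Made-up, deliberately messy HTML: mixed case, unquoted attributes,
# and a first row with no closing </td> or </tr> tags.
RAW_HTML = """<TABLE><TR><TD class=name>Example Agents Ltd<TD class=phone>028 0000 0000
<tr><td CLASS="name">Sample Lettings</td><td class='phone'>028 1111 1111</td></tr></TABLE>"""

# Tolerant pattern: case-insensitive, optional quotes around attribute
# values, optional </td> between the two cells.
LISTING_PATTERN = re.compile(
    r"""<td\s+class\s*=\s*["']?name["']?\s*>\s*([^<]+?)\s*
        <(?:/td>\s*<)?td\s+class\s*=\s*["']?phone["']?\s*>\s*([^<]+?)\s*(?=<|$)""",
    re.IGNORECASE | re.VERBOSE,
)


def extract_listings(html):
    """Return (name, phone) pairs found in raw, untidied HTML."""
    return [(name.strip(), phone.strip())
            for name, phone in LISTING_PATTERN.findall(html)]


if __name__ == "__main__":
    for name, phone in extract_listings(RAW_HTML):
        print(name, "|", phone)
```

If the extracted values still look unreadable, it is worth comparing the pattern against the raw page source rather than the rendered page, since untidied markup often differs from what the browser displays.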