Internal Server Error when using Screen Scraper
Hi,
I am trying to scrape some information from the following URL: http://quote.tse.or.jp/tse/quote.cgi?F=listing/Ecs00 by searching for some companies, say, "corporation". When I attempt it from the browser, the results display correctly, whereas when I try the same using a Screen Scraper session, the website returns an 'internal server error' message. I also checked and found that no POST parameters were being passed. So what could be the reason I am getting the 500 error code? Am I overlooking something here? Could it be because a .exe (as on this website) behaves differently from other file types like a .php or a .asp?
Many thanks,
hemanth
Please note: I use Screen Scraper Professional Edition v2.6 and IE 6 on WinXP Professional.
Internal Server Error when using Screen Scraper
Hi Alan,
Thanks again. This works! It's very strange, but it works nicely.
Regards,
hemanth
Internal Server Error when using Screen Scraper
I have to agree, this is a really strange bug. I've noticed that my initial scraping session returns the 500 error now and that the URL of the page in the browser is always the same, so I don't really know where it's getting any other parameters from (like you said, there's no POST data).
I played around with it a little, though, and I think I've found a way around this. I added another scrapeable file to the session and set it as first in the sequence. The URL of that scrapeable file is the start page, http://quote.tse.or.jp/tse/quote.cgi?F=listing/Ecs00 . I didn't add any POST data or extractor patterns, so all it does is hit that page and then go on to the search results page. So far, it works every time. I guess there must be some sort of invisible parameters that the search results page expects to get from the starting page.
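Outside of screen-scraper, the same "hit the start page first" sequence can be sketched roughly as below. This is only an illustration in Python, not what screen-scraper does internally. It assumes the invisible state is something like a cookie set by the start page, which I haven't confirmed. The names build_search_url, make_session_fetcher and scrape_search are made up for this example; the query parameters are copied from the search URL quoted elsewhere in this thread.

```python
import http.cookiejar
import urllib.parse
import urllib.request

START_URL = "http://quote.tse.or.jp/tse/quote.cgi?F=listing/Ecs00"
SEARCH_URL = "http://quote.tse.or.jp/tse/qsearch.exe"


def build_search_url(company_name):
    """Build the GET URL for the search results page (hypothetical helper)."""
    # Parameter names, values, and order are taken from the URL in this thread.
    params = [
        ("F", "listing/ecslist"),
        ("KEY1", company_name),
        ("KEY5", ""),
        ("KEY3", ""),
        ("kind", "TTCODE"),
        ("sort", "+"),
        ("MAXDISP", "25"),
        ("KEY2", ""),
        ("REFINDEX", "+TTCODE"),
    ]
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def make_session_fetcher():
    """Return a fetch(url) function whose requests share one cookie jar.

    Assumption: the state the results page expects is carried in cookies.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )

    def fetch(url):
        with opener.open(url, timeout=30) as response:
            return response.read()

    return fetch


def scrape_search(company_name, fetch):
    """Request the start page first, then the search results page."""
    fetch(START_URL)  # response is discarded; the visit itself is the point
    return fetch(build_search_url(company_name))
```

Calling scrape_search("corporation", make_session_fetcher()) would make the two requests in the same order as the two scrapeable files in the session. The fetch function is passed in so the ordering can be checked without touching the real site.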
Hope that helps.
-Alan
Internal Server Error when using Screen Scraper
Hi Alan,
Thanks for your response. You have understood my question perfectly. I tried everything just the way you mentioned, and I have double-checked it now, but I still face the same problem. Here is the exact sequence of things I did.
I created a proxy session and recorded from http://quote.tse.or.jp/tse/quote.cgi? onwards. The request and response for these showed correctly in the proxy session's 'Progress' tab, and the response in Screen Scraper was exactly as displayed in the browser and as expected. So I created a scraping session and generated a scrapeable file. Here too, the response showed correctly at that point. Next, I created an extractor pattern and verified it by applying the pattern to the scraped data. Then I invoked this scraping session. I tried this in two ways (by clicking on 'Start Scraper' in the session's General tab and also through a script). In both cases, the response I received was the same, an extract of which I quote below:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, [no address given] and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
This is the URL, BTW: http://quote.tse.or.jp/tse/qsearch.exe?F=listing%2Fecslist&KEY1=corporation&KEY5=&KEY3=&kind=TTCODE&sort=%2B&MAXDISP=25&KEY2=&REFINDEX=%2BTTCODE
I fail to understand where I am going wrong. Do you have any ideas about this?
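For reference, the URL above can be broken into its path and query parameters with a few lines of Python. This is just a sketch to show that the whole search travels in the GET query string, which fits with there being no POST data; describe_request is a name made up for this example.

```python
from urllib.parse import parse_qsl, urlsplit

FAILING_URL = (
    "http://quote.tse.or.jp/tse/qsearch.exe"
    "?F=listing%2Fecslist&KEY1=corporation&KEY5=&KEY3=&kind=TTCODE"
    "&sort=%2B&MAXDISP=25&KEY2=&REFINDEX=%2BTTCODE"
)


def describe_request(url):
    """Return the path and the decoded query parameters of a URL."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps empty fields such as KEY5, KEY3 and KEY2.
    return parts.path, parse_qsl(parts.query, keep_blank_values=True)


if __name__ == "__main__":
    path, params = describe_request(FAILING_URL)
    print(path)
    for name, value in params:
        print(f"{name} = {value!r}")
```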
Internal Server Error when using Screen Scraper
Hello,
So far as I can tell, I was able to get results exactly as they are on the page when I tried scraping the site you mentioned. What I did was go to http://quote.tse.or.jp/tse/quote.cgi?F=listing/Ecs00 and then turn on the proxy in screen-scraper. After that, I set my browser to use the proxy and then I searched for "corporation" in the Company Name field. The results page came up with five results. Then, in screen-scraper, I generated a scrapeable file from that proxied page in a new scraping session. When I ran the new scraping session, the Last Response tab of that scrapeable file looked just like it did in my browser.
It's possible I don't understand your question; did I do anything differently than you?
Regards,
Alan