Scraping session not capturing data
I created a proxy session and stepped through the pages of a website, and the data was captured in the proxy session. The last response of the last transaction contains the data I want to download: a CSV file. All I want is for screen-scraper to repeat what I did in the proxy session and then run a script that downloads the file to disk. I have successfully done this before with another website.
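The script I have in mind is along these lines. This is just a rough sketch in screen-scraper's Interpreted Java, run after the last scrapeable file is scraped, and the output path is a placeholder:

    import java.io.FileWriter;

    // The body of the last response is the CSV itself, so write it straight to disk.
    String csv = scrapeableFile.getContentAsString();
    FileWriter out = new FileWriter("C:/scrapes/report.csv");
    out.write(csv);
    out.close();
    session.log("Wrote " + csv.length() + " characters to report.csv");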
So I looked at all of the transactions that the proxy session created and made a scrapeable file for each transaction that has data in the "Post Data" portion of the Request tab. I did this because I believed that was all that is required; I didn't do anything else. I also created a scrapeable file for the last transaction, which contains the data I want to download, and added a script to it whose purpose is to download the file. The script is irrelevant at this point, because the data it downloads is different from what it was when I created the proxy session.
When my .NET program runs the scraping session, or when I just click "Run Scraping Session", it does not run the same way it did when I created the proxy session: it doesn't produce any data. I know this is a general question, but what am I missing? Do I have a misconception in thinking I only need to create scrapeable files for transactions that have data under "POST Data" on the Request tab? Should I create scrapeable files for pages that don't have parameters too? If so, why? Might those pages be doing work, like updating cookies and saving data to session state, that the following pages need? How can I know which transactions to create scrapeable files for?
There are HTTP pages followed by a set of HTTPS pages. The first HTTPS page produced this message in the session log:
"Warning! Received a status code of: 401."
I see that means unauthorized access. After I saw that, I created a scrapeable file for every transaction, but it still produces that error code. How could the scraping session get a 401 if there is a scrapeable file for every transaction that leads up to it? Right before the first HTTPS transaction there are two transactions with a status of "Error" and nothing else.
The session log shows that it is sending the user ID and password. The website is set up so that the same URL is used for multiple transactions; for example, the URL for each page I scrape is https://sell.freddiemac.com/dispatch. What should I look for? What are the next debugging steps?
Gary,
It sounds like you're close. You decide which proxy transactions to use based more on the content of the response than on whether or not there is any POST data. For example, the page that contains your username and password under the Parameters tab may also have another parameter that is some kind of ID, or a funny-looking string of characters that serves as a session ID.
A good rule of thumb is to extract that ID, or those funny-looking characters, from the previous page, save it as a session variable, and refer to the session variable under the Parameters tab using something like ~#my_var#~.
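For instance, if the login response contains something like name="sessionid" value="abc123", an extractor pattern of

    value="~@SESSION_ID@~"

would capture it, and a short script run after each pattern match can save it (a rough sketch; the token and variable names are just examples):

    // Save the extracted token so later files can reference it as ~#SESSION_ID#~.
    session.setVariable("SESSION_ID", dataRecord.get("SESSION_ID"));

Then, on the next scrapeable file's Parameters tab, use ~#SESSION_ID#~ as the parameter value and screen-scraper will substitute the saved value when it makes the request.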
You may want to take 30 minutes or so and go through the first few of our tutorials to get acquainted with how this works.
http://community.screen-scraper.com/Tutorials_Menu
-Scott
My guess is that there is a session ID, VIEWSTATE value, request header, or cookie that you need to make sure is valid. Look at the HTTP request screen-scraper is sending and compare it with the corresponding request in your proxy session to see whether any of those values are missing or stale.
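For example, if it's an ASP.NET site, the __VIEWSTATE value has to be extracted from each page and posted back on the next request. A rough sketch, run after each pattern match on the previous page (the token name is just an example):

    // Save the extracted view state so the next POST can reference it
    // as ~#VIEWSTATE#~ under the Parameters tab.
    session.setVariable("VIEWSTATE", dataRecord.get("VIEWSTATE"));

If a cookie you saw in the proxy session isn't being set automatically, it can be forced (the cookie name and value here are only illustrative):

    session.setCookie("sell.freddiemac.com", "JSESSIONID", "value-from-proxy-session");

And to pinpoint exactly which request gets the 401, you can attach a script like this to each scrapeable file, run after the file is scraped:

    if (scrapeableFile.wasErrorOnRequest())
        session.log("Request failed: " + scrapeableFile.getCurrentURL());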
It's working now.
Thanks for filling in some gaps in my knowledge. I wasn't sure how to determine which transactions to build scrapeable files for, so for this scraping session I created a scrapeable file for every transaction, and one of the last ones contained the data I needed. Yes, I will go through the tutorials again; this time, the experience I've had creating scraping sessions to date should make the rereading more worthwhile. As of yesterday I was under some pressure to get this scraping session working (and similar ones soon) and to get an answer to my question. So -- many thanks for your timely responses!