How to capture Excel files that a website sends directly to the browser?
I am attempting to download an Excel spreadsheet from a website. The file appears to be generated by the web page and sent directly to the browser -- there is no URL associated with the Excel file, no page that ends in ".xls". I ran the proxy session until a message popped up asking whether I wanted to open, save, or cancel the display of the Excel file. I clicked open, and the spreadsheet appeared. Then I stopped the proxy session. I created a scrapeable file for the transaction associated with the request to open the spreadsheet. This is the URL for the transaction:
http://192.168.1.1:2555/upnp/b752c108-cb81-983f-93fc5e563d560e72/desc.xml
(It is an XML file.)
I created a script that runs after the last transaction has completed. I tried two versions, but neither returns a .xls file -- they both return the XML file instead.
Version 1:
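// Re-downloads whatever URL this scrapeable file just requested --
// here that is the desc.xml transaction, not the spreadsheet.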
currentURL = scrapeableFile.getCurrentURL();
session.downloadFile(currentURL, "C:\\screen-scraper\\FMPeoples.xls");
Version 2:
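// Saves the raw response of the scrapeable file's own request, which
// again is the XML document rather than the spreadsheet.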
scrapeableFile.saveFileOnRequest("C:/screen-scraper/FMPeoples.xls");
In other words, I created a proxy session and added transactions to it by working my way page by page through the website. Everything got recorded OK right up until the end, when I wanted it to download an Excel spreadsheet that the browser displayed. The proxy session did not record a transaction that contained a URL for the spreadsheet, so I think a web page sent the .xls Excel file directly to the browser. There is no URL that contains the spreadsheet; instead, I think there must be a web page with program code that generates the spreadsheet and sends it to the browser, after issuing a popup box with the options "save" and "cancel". How can you get the scraping session to download a .xls file created this way? And how can you get past the popup box?
Gary,
First I would try viewing the XML document in a text editor. My guess is you'll see the dynamically generated URL to the .xls file there. If that is the case, you can make the XML file a scrapeable file and extract out the URL to the .xls file.
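If the URL does turn up in the XML, a script along these lines could download it. This is only a sketch: it assumes you have added an extractor pattern to the desc.xml scrapeable file with a token named XLS_URL (a made-up name -- use whatever your pattern actually captures) that saves the spreadsheet URL to a session variable, and it reuses the downloadFile call from your Version 1.

// Run this script after the extractor pattern on desc.xml has been applied.
// XLS_URL is a hypothetical token/session variable name.
String xlsUrl = (String) session.getVariable("XLS_URL");
if (xlsUrl != null) {
    session.downloadFile(xlsUrl, "C:/screen-scraper/FMPeoples.xls");
} else {
    session.log("XLS URL was not found in desc.xml");
}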
Otherwise, you can uncheck the box under the proxy session's General tab that says "don't log binary data" and rerun your proxy session. Unchecking that box causes the proxy server to log all transactions, including binary ones. If the Excel file is very large, this may cause screen-scraper to lock up while it is downloading the file.
When you rerun the proxy session, you should see the URL to the .xls file as one of the proxy transactions. Chances are the actual URL to the Excel file will not end in .xls, however. You can know for sure which transaction is the Excel file by looking at the Content-Type in the response header -- an .xls download will typically report application/vnd.ms-excel.
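If you want to double-check the content type outside of screen-scraper, a plain Java HEAD request works. This is a generic sketch, not part of the screen-scraper API; the URL below is a placeholder for whatever transaction URL shows up in your proxy log, and if the download requires the site's session cookies the check would need those headers as well.

import java.net.HttpURLConnection;
import java.net.URL;

public class ContentTypeCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URL -- substitute the transaction URL from the proxy log.
        URL url = new URL("http://example.com/report/export");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD"); // headers only, no response body
        String contentType = conn.getContentType();
        System.out.println("Content-Type: " + contentType);
        // An .xls download usually reports application/vnd.ms-excel.
        conn.disconnect();
    }
}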
-Scott