Using data in proxy server log
Hi, I'm trying to retrieve a URL that only ever appears in the proxy server log file. I believe this to be GET data (although I may be barking up the wrong tree). Basically, when I send a session to
I want to retrieve this URL:
http://www.cnet.co.uk/i/ads/ho/House%20Ads/Metaboli/758x140.swf?clickTAG...
However, when I transfer the URL captured by the proxy server to the scrape session, the page source does not contain the above URL. Is it part of the HTTP headers?
Sorry, I'm not using the correct terminology... :-(
Any help in getting my brain to understand this would be much appreciated. Thanks!
Using data in proxy server log
kerri,
Sorry for the long delay in getting back to you.
It may be. Without looking too deeply into your specific situation, let me say that when dealing with Flash movies you generally have two options. One, the URL you're after is appended to the Flash movie's URL as a parameter (often along with a few other intermediate URLs that daisy-chain the click from gamespot all the way to Metaboli). Two, the site contains an alternate href and non-Flash image for browsers without the Flash plug-in.
I took a peek at gamespot, and the one thing different about how they're implementing Flash is that they're writing out the Flash-related code using JavaScript.
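For the first case, here is a rough Python sketch of pulling a click-through URL out of a SWF URL's query string. The example URL and the click_targets helper are made up for illustration (your real URL is truncated in the log excerpt), and it assumes the parameter name starts with "clickTAG", which is common for Flash ads but not guaranteed.

```python
from urllib.parse import urlsplit, parse_qs


def click_targets(swf_url):
    """Return query parameters whose name starts with 'clickTAG' (case-insensitive).

    parse_qs percent-decodes each value once, so a click-through URL
    that was encoded into the query string comes back readable.
    """
    params = parse_qs(urlsplit(swf_url).query)
    return {
        name: values
        for name, values in params.items()
        if name.lower().startswith("clicktag")
    }


# Hypothetical SWF URL, not taken from the real site.
example = "http://ads.example.com/banner.swf?clickTAG=http%3A%2F%2Fshop.example.org%2Foffer"
targets = click_targets(example)
print(targets)
```

If the decoded value is itself a redirect URL carrying another encoded URL, the same function can be applied to it again to follow the chain one hop at a time.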
I hope this helps,
Scott
Using data in proxy server log
Hi Scott, thanks for your feedback.
I'm not sure I explained my problem correctly. I have tried the binary files filter and read the tips, but I'm none the wiser. I want to capture the ad's URL. However, when I visit www.gamespot.com in a scrape session, the URL (the one I pasted in the first post, which appears in the proxy server progress list) is replaced with
http://ad.doubleclick.net/jump/gamespotuk.home;tile=2;sz=758x140;ord=1192696365;chan=home?
Therefore, in order to scrape the advert, I need to know the actual URL, as the replacement URL above in the scrape log does not direct me to the advert.
Is this possible?
thanks again, kerri
Using data in proxy server log
kerrid,
The URL you cite points to a Flash movie in SWF format. To see it in your browser, simply visit the URL.
It is a binary file which the screen-scraper proxy is set to filter out by default (turn filtering off by unchecking "Don't log binary files" under the General tab). That's why you see it in the log but not in the transactions list.
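If you only need the SWF URLs themselves, one option is to pull them out of the log text directly. This is a minimal Python sketch that assumes you have saved the proxy log as plain text with one request per line; the sample lines below are invented, and the real log layout may well differ.

```python
import re

# Matches an http(s) URL whose path ends in .swf, plus any query string.
SWF_URL = re.compile(r"https?://\S+?\.swf\b(?:\?\S*)?", re.IGNORECASE)


def swf_urls(log_text):
    """Return every SWF URL found in the given log text, in order."""
    return SWF_URL.findall(log_text)


# Hypothetical log excerpt; the format is an assumption.
sample_log = (
    "GET http://www.example.com/index.html HTTP/1.1\n"
    "GET http://ads.example.com/banner.swf?clickTAG=http%3A%2F%2Fshop.example.org%2F HTTP/1.1\n"
)
found = swf_urls(sample_log)
print(found)
```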
I'm not sure that you would want to attempt to scrape the Flash movie (it's an ad for an online retailer of games, metaboli.co.uk). But if you did, you may have spotty luck, since we have only been able to scrape data from SWF files under some circumstances.
See:
http://blog.screen-scraper.com/2006/07/31/extracting-data-from-java-applets-activex-controls-and-adobe-flash-movies/
Please let us know if you have any further questions.
Thanks,
Scott