Setting Extractor Pattern Token within iframe tag?
Hello,
I've been using ScreenScraper to do some simple test scrapes from websites, and it works very well. I'm trying to step things up a bit and get SS to do even more things for me: namely automate the downloading of a PDF file. I've read the various posts about this, and am following the advice in the post "Can PDF be saved?" i.e.
But my problem is in setting up the extractor pattern from this chunk of html:
I can set up the extractor pattern, but when performing the scrape, the log states: "The pattern did not find any matches." I've tried setting the token up in the first
<iframe
< a href
<iframe
(Original page: http://v3.espacenet.com/publicationDetails/originalDocument?CC=US&NR=4844520A&KC=A&FT=D&date=19890704&DB=EPODOC&locale=en_gb)
Thanks for any light you can shine on this!
James
The problem is that you've got an additional attribute between the closing quote and the closing ">" of the tag:
http://v3.espacenet.com/espacenetDocument.pdf?flavour=phantomFull&locale=en_GB&FT=D&date=19890704&CC=US&NR=4844520A&KC=A" target="MaxView"
i.e. target="MaxView"
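To illustrate why that trailing attribute matters, here is a rough sketch in Python. It is not screen-scraper code: it only mimics an extractor pattern with a regular expression, and the `<a href=...>` wrapper, the `extract` helper and both example patterns are assumptions made for the demonstration. The URL is the one quoted above.

```python
import re

# Hypothetical tag modelled on the HTML discussed above (wrapper tag is assumed).
HTML = (
    '<a href="http://v3.espacenet.com/espacenetDocument.pdf?flavour=phantomFull'
    '&locale=en_GB&FT=D&date=19890704&CC=US&NR=4844520A&KC=A" target="MaxView">'
)

# Stand-in for the "non-double quotes" token regex.
NON_DOUBLE_QUOTES = r'[^"]*'


def extract(text_before, text_after, html):
    """Mimic an extractor pattern: literal text, one token, literal text."""
    regex = re.escape(text_before) + "(" + NON_DOUBLE_QUOTES + ")" + re.escape(text_after)
    match = re.search(regex, html)
    return match.group(1) if match else None


# Expecting '">' straight after the URL fails, because target="MaxView" sits in between.
strict = extract('<a href="', '">', HTML)

# Ending the pattern at the closing quote ignores whatever attributes follow.
loose = extract('<a href="', '"', HTML)

print(strict)
print(loose)
```

In this sketch the first pattern finds nothing, while the second returns the PDF URL, which is the same idea as stopping your extractor pattern at the closing quote.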
I assume you've got the token set to use the 'non-double quotes' regex pattern? If not, you would be getting a match, but you'd be getting everything up to the
You could try:
as your extractor pattern (with non-double quotes as the regex).
If that's the HTML you're seeing in the scrapeable file's response tab, then it's probably not an iframe problem. If it is, you could try using the iframe tag to grab the URL, i.e.:
iframe src="~@PDF_URL@~"
since it holds the same URL anyway...
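As a quick way to see what the token regex changes, here is a small Python sketch of the `iframe src="~@PDF_URL@~"` idea. Again, this only imitates the matching with Python's `re` module and is not how screen-scraper is driven; the `name="pdfFrame"` attribute and the `match_token` helper are made up for the example.

```python
import re

PDF_URL = (
    "http://v3.espacenet.com/espacenetDocument.pdf?flavour=phantomFull"
    "&locale=en_GB&FT=D&date=19890704&CC=US&NR=4844520A&KC=A"
)

# Hypothetical iframe; the extra name attribute is an assumption.
IFRAME_HTML = '<iframe src="' + PDF_URL + '" name="pdfFrame"></iframe>'


def match_token(html, token_regex):
    """Imitate iframe src="~@PDF_URL@~" with the given regex for the token."""
    match = re.search('iframe src="(' + token_regex + ')"', html)
    return match.group(1) if match else None


# Non-double-quotes regex: the token stops at the first closing quote.
narrow = match_token(IFRAME_HTML, r'[^"]*')

# A permissive regex still matches, but runs on to the last quote in the tag.
greedy = match_token(IFRAME_HTML, r'.*')

print(narrow)
print(greedy)
```

With the narrow regex the token holds just the URL; with the permissive one it also swallows the attribute that follows, which is why the non-double quotes setting matters here.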