Can PDF be saved?

Is it possible to save a PDF file to disk when it is in the last response section?
If so, how do I set it up and save it?

Thank you
Author R. Jeter

Can PDF be saved?

Seems like this thread is getting quite a few views. For those who want more info on this particular topic, see my recent blog entry: http://blog.screen-scraper.com/2006/08/02/extracting-data-from-pdf-files/.

Todd

Can PDF be saved?

We're going to the local filesystem -- turns out that's more efficient. Thanks, though! Love the product.

Can PDF be saved?

Just another thought or two on this...

Assuming the PDF can be accessed with a straight HTTP GET request (e.g., you don't have to authenticate through a web site to get to it), you might just request it directly using Sun's HttpURLConnection class or (even better) HttpClient. The HttpClient jar is in the classpath, so feel free to make use of it in your scripts. With either of those libraries you should be able to get a reference to a stream that would let you grab the PDF.
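As a rough illustration of the HttpURLConnection route, here is a minimal standalone Java sketch. It assumes the PDF is reachable with a plain, unauthenticated GET. The class name PdfFetcher, the example URL, and the output file name are all made up for illustration, and it uses try-with-resources and java.nio.file, so it needs a reasonably modern JVM rather than the interpreter inside a script.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PdfFetcher {

    // Copies every byte from in to out and returns how many bytes were copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Issues a plain GET (no cookies, no login) and writes the response body to dest.
    static long download(String pdfUrl, Path dest) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pdfUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + pdfUrl);
            }
            // Streams are treated as raw bytes, so the PDF is never decoded as text.
            try (InputStream in = conn.getInputStream();
                 OutputStream out = Files.newOutputStream(dest)) {
                return copy(in, out);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults; pass a real URL and target file on the command line.
        String url = args.length > 0 ? args[0] : "http://www.example.com/docs/my_doc.pdf";
        Path dest = Paths.get(args.length > 1 ? args[1] : "my_doc.pdf");
        System.out.println("Saved " + download(url, dest) + " bytes to " + dest);
    }
}
```

If the PDF sits behind a login, this won't be enough on its own, since none of the session's cookies are sent along with the request.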

Best,

Todd

Can PDF be saved?

Hi,

If you want the contents of the PDF as a byte array or something like that, I think your best bet would be to download the file to the local file system, then read it in using Java code in a script. Would that do the trick for you?
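Something along these lines is what I have in mind for the "read it in" step. It is only a sketch in standalone Java: the class name PdfBytes and the file path are hypothetical, and it assumes the file has already been written to disk (for example by session.downloadFile). The check for the "%PDF-" signature at the start of the file is just a sanity test that you really got a PDF and not an HTML error page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class PdfBytes {

    // Reads the whole file into memory as raw bytes (no character decoding).
    static byte[] readAll(String path) throws IOException {
        return Files.readAllBytes(Paths.get(path));
    }

    // PDF files start with the ASCII signature "%PDF-".
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path; use wherever the download was saved.
        String path = args.length > 0 ? args[0] : "C:\\mydir\\my_doc.pdf";
        byte[] pdf = readAll(path);
        System.out.println(pdf.length + " bytes read, PDF signature present: " + looksLikePdf(pdf));
    }
}
```

Inside a script, the resulting byte array could then presumably be handed to session.setVariable like any other object, though I haven't verified how well that holds up with large files.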

Kind regards,

Todd

Can PDF be saved?

I think I'm trying to do the same thing -- which is get the binary contents of the PDF into a session variable. (Yup, I know how big it can get, etc.)

Rather than have session.downloadFile save it to a file, can it (or can it be extended to) save it to a "stream" or a session variable?

Can PDF be saved?

Hi Author,

It may be that I'm misunderstanding exactly how the HTML for your page is working. Could you copy and paste into a reply the portion of the HTML that contains the URL to the servlet/PDF? That should inform your first two questions. On the third question, that's precisely what the session.downloadFile method does--it saves the file to your local hard drive. Also, my apologies for not designating the programming language--it's Interpreted Java, though VBScript would be very similar.

Best,

Todd Wilson

Can PDF be saved?

Hi Todd

Thanks for the reply. I have 2 questions.

1) My ~@PDF_URL@~ refers to a redirect servlet. I am trying to make this work but so far no joy, any suggestions?

2) I have a scraping session that ends up with the PDF in the response area. I can extract this into a session variable.
A) Will the session variable still be binary or does it revert to text?
B) Is there a way to save this binary data to my local drive?

Suggestion: Could you indicate programming language on code snippets?

Thank you
Author R. Jeter

Can PDF be saved?

Hi Author,

Yes, this can be done. You'll first need to extract the URL for the PDF (whole or part) using an extractor pattern. Be sure to save the value for the extractor pattern token corresponding to the URL in a session variable. Having done this, you'll write a short script to invoke the session.downloadFile method.

For example, let's say you extract the PDF URL using the following extractor pattern:


<a href="~@PDF_URL@~">Download PDF</a>

Once again you'll want to save the value extracted by the "PDF_URL" extractor pattern token in a session variable (double-click the token and select the "Save in session variable" checkbox). Then write a script containing the following:


session.downloadFile( session.getVariable( "PDF_URL" ), "C:\\mydir\\my_doc.pdf" );

You would then invoke this script after your extractor pattern has matched by associating the script with the pattern and selecting "After pattern is applied" in the "When to Run" column.

Hopefully this is enough to get you going. Feel free to post back if we can help further.

Best regards,

Todd Wilson