OutOfMemory Error when scraping page with many results

Hi,

I am trying to scrape a file that returns a variable number of records, depending on the parameter sent to the site. If the number of records is relatively low, it works fine. If there are around 1000 records or more, the command file stops running and the log file shows this error:

First Page: Sending request.
An error occurred while processing the script: Delta Dental CA - Get Input
The error message was: OutOfMemoryError (line 32): Java heap space-- Method Invocation session.scrapeFile
Scraping session "Delta Dental CA" finished.

There is one extractor pattern that looks like this:

 

~@ProviderName@~  
~@Address1@~
~@City@~
~@ST@~ ~@Zip@~
~@Phone@~
~@IGNORE@~

~@SpecialtyDesc@~

Practice Limitations
~@IGNORE@~  

~@Office@~

I have tried upping the memory in the settings to 1024 MB and disabling logging. I am not using session variables, and I am writing the data to file as soon as I retrieve it. The error occurs only when I run the script from the command line. If I run it from within the application itself, the page with many records takes a long time to come up, but all the records are successfully written to file. My concern with running everything from within the application is that it will take too long. Any help in this matter would be greatly appreciated.

Thanks.

Aviva

2 things I would try:

  1. Simplify the extractor pattern: You could do something like
    <h1>~@ProviderName@~</h1>
    ~@DATARECORD@~
    </tr>

    I just made up the HTML to illustrate; the idea is to keep the main pattern short and then use sub-extractors for the other parts. I think that may do the trick, but if not:
  2. Disable HTML Tidy: the process to clean up the HTML is resource intensive. If you turn it off you may need to redesign the extractor to deal with the original HTML, but it should help. See http://community.screen-scraper.com/faq#16n1136
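
To illustrate the first suggestion outside of screen-scraper, here is a minimal plain-Java sketch of the same two-stage idea: one short pattern finds each record, then small patterns pull fields out of that record's text only. The markup, the sample values, and the class and pattern names are all made up for illustration. This is an analogy using java.util.regex, not how screen-scraper implements sub-extractors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStageExtract {
    // Stage 1: a short pattern that only finds record boundaries.
    // The <h1>...</tr> markup is invented, as in the post above.
    private static final Pattern RECORD =
        Pattern.compile("<h1>(.*?)</h1>(.*?)</tr>", Pattern.DOTALL);

    // Stage 2: small patterns applied to one record's text at a time.
    // These formats are assumptions about what the fields look like.
    private static final Pattern PHONE =
        Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");
    private static final Pattern CITY_ST_ZIP =
        Pattern.compile("([A-Za-z .]+), ([A-Z]{2}) (\\d{5})");

    public static List<Map<String, String>> extract(String html) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher record = RECORD.matcher(html);
        while (record.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("ProviderName", record.group(1).trim());

            // Only this record's body is searched by the field patterns.
            String body = record.group(2);
            Matcher phone = PHONE.matcher(body);
            if (phone.find()) {
                fields.put("Phone", phone.group());
            }
            Matcher cityStZip = CITY_ST_ZIP.matcher(body);
            if (cityStZip.find()) {
                fields.put("City", cityStZip.group(1).trim());
                fields.put("ST", cityStZip.group(2));
                fields.put("Zip", cityStZip.group(3));
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        // Invented sample page with two records.
        String html =
            "<tr><td><h1>Jane Doe DDS</h1><br>100 Main St<br>Fresno, CA 93650"
            + "<br>(559) 555-0100</td></tr>"
            + "<tr><td><h1>John Roe DMD</h1><br>Oakland, CA 94601</td></tr>";
        for (Map<String, String> fields : extract(html)) {
            System.out.println(fields);
        }
    }
}
```

A field missing from one record (the second sample record has no phone number) just leaves that key out of the record. It does not stop the record itself from matching, which is part of why splitting one long pattern into smaller ones tends to be more forgiving.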

Aviva,

In addition to trying the two things that Jason recommends, I would also suggest reducing the radius of your searches. This may require that you perform more zip code searches, but hopefully you won't encounter pages so large that they become unwieldy.

If you need it, we have a list of every zip code for every U.S. state available for download here.
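
As a rough sketch of what "more searches, smaller radius" might look like, here it is in plain Java. The class name, the `runSearch` callback, and the sample zip codes are hypothetical. In an actual screen-scraper script the callback body would be where you set the zip and radius parameters and call session.scrapeFile for your search page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class ZipRadiusSearch {
    /**
     * Runs one search per zip code at the given radius and returns the
     * number of searches performed. runSearch is a hypothetical stand-in
     * for whatever actually requests the results page.
     */
    public static int searchAll(List<String> zipCodes, int radiusMiles,
                                BiConsumer<String, Integer> runSearch) {
        int count = 0;
        for (String zip : zipCodes) {
            String trimmed = zip.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank entries, e.g. empty lines in a zip file
            }
            runSearch.accept(trimmed, radiusMiles);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample zip codes; in practice these would come from the zip code list.
        List<String> zips = Arrays.asList("90001", "90002", "");
        int searches = searchAll(zips, 5, (zip, radius) ->
            System.out.println("Searching " + zip + " within " + radius + " miles"));
        System.out.println(searches + " searches run");
    }
}
```

Keep in mind that neighboring zip codes with overlapping radii can return the same provider more than once, so you may want to de-duplicate the output afterwards.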

-Scott

Answer to Scott

Thanks for your suggestions. I ended up using a smaller radius, as Scott recommended, and that resolved the issue.

Aviva