Extracting Several Tables to a CSV File

Dear Screen-Scraper Community,

I've been wrestling with this for days now. It seems fairly basic, but I can't figure out how to make it work.

I want to extract this data to a CSV file:

<table width="100%" border="0" cellpadding="4" cellspacing="0">
<tr>
<td>
<hr size="1" width="100%" noshade="noshade" />
</td>
</tr>

<tr valign="top">
<td><b>Steve Abraham</b><br />
Yellow-Checker Cab Co., Inc.<br />
P.O. Box 25123<br />
 Albuquerque, NM 87125<br />
Reservations Phone Number: <b>505-247-8888</b><br />
 Fax: <b>505-243-7499</b><br />
Email: <a href="mailto:[email protected]"><b>[email protected]</b></a><br />
 <br />
 </td>
</tr>

<tr valign="top">
<td>
<table border="0" cellspacing="0" cellpadding="0">
<tr valign="top">
<td>Fleet Information&nbsp;-&nbsp;</td>
<td>Limousines: <b>2</b><br />
 </td>
</tr>
</table>
</td>
</tr>

<tr>
<td>
<hr size="1" width="100%" noshade="noshade" />
</td>
</tr>

<tr valign="top">
<td><b>John Acierno</b><br />
The Executive Transportation Group<br />
1440 39th St.<br />
 Brooklyn, NY 11218<br />
Reservations Phone Number: <b>718-438-1100</b><br />
 Fax: <b>718-438-2930</b><br />
Email: <a href="mailto:[email protected]"><b>[email protected]</b></a><br />
Website: <a href="http://www.executivecharge.com" target="_blank"><b>www.executivecharge.com</b></a><br />
 <br />
 </td>
</tr>

<tr valign="top">
<td>
<table border="0" cellspacing="0" cellpadding="0">
<tr valign="top">
<td>Fleet Information&nbsp;-&nbsp;</td>
<td>Limousines: <b>1500</b><br />
 </td>
</tr>
</table>
</td>
</tr>

I need a CSV file with the correct headers (Name, Company, Address, etc.). As you can see, the second table has a "Website" line that the first one does not. FYI, there are loads of other tables I need to extract.
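To make the target concrete, here is a minimal stand-alone sketch in plain Java (no screen-scraper classes) of the output being asked for: a fixed header row, with any field a record lacks, such as Website, written as an empty cell. The class name `CsvSketch`, the helper names, and the column order are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone sketch: plain Java, no screen-scraper classes involved.
class CsvSketch {
    // Column names come from the post; the order is an assumption.
    static final String[] HEADER = {
        "Name", "Company", "Address1", "Address2", "Phone", "Fax", "Email", "Website"
    };

    // Quote a cell if it contains a comma, a double quote, or a line break.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Build one CSV line in header order; fields missing from the record become empty cells.
    static String toLine(Map<String, String> record) {
        List<String> cells = new ArrayList<>();
        for (String column : HEADER) {
            cells.add(escape(record.get(column)));
        }
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("Name", "Steve Abraham");
        first.put("Company", "Yellow-Checker Cab Co., Inc.");
        first.put("Address1", "P.O. Box 25123");
        first.put("Address2", "Albuquerque, NM 87125");
        first.put("Phone", "505-247-8888");
        first.put("Fax", "505-243-7499");
        // No "Website" entry: the first record on the page has none.

        System.out.println(String.join(",", HEADER));
        System.out.println(toLine(first));
    }
}
```

Note that the company name contains a comma, so it has to be quoted; otherwise the columns shift.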

So I have a first extractor pattern, to which a script is applied, and a second extractor pattern like so:

<tr>
<td>
<hr size="1" width="100%" noshade="noshade" />
</td>
</tr>

<tr valign="top">
<td><b>~@Name@~</b><br />
~@Company@~<br />
~@Address1@~<br />
 ~@Address2@~<br />
Reservations Phone Number: <b>~@Phone@~</b><br />
 Fax: <b>~@Fax@~</b><br />
Email: <a href="mailto:~@Email@~"><b>~@Email@~</b></a><br />
Website: <a href="~@Website@~" target="_blank"><b>~@Website@~</b></a><br />
 <br />
 <strong>Member Service Description:</strong> ~@Desc@~<br />
 </td>
</tr>

<tr valign="top">
<td>
<table border="0" cellspacing="0" cellpadding="0">
<tr valign="top">
<td>Fleet Information&nbsp;-&nbsp;</td>
<td>Limousines: <b>~@Num@~</b><br />
 </td>
</tr>
</table>
</td>
</tr>

DataSet companies = scrapeableFile.extractData(dataRecord.get("DATARECORD"), "Pattern");
[...]
dataSet.writeToFile( "C:/extracted_data.csv" );

And I really don't know where to go from there. The file is created properly, but there is nothing in it. I've tried a lot of things that I'm ashamed to post, haha.

Can anyone enlighten me?

Oh, and I also tried this: http://community.screen-scraper.com/script_repository/Write_to_CSV It doesn't seem to work either.
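One possible reason for an empty result, offered as an assumption rather than a diagnosis: the pattern above requires the Website and Member Service Description lines, but the first sample record has neither, so a pattern that must match in full would find nothing there. The sketch below uses plain `java.util.regex`, not screen-scraper's extractor pattern syntax, to show the general idea of wrapping an optional line in an optional group. The class name `OptionalLineSketch` and the trimmed three-line records are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone sketch using java.util.regex, not screen-scraper's extractor pattern syntax.
class OptionalLineSketch {
    // Name and fax are required; the whole Website line sits in an optional group.
    static final Pattern RECORD = Pattern.compile(
        "<td><b>([^<]*)</b><br />\\s*"
        + "Fax: <b>([^<]*)</b><br />\\s*"
        + "(?:Website: <a href=\"([^\"]*)\"[^>]*><b>[^<]*</b></a><br />)?");

    // Returns {name, fax, website}; website is "" when the line is absent.
    // Returns null when the required parts do not match at all.
    static String[] extract(String html) {
        Matcher m = RECORD.matcher(html);
        if (!m.find()) {
            return null;
        }
        String website = m.group(3) == null ? "" : m.group(3);
        return new String[] {m.group(1), m.group(2), website};
    }

    public static void main(String[] args) {
        // Records trimmed to a few lines for brevity; the real markup has more fields.
        String withSite = "<td><b>John Acierno</b><br />\n"
            + " Fax: <b>718-438-2930</b><br />\n"
            + "Website: <a href=\"http://www.executivecharge.com\" target=\"_blank\">"
            + "<b>www.executivecharge.com</b></a><br />\n";
        String withoutSite = "<td><b>Steve Abraham</b><br />\n"
            + " Fax: <b>505-243-7499</b><br />\n";

        System.out.println(String.join(" | ", extract(withSite)));
        System.out.println(String.join(" | ", extract(withoutSite)));
    }
}
```

Both records match, and the one without a Website line simply yields an empty third value.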

Would an example help? Things to note:

  1. The Yelp--start CSV script defines the output CSV
  2. In the Yelp--init script, I list a few random profiles to check for new reviews
  3. In the Yelp--check date script, profiles with new reviews are written to the CSV, and the rest are skipped
  4. You'll find the results in the screen-scraper/output directory.
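The date check described in the notes above can be sketched in stand-alone Java as follows. This is only an illustration of the idea (compute a cutoff some days back, keep reviews on or after it, reformat the date for output); the class name `DateThresholdSketch` and its helper names are assumptions, and the scrape itself uses screen-scraper's own objects instead.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

// Stand-alone sketch of the "newer than N days" check; no screen-scraper classes.
class DateThresholdSketch {
    // Parse an M/d/yyyy date, failing loudly on malformed input.
    static Date parse(String text) {
        try {
            SimpleDateFormat format = new SimpleDateFormat("M/d/yyyy");
            format.setLenient(false);
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an M/d/yyyy date: " + text, e);
        }
    }

    // The date that lies daysBack days before now.
    static Date daysBefore(Date now, int daysBack) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(now);
        calendar.add(Calendar.DATE, -daysBack);
        return calendar.getTime();
    }

    // True when the review date is on or after the oldest desired date.
    static boolean isWanted(String reviewDate, Date oldestDesired) {
        return !parse(reviewDate).before(oldestDesired);
    }

    // Convert M/d/yyyy to yyyy-MM-dd for the CSV.
    static String toIso(String reviewDate) {
        return new SimpleDateFormat("yyyy-MM-dd").format(parse(reviewDate));
    }

    public static void main(String[] args) {
        Date oldestDesired = daysBefore(parse("7/12/2011"), 50);
        String[] reviewDates = {"7/4/2011", "1/15/2011"};
        for (String reviewDate : reviewDates) {
            if (isWanted(reviewDate, oldestDesired)) {
                System.out.println("write: " + toIso(reviewDate));
            } else {
                System.out.println("skip:  " + reviewDate);
            }
        }
    }
}
```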

To set it up:

  1. Copy the code below, and paste it into a text editor
  2. Save the file as "yelp.sss"
  3. Import it to screen-scraper
  4. Run the Yelp scrape

Here's the scrape:

<?xml version="1.0" encoding="ISO-8859-1"?>
<scraping-session use-strict-mode="true"><script-instances><script-instances when-to-run="10" sequence="1" enabled="true"><script><script-text>// Create CsvWriter with timestamp
CsvWriter writer = new CsvWriter("output/yelp.csv", true);

// Create Headers Array
String[] header = {"Name", "Date", "Business"};

// Set Headers
writer.setHeader(header);

// Save in session variable for general access
session.setVariable( "WRITER", writer);</script-text><name>Yelp--start CSV</name><language>Interpreted Java</language></script></script-instances><script-instances when-to-run="10" sequence="2" enabled="true"><script><script-text>import java.util.*;
import java.text.*;

// Set number of days to go back
addDays = 50;

Calendar rightNow = Calendar.getInstance();
rightNow.add(Calendar.DATE, addDays*-1);
Date oldestDesired = rightNow.getTime();

// Output the new date.
session.log("+++Seeking reviews newer than " + oldestDesired);
session.setVariable("OLDEST_DESIRED", oldestDesired);

// Manually setting a list of users to check
String[] peopleToCheck = {
        "http://www.yelp.com/user_details?userid=-h8OOTM2JQBvjnH8mf8i5w",
        "http://www.yelp.com/user_details?userid=k3Oopx0QniRDHGlLA4W2XQ",
        "http://www.yelp.com/user_details?userid=tind8sTPbu_i2jLit5Ro4A",
        "http://surlyjason.yelp.com/"
};

// Request each person
for (i=0; i&lt;peopleToCheck.length; i++)
{
        session.log("Checking person #" + i);
        url = peopleToCheck[i];
        session.setv("URL", url);
        session.scrapeFile("Reviews");
}</script-text><name>Yelp--init</name><language>Interpreted Java</language></script></script-instances><script-instances when-to-run="20" sequence="3" enabled="true"><script><script-text>//scraping session close script
CsvWriter writer = session.getVariable("WRITER");
writer.close();</script-text><name>CSV close</name><language>Interpreted Java</language></script></script-instances><owner-type>ScrapingSession</owner-type><owner-name>Yelp</owner-name></script-instances><name>Yelp</name><notes></notes><cookiePolicy>0</cookiePolicy><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>2</originator_edition><logging_level>1</logging_level><date_exported>July 12, 2011 09:32:21</date_exported><character_set>ISO-8859-1</character_set><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>~#URL#~</URL><last-request></last-request><name>Next page</name><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>Next page</owner-name></script-instances></scrapeable-files><scrapeable-files sequence="-1" will-be-invoked-manually="true" tidy-html="dont"><last-scraped-data></last-scraped-data><URL>~#URL#~</URL><BASICAuthenticationUsername></BASICAuthenticationUsername><last-request></last-request><name>Reviews</name><extractor-patterns sequence="3" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>&lt;a  href="~@URL@~"~@junk@~&gt;&lt;span&gt;More &amp;raquo;</pattern-text><identifier>Next page</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="false" 
compound-key="true" strip-html="false" resolve-relative-url="true" replace-html-entities="true" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>/user_details_reviews_self[^"]*</regular-expression><identifier>URL</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="2"><regular-expression>[^&lt;&gt;]*</regular-expression><identifier>junk</identifier></extractor-pattern-tokens><script-instances><script-instances when-to-run="80" sequence="1" enabled="true"><script><script-text>if (session.getv("ITERATE_PAGES"))
{
        session.log("Want more results");
        session.setv("URL", dataRecord.get("URL"));
}
else
{
        session.log("Done with this guy");     
}
</script-text><name>Yelp--iterate pages</name><language>Interpreted Java</language></script></script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Next page</owner-name></script-instances></extractor-patterns><extractor-patterns sequence="2" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>&lt;div class="review clearfix"&gt;
~@DATARECORD@~
&gt;Link to this Review&lt;</pattern-text><identifier>Review</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><identifier>DATARECORD</identifier></extractor-pattern-tokens><extractor-patterns sequence="2" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>class="smaller"&gt;~@REVIEW_DATE@~&lt;</pattern-text><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>\d{1,2}[-/. ]+\d{1,2}[-/. ]+\d{2,4}</regular-expression><identifier>REVIEW_DATE</identifier></extractor-pattern-tokens><script-instances/></extractor-patterns><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>&lt;h4&gt;
~@ws@~&lt;a href="~@LINK@~"&gt;~@BUSINESS@~&lt;</pattern-text><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="2"><regular-expression>[^"]*</regular-expression><identifier>LINK</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[\n\t\s]*</regular-expression><identifier>ws</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="true" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="3"><regular-expression></regular-expression><identifier>BUSINESS</identifier></extractor-pattern-tokens><script-instances/></extractor-patterns><script-instances><script-instances when-to-run="60" sequence="1" enabled="true"><script><script-text>import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.text.ParseException;
import java.util.Date;

// Set oldest desired date
oldestDesired = session.getv("OLDEST_DESIRED");

// Parse the newest review date
newestDate = dataSet.get(0, "REVIEW_DATE");
DateFormat df = new SimpleDateFormat("M/d/yyyy");
reviewDate = df.parse(newestDate);

// Formatting line
line = "=";
while (line.length()&lt;90)
        line += "=";

// Compare the dates
if (reviewDate.after(oldestDesired) || reviewDate.equals(oldestDesired))
{
        // Within threshold
        session.log(line);
        session.log("Want this guy's reviews");
        numReviews = dataSet.getNumDataRecords();
        session.log("Found " + numReviews + " reviews");
        for (i=0; i&lt;numReviews; i++)
        {
                        oneReview = dataSet.getDataRecord(i);
                       
                        // Prep the values
                        date = oneReview.get("REVIEW_DATE");
                        date = sutil.reformatDate(date, "M/d/yyyy", "yyyy-MM-dd");
                        business = oneReview.get("BUSINESS");
                        session.log(date + ": " + business);
                       
                        // Concatenate the items to write
                        HashMap hm = new HashMap();
                        hm.put("NAME", session.getv("NAME"));
                        hm.put("DATE", date);
                        hm.put("BUSINESS", business);
                       
                        // Get existing writer
                        writer = session.getv("WRITER");
                       
                        // Write dataRecord to the file (headers already set)
                        writer.write(hm);

                        // Flush record to file (write it now)
                        writer.flush();
        }
        session.log(line);
        session.setv("ITERATE_PAGES", true);
}
else
{
        // Too old
        session.log(line);
        session.log("This guy is inactive");
        session.log(line);
        session.setv("ITERATE_PAGES", false);
}</script-text><name>Yelp--check date</name><language>Interpreted Java</language></script></script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Review</owner-name></script-instances></extractor-patterns><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>&gt;~@ws@~~@NAME@~'s Profile</pattern-text><identifier>Name</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[\n\t\s]*</regular-expression><identifier>ws</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="true" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="true" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="2"><regular-expression>[^&lt;&gt;]*</regular-expression><identifier>NAME</identifier></extractor-pattern-tokens><script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Name</owner-name></script-instances></extractor-patterns><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>Reviews</owner-name></script-instances></scrapeable-files></scraping-session>

Attempt to invoke method: getNumDataRecords() on undefined variable

Hey,

So I tried everything to adapt it to my problem, but I'm getting an "Attempt to invoke method: getNumDataRecords() on undefined variable or class name" error.

Do you mind taking a quick look? It seems like I'm really close, but I can't figure out the cause of this problem.
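A side observation on the export in this post, separate from the undefined-variable error and not confirmed as related to it: the Check script reads `Name`, `Phone`, and `freePhone`, while the sub-extractor pattern defines tokens called `myName` and `resPhone` and has no free-phone token. The stand-alone Java sketch below just lists requested names that no token defines. It assumes lookups go by exact token identifier, and the class name `TokenNameCheck` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch: compares the names a script reads with the token identifiers defined.
class TokenNameCheck {
    // Names the script asks for that no token defines.
    // Exact, case-sensitive comparison is an assumption made for this sketch.
    static List<String> undefinedNames(List<String> requested, List<String> tokens) {
        List<String> missing = new ArrayList<>();
        for (String name : requested) {
            if (!tokens.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Token identifiers as they appear in the sub-extractor pattern of the export.
        List<String> tokens = Arrays.asList(
            "myName", "Company", "Address1", "Address2", "resPhone",
            "Fax", "Email", "Website", "Desc");
        // Names the Check script passes to oneItem.get(...).
        List<String> requested = Arrays.asList(
            "Name", "Company", "Address1", "Address2", "Phone", "freePhone",
            "Fax", "Email", "Website", "Desc");

        System.out.println("Not defined as tokens: " + undefinedNames(requested, tokens));
    }
}
```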

<?xml version="1.0" encoding="UTF-8"?>
<scraping-session use-strict-mode="true"><script-instances><script-instances when-to-run="10" sequence="1" enabled="true"><script><script-text>// Create CsvWriter with timestamp
CsvWriter writer = new CsvWriter(&quot;C:/TLPA_Extract.csv&quot;, true);

// Create Headers Array
String[] header = {&quot;Name&quot;, &quot;Company&quot;, &quot;Address1&quot;, &quot;Address2&quot;, &quot;Phone&quot;, &quot;Free Phone&quot;, &quot;Fax&quot;, &quot;Email&quot;, &quot;Website&quot;, &quot;Desc&quot;};

// Set Headers
writer.setHeader(header);

// Save in session variable for general access
session.setVariable( &quot;WRITER&quot;, writer);</script-text><name>TLPA CSV Start</name><language>Interpreted Java</language></script></script-instances><script-instances when-to-run="20" sequence="2" enabled="true"><script><script-text>import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.text.ParseException;
import java.util.Date;

// Set oldest desired date
//oldestDesired = session.getv(&quot;OLDEST_DESIRED&quot;);

/* Parse the newest review date
newestDate = dataSet.get(0, &quot;REVIEW_DATE&quot;);
DateFormat df = new SimpleDateFormat(&quot;M/d/yyyy&quot;);
reviewDate = df.parse(newestDate);*/

// Formatting line
line = &quot;=&quot;;
while (line.length()&lt;90)
        line += &quot;=&quot;;

/* Compare the dates
if (reviewDate.after(oldestDesired) || reviewDate.equals(oldestDesired))
{*/
        // Within threshold
        session.log(line);
        session.log(&quot;Want this guy's reviews&quot;);
        numReviews = dataSet.getNumDataRecords();
        session.log(&quot;Found &quot; + numReviews + &quot; reviews&quot;);
        for (i=0; i&lt;numReviews; i++)
        {
                        oneItem = dataSet.getDataRecord(i);
                       // Prep the values
                        Name = oneItem.get(&quot;Name&quot;);
                        Company = oneItem.get(&quot;Company&quot;);
                        Address1 = oneItem.get(&quot;Address1&quot;);
                        Address2 = oneItem.get(&quot;Address2&quot;);
                        Phone = oneItem.get(&quot;Phone&quot;);
                        freePhone = oneItem.get(&quot;freePhone&quot;);
                        Fax = oneItem.get(&quot;Fax&quot;);
                        Email = oneItem.get(&quot;Email&quot;);                        
                        Website = oneItem.get(&quot;Website&quot;);
                        Desc = oneItem.get(&quot;Desc&quot;);
                       
                       
                        // Concatenate the items to write
                        HashMap hm = new HashMap();
                        hm.put(&quot;Name&quot;, session.getv(&quot;Name&quot;));
                        hm.put(&quot;Company&quot;, Company);
                        hm.put(&quot;Address1&quot;, Address1);
                        hm.put(&quot;Address2&quot;, Address2);
                        hm.put(&quot;Phone&quot;, Phone);
                        hm.put(&quot;freePhone&quot;, freePhone);
                        hm.put(&quot;Fax&quot;, Fax);
                        hm.put(&quot;Email&quot;, Email);
                        hm.put(&quot;Website&quot;, Website);
                        hm.put(&quot;Desc&quot;, Desc);
                       
                        // Get existing writer
                        writer = session.getv(&quot;WRITER&quot;);
                       
                        // Write dataRecord to the file (headers already set)
                        writer.write(hm);

                        // Flush record to file (write it now)
                        writer.flush();
        }
        session.log(line);
        //session.setv(&quot;ITERATE_PAGES&quot;, true);


 
</script-text><name>Check</name><language>Interpreted Java</language></script></script-instances><script-instances when-to-run="20" sequence="3" enabled="true"><script><script-text>//scraping session close script
CsvWriter writer = session.getVariable(&quot;WRITER&quot;);
writer.close();</script-text><name>CSV close</name><language>Interpreted Java</language></script></script-instances><owner-type>ScrapingSession</owner-type><owner-name>NewTLPA</owner-name></script-instances><name>NewTLPA</name><notes></notes><cookiePolicy>0</cookiePolicy><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>1</originator_edition><logging_level>1</logging_level><date_exported>juillet 13, 2011 19:47:24</date_exported><character_set>UTF-8</character_set><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>http://www.tlpa.org/members/directory.cfm</URL><last-request></last-request><name>Copy of File from New Proxy Session</name><HTTPParameters sequence="2"><key>login_pass</key><type>POST</type><value>Berthome</value></HTTPParameters><HTTPParameters sequence="1"><key>login_user</key><type>POST</type><value>3114</value></HTTPParameters><HTTPParameters sequence="3"><key>submit</key><type>POST</type><value>Login >></value></HTTPParameters><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>Copy of File from New Proxy Session</owner-name></script-instances></scrapeable-files><scrapeable-files sequence="2" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>http://www.tlpa.org/members/directoryUSA.cfm</URL><last-request></last-request><name>Copy of File from New Proxy 
Session1</name><extractor-patterns sequence="2" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text></pattern-text><identifier>Untitled Extractor Pattern</identifier><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>
</pattern-text><script-instances/></extractor-patterns><script-instances><script-instances when-to-run="60" sequence="1" enabled="false"><script><script-text>import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.text.ParseException;
import java.util.Date;

// Set oldest desired date
//oldestDesired = session.getv(&quot;OLDEST_DESIRED&quot;);

/* Parse the newest review date
newestDate = dataSet.get(0, &quot;REVIEW_DATE&quot;);
DateFormat df = new SimpleDateFormat(&quot;M/d/yyyy&quot;);
reviewDate = df.parse(newestDate);*/

// Formatting line
line = &quot;=&quot;;
while (line.length()&lt;90)
        line += &quot;=&quot;;

/* Compare the dates
if (reviewDate.after(oldestDesired) || reviewDate.equals(oldestDesired))
{*/
        // Within threshold
        session.log(line);
        session.log(&quot;Want this guy's reviews&quot;);
        numReviews = dataSet.getNumDataRecords();
        session.log(&quot;Found &quot; + numReviews + &quot; reviews&quot;);
        for (i=0; i&lt;numReviews; i++)
        {
                        oneItem = dataSet.getDataRecord(i);
                       // Prep the values
                        Name = oneItem.get(&quot;Name&quot;);
                        Company = oneItem.get(&quot;Company&quot;);
                        Address1 = oneItem.get(&quot;Address1&quot;);
                        Address2 = oneItem.get(&quot;Address2&quot;);
                        Phone = oneItem.get(&quot;Phone&quot;);
                        freePhone = oneItem.get(&quot;freePhone&quot;);
                        Fax = oneItem.get(&quot;Fax&quot;);
                        Email = oneItem.get(&quot;Email&quot;);                        
                        Website = oneItem.get(&quot;Website&quot;);
                        Desc = oneItem.get(&quot;Desc&quot;);
                       
                       
                        // Concatenate the items to write
                        HashMap hm = new HashMap();
                        hm.put(&quot;Name&quot;, session.getv(&quot;Name&quot;));
                        hm.put(&quot;Company&quot;, Company);
                        hm.put(&quot;Address1&quot;, Address1);
                        hm.put(&quot;Address2&quot;, Address2);
                        hm.put(&quot;Phone&quot;, Phone);
                        hm.put(&quot;freePhone&quot;, freePhone);
                        hm.put(&quot;Fax&quot;, Fax);
                        hm.put(&quot;Email&quot;, Email);
                        hm.put(&quot;Website&quot;, Website);
                        hm.put(&quot;Desc&quot;, Desc);
                       
                        // Get existing writer
                        writer = session.getv(&quot;WRITER&quot;);
                       
                        // Write dataRecord to the file (headers already set)
                        writer.write(hm);

                        // Flush record to file (write it now)
                        writer.flush();
        }
        session.log(line);
        //session.setv(&quot;ITERATE_PAGES&quot;, true);


 
</script-text><name>Check</name><language>Interpreted Java</language></script></script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Untitled Extractor Pattern</owner-name></script-instances></extractor-patterns><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>&lt;hr size=&quot;1&quot; width=&quot;100%&quot; noshade=&quot;noshade&quot; />
&lt;/td>
&lt;/tr>

&lt;tr valign=&quot;top&quot;>
~@DATARECORD@~
 &lt;/td>
&lt;/tr>
&lt;/table>
&lt;/td>
&lt;/tr>

&lt;tr>
&lt;td>
</pattern-text><identifier>DataPattern</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="true" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression></regular-expression><identifier>DATARECORD</identifier></extractor-pattern-tokens><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>&lt;td>&lt;b>~@myName@~&lt;/b>&lt;br />
~@Company@~&lt;br />
~@Address1@~&lt;br />
 ~@Address2@~&lt;br />
Reservations Phone Number: &lt;b>~@resPhone@~&lt;/b>&lt;br />
 Fax: &lt;b>~@Fax@~&lt;/b>&lt;br />
Email: &lt;a href=&quot;mailto:~@Email@~&quot;>&lt;b>~@Email@~&lt;/b>&lt;/a>&lt;br />
Website: &lt;a href=&quot;http://~@Website@~&quot; target=&quot;_blank&quot;>&lt;b>~@Website@~&lt;/b>&lt;/a>&lt;br />
 &lt;br />
 &lt;strong>Member Service Description:&lt;/strong> ~@Desc@~&lt;br />
</pattern-text><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="6"><identifier>Fax</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="3"><identifier>Address1</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="2"><identifier>Company</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="11"><identifier>Desc</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><identifier>myName</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" 
sequence="10"><regular-expression>[^&quot;]*</regular-expression><identifier>Website</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="4"><identifier>Address2</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="5"><identifier>resPhone</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="8"><regular-expression>[^&quot;]*</regular-expression><identifier>Email</identifier></extractor-pattern-tokens><script-instances/></extractor-patterns><script-instances><script-instances when-to-run="80" sequence="1" enabled="true"><script><script-text>import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.text.ParseException;
import java.util.Date;

// Set oldest desired date
//oldestDesired = session.getv(&quot;OLDEST_DESIRED&quot;);

/* Parse the newest review date
newestDate = dataSet.get(0, &quot;REVIEW_DATE&quot;);
DateFormat df = new SimpleDateFormat(&quot;M/d/yyyy&quot;);
reviewDate = df.parse(newestDate);*/

// Formatting line
line = &quot;=&quot;;
while (line.length()&lt;90)
        line += &quot;=&quot;;

/* Compare the dates
if (reviewDate.after(oldestDesired) || reviewDate.equals(oldestDesired))
{*/
        // Within threshold
        session.log(line);
        session.log(&quot;Want this guy's reviews&quot;);
        numReviews = dataSet.getNumDataRecords();
        session.log(&quot;Found &quot; + numReviews + &quot; reviews&quot;);
        for (i=0; i&lt;numReviews; i++)
        {
                        oneItem = dataSet.getDataRecord(i);
                       // Prep the values
                        Name = oneItem.get(&quot;Name&quot;);
                        Company = oneItem.get(&quot;Company&quot;);
                        Address1 = oneItem.get(&quot;Address1&quot;);
                        Address2 = oneItem.get(&quot;Address2&quot;);
                        Phone = oneItem.get(&quot;Phone&quot;);
                        freePhone = oneItem.get(&quot;freePhone&quot;);
                        Fax = oneItem.get(&quot;Fax&quot;);
                        Email = oneItem.get(&quot;Email&quot;);                        
                        Website = oneItem.get(&quot;Website&quot;);
                        Desc = oneItem.get(&quot;Desc&quot;);
                       
                       
                        // Concatenate the items to write
                        HashMap hm = new HashMap();
                        hm.put(&quot;Name&quot;, session.getv(&quot;Name&quot;));
                        hm.put(&quot;Company&quot;, Company);
                        hm.put(&quot;Address1&quot;, Address1);
                        hm.put(&quot;Address2&quot;, Address2);
                        hm.put(&quot;Phone&quot;, Phone);
                        hm.put(&quot;freePhone&quot;, freePhone);
                        hm.put(&quot;Fax&quot;, Fax);
                        hm.put(&quot;Email&quot;, Email);
                        hm.put(&quot;Website&quot;, Website);
                        hm.put(&quot;Desc&quot;, Desc);
                       
                        // Get existing writer
                        writer = session.getv(&quot;WRITER&quot;);
                       
                        // Write dataRecord to the file (headers already set)
                        writer.write(hm);

                        // Flush record to file (write it now)
                        writer.flush();
        }
        session.log(line);
        //session.setv(&quot;ITERATE_PAGES&quot;, true);


 
</script-text><name>Check</name><language>Interpreted Java</language></script></script-instances><owner-type>ExtractorPattern</owner-type><owner-name>DataPattern</owner-name></script-instances></extractor-patterns><HTTPParameters sequence="4"><key>City</key><type>POST</type><value></value></HTTPParameters><HTTPParameters sequence="5"><key>State</key><type>POST</type><value></value></HTTPParameters><HTTPParameters sequence="7"><key>SortBy</key><type>POST</type><value>LastName</value></HTTPParameters><HTTPParameters sequence="2"><key>FirstName</key><type>POST</type><value></value></HTTPParameters><HTTPParameters sequence="1"><key>LastName</key><type>POST</type><value></value></HTTPParameters><HTTPParameters sequence="6"><key>limosearch</key><type>POST</type><value>YES</value></HTTPParameters><HTTPParameters sequence="3"><key>Company</key><type>POST</type><value></value></HTTPParameters><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>Copy of File from New Proxy Session1</owner-name></script-instances></scrapeable-files></scraping-session>

I don't understand all of that...

Hey Jason,

Thanks for the example. Unfortunately, it's a little complicated for my current understanding of the software. I managed to run your script and it works great, but when I tried adapting it to my website I failed miserably. I'll try again tomorrow.

Regards,

Tom