Newbie: cannot reproduce the shopping tutorial results
I'm an experienced developer, though not in Java, and I am testing the Basic edition before purchasing. I've been working my way through the shopping tutorial, as it is very similar to one of my requirements. However, I am stumped at the extractor pattern test:
http://community.screen-scraper.com/tutorials/tutorial_2/5_link_extractor_patterns
In the tutorial, when you click Test Pattern you get a list of the search results, which is what I'm trying to do. In my scrape, when I click Test I get only one record in the list, not all the records from the search results. Obviously I'm missing something somewhere, but I can't figure out what.
The only difference I can see between the tutorial and my case is that the tutorial's example link includes a class on the table cell:
<code><td class="productListing-data"> <a href="http://www.screen-scraper.com/shop/index.php?main_page=product_info&products_id=~@PRODUCTID@~">~@PRODUCT_TITLE@~</a> </td></code>
whereas my example link does not:
<code><td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=~@Identifier@~'>I B D Dis...</a></td></code>
Can anyone give me a pointer on how to debug this and find the error?
Here is the log.
Starting scraper.
Running scraping session: MCS
Processing scripts before scraping session begins.
Processing script: "MCS Init"
Scraping file: "Search Page"
Search Page: POST data: searchAllInstallers=On&installerJoin=0&installerName=Installer%20Name&clear_form=0&searchLocation=UK%20Postcode&searchResultView=0
Search Page: Requesting URL: http://www.microgenerationcertification.org/consumers/installer-search?option=com_content&view=article&id=44&Itemid=154
Scraping file: "Search List All"
Search List All: POST data: searchAllInstallers=On&installerJoin=0&installerName=Installer%20Name&clear_form=0&searchLocation=UK%20Postcode&searchResultView=0
Search List All: Requesting URL: http://www.microgenerationcertification.org/consumers/installer-search?option=com_content&view=article&id=44&Itemid=154
Search List All: Extracting data for pattern "Installer Details Link"
Search List All: The following data elements were found:
Installer Details Link--DataRecord 0:
Identifier=101579
Storing this value in a session variable.
Processing scripts after scraping session has ended.
Processing scripts always to be run at the end.
Scraping session "MCS" finished
Thanks
Paul
Test Pattern One Record Extracted
Thanks for the reply. I have added tokens for the id and itemid as well. See below.
<td><a href='/index.php?option=com_content&view=article&id=~@id@~&Itemid=~@itemid@~&ID=~@Identifier@~'>I B D Dis...</a></td>
However, when I test the pattern I get the same issue: only one record is extracted. All three token patterns are set to the GET URL type; is that correct?
The last response page is as follows, so I know there are records in the list. I must be missing something really dumb. It's not the tr class value, is it?
<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101579'>I B D Dis...</a></td>
<!--<td>Unit 11 Enterprise Park,...</td>-->
<td>-</td>
<td>COR/183</td>
<td>01202 825682</td>
<td>08/10/2010</td>
</tr>
<tr class="mcsTableRow1">
<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=104938'>ESI Scotland Ltd</a></td>
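One rough way to debug this outside the workbench is to rebuild the pattern as a regular expression yourself and count the matches. The Java sketch below assumes a simplified conversion, not screen-scraper's actual internals: text outside the tokens has to match literally, and each token becomes a lazy "(.*?)" group. It runs the first pattern posted, the version with id and itemid tokenized, and a hypothetical version where the link text is also a token ("TITLE" is an illustrative name) against three link cells copied from the response page. Under that assumption, the first two still contain the literal link text "I B D Dis...", which appears only in the first row.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractorPatternCheck {

    // Simplified assumption, not screen-scraper's real conversion:
    // literal text must match exactly and every ~@TOKEN@~ becomes (.*?).
    static Pattern toRegex(String extractorPattern) {
        Matcher token = Pattern.compile("~@[A-Za-z_]+@~").matcher(extractorPattern);
        StringBuilder regex = new StringBuilder();
        int last = 0;
        while (token.find()) {
            regex.append(Pattern.quote(extractorPattern.substring(last, token.start())));
            regex.append("(.*?)");
            last = token.end();
        }
        regex.append(Pattern.quote(extractorPattern.substring(last)));
        return Pattern.compile(regex.toString());
    }

    // Count how many non-overlapping matches the pattern finds in the HTML.
    static int countMatches(String extractorPattern, String html) {
        Matcher m = toRegex(extractorPattern).matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Three link cells copied from the response page, one per line.
    static final String HTML = String.join("\n",
            "<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101579'>I B D Dis...</a></td>",
            "<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=104938'>ESI Scotland Ltd</a></td>",
            "<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101581'>Brownbridge Ltd t/a George Br...</a></td>");

    static final String ORIGINAL =
            "<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=~@Identifier@~'>I B D Dis...</a></td>";
    static final String IDS_TOKENIZED =
            "<td><a href='/index.php?option=com_content&view=article&id=~@id@~&Itemid=~@itemid@~&ID=~@Identifier@~'>I B D Dis...</a></td>";
    // Hypothetical: link text tokenized too (TITLE is an illustrative name).
    static final String TEXT_TOKENIZED =
            "<td><a href='/index.php?option=com_content&view=article&id=~@id@~&Itemid=~@itemid@~&ID=~@Identifier@~'>~@TITLE@~</a></td>";

    public static void main(String[] args) {
        System.out.println(countMatches(ORIGINAL, HTML));       // literal id, Itemid and link text
        System.out.println(countMatches(IDS_TOKENIZED, HTML));  // literal link text only
        System.out.println(countMatches(TEXT_TOKENIZED, HTML)); // no row-specific literals
    }
}
```

Run as-is it prints 1, 1 and 3: in this sketch, only the variant with no row-specific literal text matches every row.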
My extractor pattern would look like this:
~@DATARECORD@~
</tr>
The number token RegEx would be the "No double quotes" choice, and the DATARECORD would be ".*?"
You can then get the data for each row with sub-extractors.
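A rough Java approximation of the row-then-sub-extractor approach, to show why the alternating class is not a problem once its suffix is a token. It assumes the row pattern starts at the opening tr tag with a number token for the class suffix (NUMBER in the comment is a hypothetical token name), uses [^"]* as a stand-in for the "No double quotes" choice, and uses a hypothetical "&ID=" sub-extractor. Pattern.DOTALL is there so ".*?" can span the line breaks inside a row; that is an assumption about how the workbench treats newlines, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowExtractorSketch {

    // Assumed regex for a row pattern like
    //   <tr class="mcsTableRow~@NUMBER@~">~@DATARECORD@~</tr>
    // (NUMBER is a hypothetical token name). [^"]* stands in for the
    // "No double quotes" choice and .*? for DATARECORD. DOTALL lets .*?
    // cross the line breaks inside a row.
    static final Pattern ROW = Pattern.compile(
            "<tr class=\"mcsTableRow([^\"]*)\">(.*?)</tr>", Pattern.DOTALL);

    // Hypothetical sub-extractor: the ID in the first single-quoted link.
    static final Pattern ID = Pattern.compile("&ID=([^']*)'>");

    // Rows copied (shortened) from the response page above.
    static final String HTML = String.join("\n",
            "<tr class=\"mcsTableRow0\">",
            "<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101579'>I B D Dis...</a></td>",
            "<td>COR/183</td>",
            "</tr>",
            "<tr class=\"mcsTableRow1\">",
            "<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=104938'>ESI Scotland Ltd</a></td>",
            "<td>NIC5330</td>",
            "</tr>",
            "<tr class=\"mcsTableRow0\">",
            "<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101581'>Brownbridge Ltd t/a George Br...</a></td>",
            "<td>NAP 16889</td>",
            "</tr>");

    // Match each row, then run the sub-extractor on that row's DATARECORD.
    static List<String> extractIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            Matcher id = ID.matcher(row.group(2));
            if (id.find()) {
                ids.add(id.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        for (String id : extractIds(HTML)) {
            System.out.println(id);
        }
    }
}
```

Because the class suffix is matched by a token, rows with mcsTableRow0 and mcsTableRow1 both match, and the sketch prints the three IDs in page order.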
I should have pasted a larger snippet
Thanks for the reply. I'm not sure that will work, as the class values alternate, so they are not unique. When I highlight the HTML after the class in <code><tr class="mcsTableRow0"></code>, I do not get DATARECORD in the dialog.
<tr class="mcsTableRow0">
<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101579'>I B D Dis...</a></td>
<!--<td>Unit 11 Enterprise Park,...</td>-->
<td>-</td>
<td>COR/183</td>
<td>01202 825682</td>
<td>08/10/2010</td>
<td class="mcsResultsMoreInfo"><a href="/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101579">Map</a> | <a href="/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101579">More Info</a> </td>
</tr>
<tr class="mcsTableRow1">
<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=104938'>ESI Scotland Ltd</a></td>
<!--<td>17GarscaddenHouse,Dalset...</td>-->
<td>-</td>
<td>NIC5330</td>
<td>01412 742100</td>
<td>02/04/2015</td>
<td class="mcsResultsMoreInfo"><a href="/index.php?option=com_content&view=article&id=45&Itemid=154&ID=104938">Map</a> | <a href="/index.php?option=com_content&view=article&id=45&Itemid=154&ID=104938">More Info</a> </td>
</tr>
<tr class="mcsTableRow0">
<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101581'>Brownbridge Ltd t/a George Br...</a></td>
<!--<td>Millett House,Millett St...</td>-->
<td>-</td>
<td>NAP 16889</td>
<td>01617 649000</td>
<td>04/11/2011</td>
<td class="mcsResultsMoreInfo"><a href="/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101581">Map</a> | <a href="/index.php?option=com_content&view=article&id=45&Itemid=154&ID=101581">More Info</a> </td>
</tr>
<tr class="mcsTableRow1">
<td><a href='/index.php?option=com_content&view=article&id=45&Itemid=154&ID=103709'>Eco Solar Potential Ltd tradi...</a></td>
<!--<td>Brook Barn,Stapleton,Shr...</td>-->
<td>-</td>
<td>HET201409</td>
<td>01743 718003</td>
<td>10/04/2014</td>
<td class="mcsResultsMoreInfo"><a href="/index.php?option=com_content&view=article&id=45&Itemid=154&ID=103709">Map</a> | <a href="/index.php?option=com_content&view=article&id=45&Itemid=154&ID=103709">More Info</a> </td>
</tr>
It will work. I do it all the time. Alternating classes are a common way to get a striped table.
Getting There
Thanks
The id and itemid in the tag would make it match only once. You need to tokenize those too.