newbie in need of help.

I have been getting on OK with Screen-Scraper until this...

ANGEL WINEMAKING

3627 WEST BROADWAY

VANCOUVER,BRITISH COLUMBIA

Canada

V6R 2B8

604-730-6060

BACCHUS GRAP CONNECTION

3511 HASTINGS STREET E

VANCOUVER,BRITISH COLUMBIA

Canada

V5K 2A8

604-299-4848

BEYOND THE GRAPE

2603 KINGSWAY AVENUE

VANCOUVER,BRITISH COLUMBIA

Canada

V5R 5H4

604-437-7100

GRAPE ESCAPE

902 COMMERCIAL DRIVE

VANCOUVER,BRITISH COLUMBIA

Canada

V5L 3L7

604-254-1200

GRAPEVINES WINEMAKING

1314 SW MARINE DR

VANCOUVER,BRITISH COLUMBIA

Canada

V6P 5Z6

604-261-2739

MOSAIC WINE MAKER

1263 PACIFIC BLVD

VANCOUVER,BRITISH COLUMBIA

Canada

604-602-9463

www.vinosaurs.com

NEIGHBORHOOD WINEMAKERS

1680 DAVIE STREET

VANCOUVER ,BRITISH COLUMBIA

Canada

V6G 1V9

604-683-7777

PURPLE GRAPE WINEMAKER

125-555 WEST 12TH AVE

VANCOUVER,BRITISH COLUMBIA

Canada

V5Z 3X5

604-873-9669

THE WINE CELLAR

1659 RENFREW STREET

VANCOUVER,BRITISH COLUMBIA

Canada

V5K 3X7

604-251-9461

WEST COAST U-BREW

1616 CLARKE DR.

VANCOUVER,BRITISH COLUMBIA

Canada

V5L4Y2

604-875-0600

WINE CASTLE THE

4172 FRASER STREET

VANCOUVER,BRITISH COLUMBIA

Canada

V5V 4E8

604-877-1177

WINEMASTER

4107 MACDONALD STREET

VANCOUVER,BRITISH COLUMBIA

Canada

V6L 2P1

604-731-9463

www.mywinemaster.com

The issue is that I cannot figure out the proper extractor pattern to accurately separate one store's information from the next.

I have tried a bunch of different extractor patterns. The most successful is this one, but it gives me garbage for the first record and skips the last:

~@DATARECORD@~

This one was good too, but it skipped every other record: SS starts looking for the next record AFTER the last character that identifies the previous record, so I could not use the ending of one record as the start of the next:


~@DATARECORD@~
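The every-other-record behaviour can be reproduced outside screen-scraper. Here is a minimal Python sketch, assuming each record is introduced by a `<b>` tag; the sample string and variable names are made up for illustration, and this uses Python's `re` module rather than screen-scraper's own engine:

```python
import re

# Hypothetical sample: four records, each introduced by a <b> tag.
html = (
    "<b>ONE</b> a<br />"
    "<b>TWO</b> b<br />"
    "<b>THREE</b> c<br />"
    "<b>FOUR</b> d<br />"
)

# The closing delimiter is consumed by each match, so the next search
# starts after it and the record it introduced is skipped.
records_consuming = re.findall(r"<b>(.*?)<b>", html)

# A lookahead checks for the next delimiter without consuming it, so
# every record is found; "|$" lets the last record end at end of input.
records_lookahead = re.findall(r"<b>(.*?)(?=<b>|$)", html)

print(records_consuming)
print(records_lookahead)
```

The first pattern finds only records ONE and THREE, while the lookahead version finds all four. Whether a lookahead is usable inside a given extractor pattern depends on the tool, so treat this only as an explanation of the symptom.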

I have tried just straight extractor patterns, but some of the listings have URLs and emails and others don't, and I need to capture those items, so I really need to use sub-extractor patterns. Any assistance getting this working would be greatly appreciated.

Thanks, Carl.


ceshelman,

In order to get multiple results from sub-extractors you've got to do a few twists and turns and be running either the professional or enterprise edition.

You'll need to make use of the [url=http://www.screen-scraper.com/support/docs/api_documentation.php#extractData]extractData method[/url]. Here's an example to follow (it's a bit much to try to explain in words only).

http://community.screen-scraper.com/script_repository/manual-extraction-example

I hope this helps. Sorry about needing to upgrade if you do.

-Scott

close but not quite there

Thanks for the reply. This is close but not quite there. SS does not appear to run sub-extractor patterns on a datarecord more than once, even if there are multiple instances of the information in the datarecord. So all that is returned is a single row with the first instance of the sub-extractor pattern data that SS comes across.

Thanks, Carl.


Carl,

You've got the right idea; you just need to expand out a bit and not overlook those nice consistent tags they're using, too. Here's what I'd recommend for the main extractor pattern text:

<td colspan="2" class="textviolet"><br />
<b~@DATARECORD@~<br />
 </td>
</tr>
</table>

DATARECORD returns all of the store details, but now they'll be one long string stripped of hard returns and tabs. It looks like this:

>ANGEL WINEMAKING</b> <br />3627 WEST BROADWAY<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V6R 2B8<br />604-730-6060<br /> <br /><b>BACCHUS GRAP CONNECTION</b> <br />3511 HASTINGS STREET E<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V5K 2A8<br />604-299-4848<br /> <br /><b>BEYOND THE GRAPE</b> <br />2603 KINGSWAY AVENUE<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V5R 5H4<br />604-437-7100<br /> <br /><b>GRAPE ESCAPE</b> <br />902 COMMERCIAL DRIVE<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V5L 3L7<br />604-254-1200<br /> <br /><b>GRAPEVINES WINEMAKING</b> <br />1314 SW MARINE DR<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V6P 5Z6<br />604-261-2739<br /> <br /><b>MOSAIC WINE MAKER</b> <br />1263 PACIFIC BLVD<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br /> <br />604-602-9463<br /><a href="http://www.vinosaurs.com" class="textviolet" id="underline">www.vinosaurs.com</a> <br /> <br /><b>NEIGHBORHOOD WINEMAKERS</b> <br />1680 DAVIE STREET<br />VANCOUVER ,BRITISH COLUMBIA<br />Canada<br />V6G 1V9<br />604-683-7777<br /> <br /><b>PURPLE GRAPE WINEMAKER</b> <br />125-555 WEST 12TH AVE<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V5Z 3X5<br />604-873-9669<br /> <br /><b>THE WINE CELLAR</b> <br />1659 RENFREW STREET<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V5K 3X7<br />604-251-9461<br /> <br /><b>WEST COAST U-BREW</b> <br />1616 CLARKE DR.<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V5L4Y2<br />604-875-0600<br /> <br /><b>WINE CASTLE THE</b> <br />4172 FRASER STREET<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V5V 4E8<br />604-877-1177<br /> <br /><b>WINEMASTER</b> <br />4107 MACDONALD STREET<br />VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V6L 2P1<br />604-731-9463<br /><a href="http://www.mywinemaster.com" class="textviolet" id="underline">www.mywinemaster.com</a>

They've done you a big favor by leaving a slot for the postal code even when there is none (see MOSAIC WINE MAKER above). This means that for your sub-extractor patterns you won't be forced to separate out each element; instead you'll be able to use each element's neighbors to help identify which element it is.

They didn't do the same for URLs, so you're going to need to handle those separately.

Here's what I suggest for your sub-extractor patterns:

[b]First[/b]

>~@STORE@~</b> <br />~@ADDRESS_ONE@~<br />~@CITY@~,~@PROVINCE@~<br />~@COUNTRY@~<br />~@POSTAL_CODE@~<br />~@PHONE@~<br />

Using regex on some of these will help. I suggest using the following.

POSTAL_CODE:
[A-Za-z0-9 ]*

PHONE (available under the drop down)
\(?[\d]{3}[)-\. ]{1,2}[\d]{3}[-\. ]{1}[\d]{4}

For the rest use the standard non-html:
[^<>]*
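To sanity-check these regexes before putting them into screen-scraper, the first sub-extractor pattern can be approximated in Python. This is only a sketch: each `~@TOKEN@~` is rewritten as a named group, the three regexes are written out as plain characters, and the sample is a two-store excerpt of the DATARECORD string from this thread. The variable names are illustrative, and Python's `re` is a stand-in for screen-scraper's matching, not a copy of it.

```python
import re

NON_HTML = r"[^<>]*"                  # the "standard non-html" regex
POSTAL_CODE = r"[A-Za-z0-9 ]*"        # also matches the blank placeholder
PHONE = r"\(?[\d]{3}[)-\. ]{1,2}[\d]{3}[-\. ]{1}[\d]{4}"

# Each ~@TOKEN@~ from the sub-extractor pattern becomes a named group.
pattern = re.compile(
    r">(?P<STORE>" + NON_HTML + r")</b> <br />"
    r"(?P<ADDRESS_ONE>" + NON_HTML + r")<br />"
    r"(?P<CITY>" + NON_HTML + r"),(?P<PROVINCE>" + NON_HTML + r")<br />"
    r"(?P<COUNTRY>" + NON_HTML + r")<br />"
    r"(?P<POSTAL_CODE>" + POSTAL_CODE + r")<br />"
    r"(?P<PHONE>" + PHONE + r")<br />"
)

# Excerpt of the DATARECORD string: one store with a postal code, one without.
datarecord = (
    ">ANGEL WINEMAKING</b> <br />3627 WEST BROADWAY<br />"
    "VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V6R 2B8<br />"
    "604-730-6060<br /> <br />"
    "<b>MOSAIC WINE MAKER</b> <br />1263 PACIFIC BLVD<br />"
    "VANCOUVER,BRITISH COLUMBIA<br />Canada<br /> <br />"
    "604-602-9463<br />"
)

# One dict per store; strip() turns the blank postal placeholder into "".
rows = [
    {name: value.strip() for name, value in match.groupdict().items()}
    for match in pattern.finditer(datarecord)
]

for row in rows:
    print(row)
```

Both stores come back as separate rows, and the store without a postal code still matches because `POSTAL_CODE` accepts the single space left in its place.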

[b]Second[/b]

<a href="~@URL@~" class="textviolet" id="underline">~@WEBSITE@~</a>

Also, if you anticipate them having a field for an email address, you could add a third sub-extractor pattern that would look something like this:

<a href="mailto:~@EMAIL@~" class="textviolet" id="underline">~@EMAIL_ADDRESS@~</a>

The nice thing about sub-extractor patterns is that they're not required to match. So, if you never saw an email address it would be ok.
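The "not required to match" idea can be illustrated with a small Python sketch. It assumes the store details have already been separated into one fragment per store; the two fragments below are taken from the listing in this thread, and the function name `extract_optional` is hypothetical. A missing link or email simply yields `None` instead of failing the whole record.

```python
import re

# Python approximations of the second and third sub-extractor patterns.
URL_PATTERN = re.compile(
    r'<a href="(?P<URL>[^"]*)" class="textviolet" id="underline">'
    r"(?P<WEBSITE>[^<>]*)</a>"
)
EMAIL_PATTERN = re.compile(
    r'<a href="mailto:(?P<EMAIL>[^"]*)" class="textviolet" id="underline">'
    r"(?P<EMAIL_ADDRESS>[^<>]*)</a>"
)

def extract_optional(fragment):
    """Return the optional fields of one store; absent fields are None."""
    url = URL_PATTERN.search(fragment)
    email = EMAIL_PATTERN.search(fragment)
    return {
        "URL": url.group("URL") if url else None,
        "WEBSITE": url.group("WEBSITE") if url else None,
        "EMAIL": email.group("EMAIL") if email else None,
    }

with_link = (
    "<b>WINEMASTER</b> <br />4107 MACDONALD STREET<br />"
    "VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V6L 2P1<br />"
    '604-731-9463<br /><a href="http://www.mywinemaster.com" '
    'class="textviolet" id="underline">www.mywinemaster.com</a>'
)
without_link = (
    "<b>ANGEL WINEMAKING</b> <br />3627 WEST BROADWAY<br />"
    "VANCOUVER,BRITISH COLUMBIA<br />Canada<br />V6R 2B8<br />"
    "604-730-6060<br />"
)

print(extract_optional(with_link))
print(extract_optional(without_link))
```

Note that in this sketch a `mailto:` link would also satisfy the URL pattern, so a real setup with both fields would need to tell them apart.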

Hope this helps,

Scott