Newbie Newbie Needs Help!!! (So new I said it twice)
I'm trying to create an extractor pattern for the following HTML source:
var _MARKER_OBJ = [{"t":"for_sale","y":"34.24717","x":"-82.97696","a":"2192 BOWMAN HWY NW","s":"GA","c":"Dewy Rose","p":"42,900","id":"1053243090","fl":"1"},{"t":"for_sale","y":"40.736332","x":"-74.06196","a":"10 HURON AVE","s":"NJ","c":"Jersey City","p":"229,000","id":"1060105176","fl":"1"},{"t":"for_sale","y":"40.94167","x":"-72.97018","a":"","s":"NY","c":"Miller Place","p":"329,900","id":"1077925413","fl":"1"},{"t":"for_sale","y":"34.08545","x":"-118.34959","a":"800 N FULLER AVE","s":"CA","c":"Los Angeles","p":"1,149,000","id":"1076660448","fl":"0"},{"t":"for_sale","y":"33.41188","x":"-86.78102","a":"3407 BUCKHEAD LN","s":"AL","c":"Birmingham","p":"539,900","id":"1054231412","fl":"0"},{"t":"for_sale","y":"37.797264","x":"-122.42115","a":"1444 VALLEJO ST","s":"CA","c":"San Francisco","p":"599,000","id":"44449527","fl":"0"},{"t":"for_sale","y":"32.15182","x":"-111.10051","a":"6349 W COPPER LEAF DR","s":"AZ","c":"Tucson","p":"139,900","id":"1077181664","fl":"0"},{"t":"for_sale","y":"39.721577","x":"-104.95477","a":"335 DETROIT ST #103","s":"CO","c":"Denver","p":"450,000","id":"1077053771","fl":"0"},{"t":"for_sale","y":"39.43195","x":"-104.908516","a":"757 INTERNATIONAL ISLE DR","s":"CO","c":"Castle Rock","p":"2,900,000","id":"1044189379","fl":"0"},{"t":"for_sale","y":"36.4663","x":"-94.28018","a":"18 BRENTWOOD DR","s":"AR","c":"Bella Vista","p":"124,900","id":"1044141927","fl":"0"},{"t":"for_sale","y":"32.913284","x":"-117.2252","a":"10753 CALLE MAR DE MARIPOSA","s":"CA","c":"San Diego","p":"739,000","id":"1077169147","fl":"0"},{"t":"for_sale","y":"26.617975","x":"-81.62809","a":"107 E 5TH ST","s":"FL","c":"Lehigh Acres","p":"84,900","id":"1039162065","fl":"0"},{"t":"for_sale","y":"29.100296","x":"-82.22136","a":"6133 SW 84TH LN","s":"FL","c":"Ocala","p":"139,900","id":"1076370463","fl":"0"},{"t":"for_sale","y":"29.118526","x":"-82.0051","a":"454 FAIRWAYS CIR #B104","s":"FL","c":"Ocala","p":"39,000","id":"1076801192","fl":"0"}];
I created a main extractor pattern:
var _MARKER_OBJ = [~@DATA@~];
which seems to work, but when I created a sub-extractor pattern to separate the properties:
{~@DATARECORD@~}
it did not separate the properties when I clicked "Apply Pattern to Last Scraped Data" button.
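To make the two pattern layers concrete, here is a rough plain-Java sketch of what they are meant to do. It uses java.util.regex, not screen-scraper's own extractor engine, and the class and method names (MarkerRecords, extractData, splitRecords) are made up for illustration. The main pattern is applied once to grab the array contents; the record pattern then has to be applied repeatedly, once per {...} object, to get every property. It assumes no record contains a nested brace, which holds for the source above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; this is not how screen-scraper runs its patterns internally.
class MarkerRecords {

    // Roughly "var _MARKER_OBJ = [~@DATA@~];": capture everything between "[" and the first "];"
    private static final Pattern MAIN =
            Pattern.compile("var _MARKER_OBJ = \\[(.*?)\\];", Pattern.DOTALL);

    // Roughly "{~@DATARECORD@~}": the text inside one {...} object (assumes no nested braces)
    private static final Pattern RECORD = Pattern.compile("\\{(.*?)\\}");

    // Returns the DATA text, or null when the main pattern does not match.
    static String extractData(String html) {
        Matcher m = MAIN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Applies the record pattern over and over so each {...} becomes its own record.
    static List<String> splitRecords(String data) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(data);
        while (m.find()) {
            records.add(m.group(1));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two records cut down from the page source above
        String html = "var _MARKER_OBJ = [{\"a\":\"2192 BOWMAN HWY NW\",\"id\":\"1053243090\"},"
                + "{\"a\":\"10 HURON AVE\",\"id\":\"1060105176\"}];";
        List<String> records = splitRecords(extractData(html));
        System.out.println(records.size()); // prints 2
        for (String record : records) {
            System.out.println(record);
        }
    }
}
```

The while (m.find()) loop is the important part: a single match of the record pattern only ever yields one property, so splitting the whole array takes a loop.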
I would also like to create a sub-sub-extractor pattern for the following items:
"a":"~@ADDRESS@~"
"s":"~@STATE@~"
"c":"~@CITY@~"
"p":"~@PRICE@~"
"id":"~@PROPERTYID@~"
Also, I took extensive notes on Tutorials 1 and 2 and have read most of the Documentation section of your website. However, I have very little programming knowledge and would like to learn more; are there any books or websites you think would be helpful?
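The sub-sub-extractor patterns listed above ("a":"~@ADDRESS@~" and so on) each pull one "key":"value" pair out of a single record. A minimal Java sketch of that step, again with java.util.regex standing in for screen-scraper's patterns; RecordFields and extractFields are hypothetical names, and the sketch assumes values never contain a double quote.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration of the "a":"~@ADDRESS@~"-style patterns; not screen-scraper's API.
class RecordFields {

    // JSON key in the page source -> token name used in the patterns above
    private static final String[][] FIELDS = {
        {"a", "ADDRESS"}, {"s", "STATE"}, {"c", "CITY"}, {"p", "PRICE"}, {"id", "PROPERTYID"}
    };

    // Returns token name -> value for one record's text.
    static Map<String, String> extractFields(String record) {
        Map<String, String> tokens = new LinkedHashMap<>();
        for (String[] field : FIELDS) {
            // "key":"value" with the key taken literally; [^"]* also accepts empty values like "a":""
            Pattern p = Pattern.compile("\"" + Pattern.quote(field[0]) + "\":\"([^\"]*)\"");
            Matcher m = p.matcher(record);
            if (m.find()) {
                tokens.put(field[1], m.group(1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Miller Place record from the page source, without its braces
        String record = "\"t\":\"for_sale\",\"y\":\"40.94167\",\"x\":\"-72.97018\",\"a\":\"\","
                + "\"s\":\"NY\",\"c\":\"Miller Place\",\"p\":\"329,900\",\"id\":\"1077925413\",\"fl\":\"1\"";
        System.out.println(extractFields(record));
        // prints {ADDRESS=, STATE=NY, CITY=Miller Place, PRICE=329,900, PROPERTYID=1077925413}
    }
}
```

Note that this record has an empty address ("a":""), so the value pattern has to allow zero characters.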
Thanks in advance for your help!!!
sub extractors and more
Hi,
Well, I'm glad you've decided to tackle the most difficult concept in screen-scraper right off the bat. Manual sub extractors are very powerful, but also advanced. Here's what we're going to do:
1st - Rename the token in var _MARKER_OBJ = [~@DATA@~]; so the pattern reads var _MARKER_OBJ = [~@SAVED_STRING@~]; and save that token as a session variable.
2nd - Create a script for the manual extraction (I'll walk you through what goes inside it).
3rd - Set that script to run after the main pattern is applied.
4th - In the script, read the saved string out of the session into a variable of your own (session.getVariable()).
5th - Create an extractor pattern that matches {~@DATARECORD@~} and name it something like "Manual Extraction".
6th - On that pattern's sub extractor tab, create the 4 or 5 sub extractor patterns that pick up id, p, c, a, s and any other fields you need.
7th - On that pattern's advanced tab, mark it to be invoked manually from a script rather than applied automatically.
8th - Back in the script, write the manual extractor using DataSets and DataRecords, like this:
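// Read the string the main pattern saved in step 1
text = session.getVariable("SAVED_STRING");
// Apply the "Manual Extraction" pattern from step 5 to that string
DataSet myData = scrapeableFile.extractData(text, "Manual Extraction");
// One pass per matched record
for (int i = 0; i < myData.getNumDataRecords(); i++) {
    myDataRecord = myData.getDataRecord(i);
    // Value one of the sub extractor patterns stored in this record
    addon = myDataRecord.get("URL_ADDON");
    // Hand it to the next scrapeable file via the session
    session.setVariable("URL_ADDON", addon);
    session.log(addon);
    session.scrapeFile("R&B search results");
}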
You'll of course need to adapt the code above to your situation; in particular, change the myDataRecord.get("") calls to the token names of the sub extractor patterns you've added for the data record. That is the essence of manual extractors. In principle you can layer as many manual extractors on a string as you like; each layer needs another loop, another extractor pattern, and its own variables.
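The "another loop per layer" idea can be sketched outside screen-scraper. In this plain-Java version a HashMap called fakeSession stands in for session variables (it is not the real session object), one regex plays the role of the {~@DATARECORD@~} pattern, another plays the role of the sub extractor patterns, and each layer gets its own loop. LayeredExtraction, applyLayer and extractAll are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LayeredExtraction {

    // Hypothetical stand-in for screen-scraper session variables; NOT the real API
    static final Map<String, Object> fakeSession = new HashMap<>();

    // Layer 1 pattern, roughly {~@DATARECORD@~}
    static final Pattern RECORD = Pattern.compile("\\{(.*?)\\}");
    // Layer 2 pattern: any "key":"value" pair inside one record
    static final Pattern PAIR = Pattern.compile("\"(\\w+)\":\"([^\"]*)\"");

    // Applies one pattern as many times as it matches, like one manual-extractor layer.
    static List<MatchResult> applyLayer(String text, Pattern pattern) {
        List<MatchResult> results = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            results.add(m.toMatchResult());
        }
        return results;
    }

    // Runs both layers and returns one key -> value map per record.
    static List<Map<String, String>> extractAll(String savedString) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (MatchResult record : applyLayer(savedString, RECORD)) {      // loop 1: records
            Map<String, String> row = new HashMap<>();
            for (MatchResult pair : applyLayer(record.group(1), PAIR)) {  // loop 2: fields
                row.put(pair.group(1), pair.group(2));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Step 1 stand-in: the main pattern saved its DATA as SAVED_STRING
        fakeSession.put("SAVED_STRING",
                "{\"a\":\"10 HURON AVE\",\"c\":\"Jersey City\"},{\"a\":\"1444 VALLEJO ST\",\"c\":\"San Francisco\"}");
        // Step 4 stand-in: the script reads the saved string back
        String text = (String) fakeSession.get("SAVED_STRING");
        for (Map<String, String> row : extractAll(text)) {
            System.out.println(row.get("a") + ", " + row.get("c"));
        }
        // prints "10 HURON AVE, Jersey City" then "1444 VALLEJO ST, San Francisco"
    }
}
```

Adding a third layer would mean a third pattern and a third nested loop inside extractAll, which mirrors the rule of thumb above.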
Hope this helps.
scraper
Lost in Translation
Hi Scraper,
Thanks for replying to my question. I thought I had at least average intelligence until I tried to follow the 2nd step. Can you break down the 2nd step in detail? If you can, pretend you're explaining it to a 5-year-old. Also, if you think it would help, you might want to break down the other steps as well. Thanks for your help.
Kind Regards,
Adrian
further info
Here is a link that might be helpful. It contains an example manual extractor session.
http://www.screen-scraper.com/support/examples/manual-extraction-example.html
Best of luck
scraper
Lost in Translation (Part II)
Hi Scraper,
Thanks for the manual extractor session example. I reviewed it, but I'm sorry to say it went right over my head. Is there any documentation like Tutorial 2 that walks through the manual extractor session referenced above?
Kind Regards,
Adrian
other community site resources
Ok,
Here are some additional resources that have already been developed by our team. Carefully read through these and see if they help.
http://community.screen-scraper.com/node/820
http://community.screen-scraper.com/website_varies
http://community.screen-scraper.com/API/extractData
http://community.screen-scraper.com/FAQ/SimilarTables
http://community.screen-scraper.com/script_repository/manual-extraction-example
Many people who have come before you have had this manual extraction problem, and trust me, many people after you will have this trouble too. It just takes serious time and study to grasp this subject.
Thanks
scraper