a few feature requests
Hello Screen Scraper,
again thanks for a great product!
I have been using SS for a few weeks now and I'm quite impressed with it. I would, however, like to offer some ideas on where it might improve.
I have listed them in order of importance.
1:
The ability to store HTML in the "last response" tab manually, so we could just copy/paste the HTML source code and start testing the extractor patterns immediately (without the proxy).
2:
When making the extractor pattern, it would be easier to make it match the correct tags / places if we were able to place regex directly in the pattern:
from this:
a href="~@junk@~/~@SCID@~.aspx"~@junk@~title="~@junk@~">~@SCNAME@~<
I would like it better if one could write something like this:
a href="[/.*]*?(\w+).aspx".*?title=".*?">([a-zA-Z0-9 ]*)<
and then later use \1 and \2 to store named variables in the dataSet, maybe with a double-click like we have now with the "store in session" options etc.
That way I wouldn't have to sort the result later on, and less testing/validation in the scripts would have to be done.
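To illustrate the idea outside of SS, here is a minimal sketch in plain Java using java.util.regex with named groups. The class name RegexExtractorSketch, the method extract and the sample HTML in main are made up for illustration, and the pattern is a tightened variant of the one above (it uses [^"] and [^>] instead of .*? so a match cannot run past the tag). It says nothing about how SS handles extractor patterns internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of screen-scraper.
class RegexExtractorSketch {

    // Named groups play the role of the ~@SCID@~ and ~@SCNAME@~ tokens.
    static final Pattern LINK = Pattern.compile(
        "a href=\"[^\"]*?/(?<SCID>\\w+)\\.aspx\"[^>]*title=\"[^\"]*\">"
        + "(?<SCNAME>[a-zA-Z0-9 ]*)<");

    // Returns one map per match, keyed by group name, in document order.
    static List<Map<String, String>> extract(String html) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("SCID", m.group("SCID"));
            record.put("SCNAME", m.group("SCNAME"));
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample markup.
        String html = "<a href=\"/shop/cat/imac.aspx\" title=\"iMac\">iMac 27</a>";
        for (Map<String, String> record : extract(html)) {
            System.out.println(record.get("SCID") + " / " + record.get("SCNAME"));
        }
    }
}
```

Because the character class in SCNAME only allows letters, digits and spaces, links whose text contains anything else simply do not match, which is the kind of filtering I otherwise have to do in a script afterwards.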
3:
A way to further refine extractions after an initial extraction without the use of session.executeScript();
4:
The ability to reload .jar files without a restart of SS would be really nice.
5:
A tool in the extractor to easily convert text to doubles would also be appreciated. For example, in Sweden they separate decimals with : (on prices), in Denmark we use , and in the US they use .
When using the same scraper to target similar pages internationally, where the only difference is the number format (199,- in one place, 199:- in another, and 199.99 in yet another), it would make the job easier.
That would probably make SS near perfect for me
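As a rough sketch of the conversion I have in mind, in plain Java: the class PriceParser and the method parsePrice are hypothetical names, not anything that exists in SS. It assumes the text holds a single price with at most one separator, so it does not handle thousands separators such as 1.299,00.

```java
// Hypothetical helper, not part of screen-scraper.
class PriceParser {

    // Converts "199,-", "199:-", "199,50" or "199.99" to a double.
    // Assumes there is no thousands separator in the input.
    static double parsePrice(String text) {
        String s = text.trim();
        // A trailing "-" after the separator means "no fractional part".
        if (s.endsWith("-")) {
            s = s.substring(0, s.length() - 1) + "00";
        }
        // Normalise the Swedish ':' and Danish ',' separators to '.'.
        s = s.replace(':', '.').replace(',', '.');
        return Double.parseDouble(s);
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("199,-"));
        System.out.println(parsePrice("199:-"));
        System.out.println(parsePrice("199.99"));
    }
}
```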
(until I think of something else anyway) :-)
Best regards
Gustav Palsson
Thanks much, Gustav. That clarifies it.
feature # 3
I have this kind of HTML structure for menus:
IT
Apple
iMac
Mac mini
Mac Pro
MacBook
MacBook Air
MacBook Pro
Xserve
Apple løsdele
Batterier
Grafikkort
Harddiske
Kabler
Monitorer
Mus & tastatur
Netværk
Opladere
Printertoner - Laser/Blæk
Ram
Reservedele
Tasker
Tilbehør
Tilbehør - server
Udvidet garanti
Video
etc. etc. etc...
I use this script to extract the menus, and I use
DataSet scdataset = scrapeableFile.extractData( mcdata, "subcat extract" )
to call extractors on subsets of the extractions.
It is not possible to do this with ~@datarecord@~ because I need to sort the categories in the correct order, so my script looks like this:
if( dataSet.getNumDataRecords() > 0 ){
//looping datarecord
for(int i = 0; i < dataSet.getNumDataRecords(); i++){
DataRecord r = dataSet.getDataRecord(i);
//defensive copy
CompetitorCategoryWriter cw = session.getVariable("cw").clone();
CompetitorCategory c = new CompetitorCategory();
c.setName(r.get("MCID"));
c.setDescription(r.get("MCNAME"));
c.setStoreid(session.getVariable("storeid"));
session.getVariable("cw").add(c);
session.getVariable("cw").write();
//extracting subcats
String mcdata = r.get("DATARECORD");
DataSet scdataset = scrapeableFile.extractData( mcdata, "subcat extract" );
//looping subcats
for(int l = 0; l < scdataset.getNumDataRecords(); l++){
DataRecord re = scdataset.getDataRecord(l);
String s = re.get("SCID");
if (s.toLowerCase().indexOf("software") != -1)
continue;
//another defensive copy
CompetitorCategoryWriter cws = session.getVariable("cw").clone();
CompetitorCategory cs = new CompetitorCategory();
cs.setName(re.get("SCID"));
cs.setDescription(re.get("SCNAME"));
cs.setStoreid(session.getVariable("storeid"));
session.getVariable("cw").add(cs);
session.getVariable("cw").write();
String scdata = re.get("DATARECORD")+"
";
DataSet sscdataset = scrapeableFile.extractData( scdata, "subcat extract" );
//looping sub sub cats
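for(int k = 0; k < sscdataset.getNumDataRecords(); k++){
DataRecord rec = sscdataset.getDataRecord(k);
//named ss so it does not clash with s in the enclosing loop
String ss = rec.get("SSCID");
if (ss.toLowerCase().indexOf("software") != -1)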
continue;
CompetitorCategoryWriter cwss = session.getVariable("cw").clone();
CompetitorCategory css = new CompetitorCategory();
css.setName(rec.get("SSCID"));
css.setDescription(rec.get("SSCNAME"));
css.setStoreid(session.getVariable("storeid"));
session.getVariable("cw").add(css);
session.getVariable("cw").write();
session.setVariable("SSCLINK",rec.get("SSCLINK"));
session.scrapeFile( "product" );
session.setVariable("cw", cwss);
}
session.setVariable("cw", cws);
}
session.setVariable("cw", cw);
}
}
I know it is a bit clumsy, but I couldn't figure out how to do this in a more elegant way. Luckily there was no need for recursion, or it would be even messier :-)
What I would really like is to have the option to refine the results (for each row) of the result dataset that comes with an extraction.
Like the current "mapping" feature, a tab for conversion (with regex etc.), and one for refining the resultset would be great.
I will email you a picture of how it might look (can't upload anything but txt's and csv's here)
Best regards
Gustav Palsson
suggestion # 4
Hey Todd,
I'm talking about the .jar files in the ext directory in suggestion # 4.
Often when developing (in the beginning anyway, before I have built up a reasonable library) I need to change the .jar files that the screen-scraper scripts import during scrapes.
At the moment that means that I need to close SS, export the package from Eclipse to the ext directory and then reopen SS to continue working.
That being said, it is only a minor annoyance.
It would save a lot of time if there were a "reload external files" option in the menu somewhere, but seeing as SS is probably made in Java, I don't know how that would interact with SS.
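For what it's worth, one common way to do this kind of reload in plain Java is to build a fresh URLClassLoader over the directory each time. The sketch below assumes that approach. ExtReloader is a made-up class, and I have no idea whether SS loads the ext directory this way, so treat it as an illustration only. Objects created from the old loader keep their old classes, so scripts would have to look classes up again after a reload.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of screen-scraper.
class ExtReloader {
    private final File extDir;
    private URLClassLoader current;

    ExtReloader(File extDir) {
        this.extDir = extDir;
    }

    // Builds a new class loader over every .jar currently in extDir
    // and closes the previous one so its jar files are released.
    synchronized ClassLoader reload() {
        try {
            List<URL> urls = new ArrayList<>();
            File[] jars = extDir.listFiles((dir, name) -> name.endsWith(".jar"));
            if (jars != null) {
                for (File jar : jars) {
                    urls.add(jar.toURI().toURL());
                }
            }
            URLClassLoader fresh = new URLClassLoader(
                urls.toArray(new URL[0]), ExtReloader.class.getClassLoader());
            if (current != null) {
                current.close();
            }
            current = fresh;
            return fresh;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks a class up through the most recent loader.
    synchronized Class<?> load(String className) throws ClassNotFoundException {
        if (current == null) {
            reload();
        }
        return current.loadClass(className);
    }
}
```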
Thanks, that helps. Also, could you clarify a bit on #3? Again, an example might be helpful.
Todd
Hi Gustav,
These are excellent suggestions. I'll add them to our to-do list so that we don't lose track of them. Please also feel free to send along any other thoughts you might have.
Also, could you clarify just a bit what you mean on #4? Perhaps an example would help.
Kind regards,
Todd Wilson