a few feature requests

Hello Screen Scraper,
Thanks again for a great product!

I have been using SS for a few weeks now and I'm quite impressed with it, but I would like to offer some ideas on where you might improve.
I have listed them in order of importance.

1:
The ability to enter HTML manually in the "last response" tab, so we could simply copy/paste the HTML source code and start testing extractor patterns immediately (without going through the proxy).

2:
When writing an extractor pattern, it would be easier to make it match the right tags and positions if we could put regular expressions directly in the pattern.

Instead of this:
a href="[email protected]@~/[email protected]@~.aspx"[email protected]@~title="[email protected]@~">[email protected]@~<

I would prefer to be able to write something like this:
a href="[^"]*?(\w+)\.aspx".*?title=".*?">([a-zA-Z0-9 ]*)<

and then use \1 and \2 to store named variables in the data set, perhaps with a double-click like the current "store in session" options.

That way I wouldn't have to sort the results afterwards, and the scripts would need less testing and validation.
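For comparison, here is roughly how the requested behaviour looks in plain Java with java.util.regex, outside screen-scraper. This is a minimal sketch. The class name RegexGroupDemo and the variable names ID and NAME are made up. The pattern is a tightened form of the one above: [^"]*? matches the path, the dot in \.aspx is escaped, and 0-9 replaces the invalid 1-0 range. Each match becomes one record, and \1 and \2 are stored under names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroupDemo {
    // Group 1 = page id taken from the .aspx file name, group 2 = link text.
    static final Pattern LINK = Pattern.compile(
        "a href=\"[^\"]*?(\\w+)\\.aspx\".*?title=\".*?\">([a-zA-Z0-9 ]*)<");

    // Hypothetical names given to \1 and \2 (what a "store as variable" option might do).
    static final String[] GROUP_NAMES = {"ID", "NAME"};

    // Returns one record per match, with the capture groups stored under GROUP_NAMES.
    public static List<Map<String, String>> extract(String html) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            Map<String, String> record = new LinkedHashMap<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                record.put(GROUP_NAMES[g - 1], m.group(g));
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/products/123.aspx\" title=\"Apple\">Apple iPod</a>";
        System.out.println(extract(html));
    }
}
```

For the sample link in main, this produces a single record with ID=123 and NAME=Apple iPod. The records arrive in page order, so no sorting is needed afterwards.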

3:
A way to further refine the results of an initial extraction without having to call session.executeScript().

4:
The ability to reload .jar files without a restart of SS would be really nice.

5:
A tool in the extractor for easily converting text to doubles would also be appreciated. For example, in Sweden prices use ":" to separate the decimals, in Denmark we use ",", and in the US they use ".".
When the same scraper targets similar pages in different countries, and the only difference is the number format (199,- in one place, 199:- in another, and 199.99 in a third), such a tool would make the job much easier.
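Here is a minimal sketch of such a conversion, written as a hypothetical helper. PriceParser.parsePrice is not part of screen-scraper. The sketch assumes that the last ".", "," or ":" in a price is the decimal separator and that any earlier ones are thousands separators. It treats a trailing "-" as "no decimals".

```java
public class PriceParser {
    // Hypothetical helper, not a screen-scraper API.
    // Assumption: the LAST '.', ',' or ':' is the decimal separator; earlier ones
    // are thousands separators. Dashes and spaces are ignored ("199,-" -> 199.0).
    public static double parsePrice(String raw) {
        String s = raw.trim().replace("-", "").replace(" ", "");
        int last = Math.max(s.lastIndexOf('.'), Math.max(s.lastIndexOf(','), s.lastIndexOf(':')));
        String intPart = last >= 0 ? s.substring(0, last) : s;
        String fracPart = last >= 0 ? s.substring(last + 1) : "";
        // Strip any remaining (thousands) separators from the integer part.
        intPart = intPart.replaceAll("[.,:]", "");
        if (fracPart.isEmpty()) {
            fracPart = "0";
        }
        return Double.parseDouble(intPart + "." + fracPart);
    }

    public static void main(String[] args) {
        for (String p : new String[] {"199,-", "199:-", "199.99", "1.234,50"}) {
            System.out.println(p + " -> " + parsePrice(p));
        }
    }
}
```

The "last separator wins" rule breaks for a Danish price such as "1.234" with no decimals, which this sketch would read as 1.234. A real tool would probably need the separators configured per site or locale, for example through java.text.DecimalFormatSymbols.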

That would probably make SS nearly perfect for me
(until I think of something else, anyway) :-)

Best regards
Gustav Palsson

Thanks much, Gustav. That clarifies it.

feature # 3

I have this kind of HTML structure for menus:

• IT
    • Apple
    • Apple løsdele
    • etc.

I use the script below to extract the menus, and I use

DataSet scdataset = scrapeableFile.extractData( mcdata, "subcat extract" );

to run extractors on subsets of the extracted data.
It is not possible to do this with [email protected]@~ because I need to sort the categories in the correct order, so my script looks like this:

if (dataSet.getNumDataRecords() > 0) {
  // looping main-category data records
  for (int i = 0; i < dataSet.getNumDataRecords(); i++) {
    DataRecord r = dataSet.getDataRecord(i);

    // defensive copy
    CompetitorCategoryWriter cw = session.getVariable("cw").clone();
    CompetitorCategory c = new CompetitorCategory();
    c.setName(r.get("MCID"));
    c.setDescription(r.get("MCNAME"));
    c.setStoreid(session.getVariable("storeid"));
    session.getVariable("cw").add(c);
    session.getVariable("cw").write();

    // extracting subcats from this main category's HTML
    String mcdata = r.get("DATARECORD");
    DataSet scdataset = scrapeableFile.extractData(mcdata, "subcat extract");

    // looping subcats
    for (int l = 0; l < scdataset.getNumDataRecords(); l++) {
      DataRecord re = scdataset.getDataRecord(l);

      // skip software categories
      String s = re.get("SCID");
      if (s.toLowerCase().indexOf("software") != -1)
        continue;

      // another defensive copy
      CompetitorCategoryWriter cws = session.getVariable("cw").clone();
      CompetitorCategory cs = new CompetitorCategory();
      cs.setName(re.get("SCID"));
      cs.setDescription(re.get("SCNAME"));
      cs.setStoreid(session.getVariable("storeid"));
      session.getVariable("cw").add(cs);
      session.getVariable("cw").write();

      // extracting sub-sub-cats; the original appended an HTML suffix here that was lost from the post
      String scdata = re.get("DATARECORD") /* + lost suffix */;
      DataSet sscdataset = scrapeableFile.extractData(scdata, "subcat extract");

      // looping sub-sub-cats
      for (int k = 0; k < sscdataset.getNumDataRecords(); k++) {
        DataRecord rec = sscdataset.getDataRecord(k);

        // renamed from s so it does not clash with the variable in the enclosing loop
        String ss = rec.get("SSCID");
        if (ss.toLowerCase().indexOf("software") != -1)
          continue;

        CompetitorCategoryWriter cwss = session.getVariable("cw").clone();
        CompetitorCategory css = new CompetitorCategory();
        css.setName(rec.get("SSCID"));
        css.setDescription(rec.get("SSCNAME"));
        css.setStoreid(session.getVariable("storeid"));
        session.getVariable("cw").add(css);
        session.getVariable("cw").write();

        // scrape the products of this sub-sub-category
        session.setVariable("SSCLINK", rec.get("SSCLINK"));
        session.scrapeFile("product");

        // restore the writer as it was before this sub-sub-category
        session.setVariable("cw", cwss);
      }
      session.setVariable("cw", cws);
    }
    session.setVariable("cw", cw);
  }
}

I know it is a bit clumsy, but I couldn't figure out a more elegant way to do it. Luckily there was no need for recursion, or it would be even messier :-)

What I would really like is an option to refine the results of an extraction, row by row, in the data set it returns.
A tab for conversion (with regex, etc.) and another for refining the result set, similar to the current "mapping" feature, would be great.
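One way to picture the "refine" tab is as a step that runs a regex over one field of every row in the result data set. It either adds the captured value as a new field or drops the row. The sketch models rows as plain maps rather than screen-scraper's DataSet/DataRecord. RowRefiner and the SCLINK field are hypothetical. Dropping non-matching rows would cover the "software" filtering that the script above does by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowRefiner {
    // Hypothetical per-row refinement: apply regex to sourceField and store capture
    // group 1 in targetField. The regex must contain at least one capture group.
    // Rows where the field is missing or the regex does not match are dropped.
    public static List<Map<String, String>> refine(List<Map<String, String>> rows,
                                                   String sourceField,
                                                   String regex,
                                                   String targetField) {
        Pattern pattern = Pattern.compile(regex);
        List<Map<String, String>> refined = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String value = row.get(sourceField);
            if (value == null) {
                continue;
            }
            Matcher m = pattern.matcher(value);
            if (m.find()) {
                // Copy the row so the original result set is left untouched.
                Map<String, String> copy = new LinkedHashMap<>(row);
                copy.put(targetField, m.group(1));
                refined.add(copy);
            }
        }
        return refined;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("SCLINK", "/category/123.aspx"),
            Map.of("SCLINK", "/category/software"));
        // Keep only links ending in <digits>.aspx and pull the digits out as SCID.
        System.out.println(refine(rows, "SCLINK", "(\\d+)\\.aspx$", "SCID"));
    }
}
```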

I will email you a picture of how it might look (I can't upload anything but .txt and .csv files here).

Best regards
Gustav Palsson

suggestion # 4

Hey Todd,
In suggestion #4 I'm talking about the .jar files in the ext directory.
When developing (at least at the beginning, before I have built up a reasonable library), I often need to change the .jar files that the screen-scraper scripts import during scrapes.
At the moment that means I have to close SS, export the package from Eclipse to the ext directory, and then reopen SS to continue working.

That being said, it is only a minor annoyance.

It would save a lot of time if there were a "reload external files" option somewhere in the menu, but since SS is probably written in Java, I don't know how that would work with SS.
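For what it's worth, a Java application usually reloads classes without restarting by creating a new class loader for the updated .jar. Below is a minimal sketch of that general technique. JarReloader is hypothetical, and this does not describe how SS actually loads the ext directory.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

public class JarReloader {
    // A class that has already been loaded cannot be swapped in place, but a NEW
    // URLClassLoader pointed at the re-exported .jar will read the updated class files.
    private URLClassLoader current;

    public Class<?> reload(Path jar, String className) throws IOException, ClassNotFoundException {
        if (current != null) {
            current.close(); // release the handle on the previous copy of the jar
        }
        URL[] urls = { jar.toUri().toURL() };
        // Parent-first delegation: if the same class is also visible to the parent
        // loader, the parent's (old) copy is returned instead of the jar's.
        current = new URLClassLoader(urls, JarReloader.class.getClassLoader());
        return current.loadClass(className);
    }
}
```

You would call reload(Paths.get("ext/mylib.jar"), "com.example.MyClass") after each Eclipse export. Both the path and the class name are made up. There are two caveats. Objects created before the reload, such as a writer kept in a session variable, remain instances of the old class and would have to be recreated. Also, the jar must not also sit on the application's own classpath, or the parent-first lookup will keep returning the old version.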

Thanks, that helps. Also, could you clarify #3 a bit? Again, an example might be helpful.

Todd

Hi Gustav,

These are excellent suggestions. I'll add them to our to-do list so that we don't lose track of them. Please also feel free to send along any other thoughts you might have.

Also, could you clarify a bit what you mean by #4? Perhaps an example would help.

Kind regards,

Todd Wilson