SS feedback / suggestions

Hello!

First I want to take a moment to thank you guys for a great product and excellent support. Keep up the great work!

I've been using the product for a while now and wanted to share some feedback and suggestions:

Contextual highlighting in extractor patterns - I'd like to see SS tokens highlighted so it would be easier to find them within the text.
Example: in ~@NAME@~, the ~@ and @~ might be blue while NAME is red.
It might also be nice to give a visual cue that a token is a session variable.
Editing a token in an extractor pattern shouldn't drop me to the last line of the extractor pattern - When editing an extractor pattern with multiple tokens, each token save takes me back to the bottom of the field, so I have to save, scroll up, edit, save, scroll up again to the next token, and so on. If the cursor position were remembered when you save a token, that would rock and make editing faster.
Include a sidebar or dropdown list of existing tokens for a scrapeable page or extractor pattern - This would be nice to have. Clicking a token in the dropdown list or sidebar would position your cursor on that token. A visual indicator for session variables would be awesome.
Extractor pattern sections display - Display extractor pattern sections horizontally rather than vertically. So instead of this:
===================================================
main -> sub-Extractor Patterns -> Advanced
Identifier: Name    Sequence: 1
------------------------------------------
main -> sub-Extractor Patterns -> Advanced
Identifier: ID    Sequence: 2
------------------------------------------
main -> sub-Extractor Patterns -> Advanced
Identifier: Address    Sequence: 3
====================================================
Have this:

===================================================
1. Name        2. ID          3. Address
m->sub-E->a    m->sub-E->a    m->sub-E->a
====================================================

It is hard to show this with text rather than an image, but hopefully you get the idea: the tabs run across the top, each showing the sequence number and identifier, with the main, sub-extractor patterns, and advanced tabs below. I can try to mock something up if it isn't clear. The tabs showing the sequence number and identifier should support drag and drop to change the sequence position, and double-clicking one should let you edit its sequence or identifier.

Using this style would reduce scrolling and make it easier and faster to see which patterns you currently have. This is also an area that could benefit from some state color coding, say, to show whether a pattern is going to be called from a script.
The Tree View interaction is quirky - You'd expect it to work like any Mac/Windows tree view, where you can left-click, drag, and drop one item onto another. Instead, it seems you must left-click (sometimes twice), wait for the script to load in the window, and then drag and drop. As far as I can tell, you can't drag and drop a folder at all.
Code or Template Repository and script instances - I'm a single user and rather new to the product, so perhaps what I'm envisioning wouldn't work well for most of your users, but it seems to me that a folder should limit the scope of what can be run to the scripts/sessions/scrapeable files within it (or its subfolders). Perhaps there could be another section above or below the tree view where you store your script/session/scrapeable file templates. You could then drag and drop an instance of a template into a folder for use. That way you could have files of the same name in different folders, and changes to the template could be pushed to like-named copies if the user wanted. I'd also like to be able to export a folder as a whole.
Checkbox for HTML Tidy on/off default in Options->Settings->General - I thought there was one previously, but now I can only find it on each scrapeable file's Advanced tab. It would be nice to have a default setting in the main workbench settings window that could be overridden on each file if needed.
Trash Token - I frequently find I have to create a trash token in an extractor pattern. I'd like a reserved token keyword like ~@TRASH@~ indicating that the item should be matched but not included in the dataset. Or, instead of a keyword, there could be a 'do not return to dataset' checkbox on the Token window, letting users name the token whatever they want.

Also, I saw a reference saying the web interface was included in the professional edition at:
~#ss pro edition install dir#~\resource\lws\webapps\ROOT\
I don't see that directory in my files. If it's no longer available in the professional edition, might it be added back later? I would love to have that option.

Ok, I think that was probably enough to dump on you guys at one go! :)

Thanks again!

Shilea

Final note: when I did an initial preview of my post, the following warning was thrown:

warning: Invalid argument supplied for foreach() in /var/www/html/community.screen-scraper.com/modules/taxonomy/taxonomy.module on line 70.

Shilea on 03/10/2010 at 8:32 pm

screen-scraper suggestions

Sheila,

It looks like you're likely using version 4.5, but if you update to a pre-release you'll see that we already have several of these things.

You can turn off tidy globally in the screen-scraper.properties file by setting TidyHTML to false.

We once had a token that worked like the "TRASH" token you suggest, but we found that not seeing its value was clumsy, since we needed to verify that it matched only what was expected. Now I just keep it in the dataSet so I can confirm it isn't greedy. If you want, the "IGNORE" tag still works, but I honestly think you're better off without it.

We'll look at the others and get back ...

jason on 03/11/2010 at 10:21 am

add one vote for trash token

I'd like to add my vote for this, though I'd prefer either a checkbox option in the token dialog or a different token delimiter.
e.g.
~$JUNK$~ instead of ~@JUNK@~

Alternatively, a method on dataRecord and dataSet to clear trash tokens would let you access the trash values for testing while still giving you an easy way to remove them. The test pattern dialog could then have a button to clear them out as well.

I currently have my own external class that clears any token whose name matches JUNK[\d]+, which works fine, but I'd prefer an easy way to mark a token as junk via an attribute.
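The external class itself isn't shown in the post. As a rough illustration of the same idea, here is a minimal sketch that assumes a data record can be treated as a plain java.util.Map of token name to matched value, and a data set as a List of such maps. This is an assumption for illustration only: it does not use screen-scraper's actual DataRecord or DataSet API, and the class and method names (JunkTokenCleaner, clearJunk, clearJunkFromDataSet) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of screen-scraper: records are modeled as plain Maps
// of token name -> matched value, and a data set as a List of such Maps.
public class JunkTokenCleaner {
    // Naming rule from the post: "JUNK" followed by one or more digits.
    private static final Pattern JUNK_NAME = Pattern.compile("JUNK\\d+");

    // Removes every entry whose whole key matches the junk rule; returns the count removed.
    public static int clearJunk(Map<String, Object> record) {
        int before = record.size();
        // matches() tests the entire key, so a token named MYJUNK1 is kept.
        record.keySet().removeIf(name -> JUNK_NAME.matcher(name).matches());
        return before - record.size();
    }

    // Applies clearJunk to every record in the data set.
    public static void clearJunkFromDataSet(List<Map<String, Object>> dataSet) {
        for (Map<String, Object> record : dataSet) {
            clearJunk(record);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("NAME", "Widget");
        record.put("JUNK1", "<td class=\"price\">");
        record.put("PRICE", "9.99");
        record.put("JUNK2", "&nbsp;");

        List<Map<String, Object>> dataSet = new ArrayList<>();
        dataSet.add(record);
        clearJunkFromDataSet(dataSet);

        System.out.println(record); // {NAME=Widget, PRICE=9.99}
    }
}
```

Running the cleanup as a separate step keeps the junk values visible while a pattern is being tested, which is the concern Jason raised about a token whose value is never seen.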

As per another suggestion thread, I'd also love to see an option to insert regex directly, perhaps using an alternate token delimiter. Obviously, without a token name you won't be able to access the match, but for making patterns more flexible it would be very handy. For example, I've come across quite a few spans that have a random number of "&nbrsp;" entities (I had to misspell that to make the entity visible in the post) around the data I want. Being able to do something like:
~$(&nbrsp;)+$~~@DATA@~~$(&nbrsp;)+$~
would save a lot of data-cleaning code in the scripts.
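The ~$...$~ delimiter is only a proposal here, so nothing below is screen-scraper's API. As a sketch of how such a pattern could be interpreted, the hypothetical InlineRegexPattern class turns literal text into quoted regex, ~$...$~ segments into raw regex, and ~@NAME@~ tokens into lazy named groups, then compiles the result with java.util.regex. It assumes token names contain only letters and digits (a Java named-group restriction) and that tokens match lazily; screen-scraper's real token matching may differ. The example uses the real &nbsp; entity that the post had to misspell.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical translator for the proposed syntax; not part of screen-scraper.
public class InlineRegexPattern {
    // Either a raw-regex segment ~$...$~ (group 1) or a token ~@NAME@~ (group 2).
    private static final Pattern PIECE =
            Pattern.compile("~\\$(.*?)\\$~|~@([A-Za-z][A-Za-z0-9]*)@~");

    // Builds a java.util.regex Pattern; an invalid raw segment or a repeated token
    // name will cause Pattern.compile to throw PatternSyntaxException.
    public static Pattern compile(String extractorPattern) {
        StringBuilder regex = new StringBuilder();
        Matcher m = PIECE.matcher(extractorPattern);
        int last = 0;
        while (m.find()) {
            // Literal text between pieces is quoted so characters like '.' match themselves.
            regex.append(Pattern.quote(extractorPattern.substring(last, m.start())));
            if (m.group(1) != null) {
                // Raw regex is inserted as-is inside a non-capturing group.
                regex.append("(?:").append(m.group(1)).append(')');
            } else {
                // A token becomes a lazy named group, e.g. (?<DATA>.*?).
                regex.append("(?<").append(m.group(2)).append(">.*?)");
            }
            last = m.end();
        }
        regex.append(Pattern.quote(extractorPattern.substring(last)));
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern p = compile("~$(&nbsp;)+$~~@DATA@~~$(&nbsp;)+$~");
        Matcher m = p.matcher("<span>&nbsp;&nbsp;Hello World&nbsp;&nbsp;&nbsp;</span>");
        if (m.find()) {
            System.out.println(m.group("DATA")); // Hello World
        }
    }
}
```

Because the leading run is greedy and DATA is lazy, the surrounding entities are consumed by the raw-regex segments and DATA holds only the text between them, which is the cleanup the post currently does in scripts.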

shadders on 04/21/2010 at 8:07 pm

Thanks for the reply, Jason.

Are the pre-release versions generally stable enough to run in production? I'd be happy to change over if that is the case.

Thanks -

Shilea

Shilea on 03/19/2010 at 10:18 am

Most of the time they are, and on the occasion that we do introduce a problem, we get it fixed ASAP. We generally use the newest versions in production here, so we catch anything right away.

jason on 03/22/2010 at 9:28 am
