way to group workbench objects for easy export
For example, a new workbench object called a 'project' that every other type of workbench object can be attached to. If you've got lots of scripts or scrapeable files that wouldn't normally get exported when you export a scraping session, you could just export one object (instead of every script)...
The enterprise users would probably also like being able to group several scraping sessions for one-click export.
I thought folders might give this functionality, but you can't export a folder, and you can't put all types of objects into a folder anyway.
I think you're right about being able to export several things at once like that. However, if you export a scraping session, it will automatically bundle attached scripts with it, so you don't need to export them individually unless you want to save them individually.
You can put all the different objects into folders; Java likes to be tricky though, so you have to first select the object, and *then* drag it into the folder icon. I'm not sure why that tree-panel on the left is cranky like that, but it is the case.
It might be interesting if we can allow you to export a folder. That would be rather helpful, actually, in several contexts that I can think of.
"however, if you export a scraping session, it will automatically bundle attached scripts with it,"
That is true; however, scripts that are called from other scripts but not from a scrapeable file will not export. When you've got 20-30 scripts attached to a scraping session and some export while others don't, it gets confusing...
I've found a way to get around it for the time being though: just add all the scripts to the scraping session but disable them so they don't run. Then they are at least linked and will export.
It gets even more confusing if you reference the same script from multiple scraping sessions. Then you can get the nasty situation where you export SS1, then export SS2, then modify the common script in SS2, and then for some reason want to import SS1 again. There is a copy of the script in both export files, so you end up overwriting it with an old version. To avoid this you have to maintain separate copies of the script for both...
"there is a copy of the script in both export files... so you end up overwriting it with an old version"
This is true, and in those cases, we currently tend to export SS1 separately, so that we can drop it into an import folder after all is said and done, just in case there is a version difference.
Do you have any ideas for how to group the objects more effectively? Perhaps if we allowed an export of a folder? This way the export could try to keep track simply of the objects involved, and not how exactly they are related.
I think there are two separate issues here. One is the script overwrite problem, which is quite dangerous and could be fixed quite easily. The second is the grouping objects issue, which would probably be a bit more complex and is more a matter of convenience (though a major convenience)...
I think the script overwrite issue could be solved quite easily by adding two attributes to the script object: a last-modified timestamp and a last-export timestamp. These can act as a quasi version stamp.
Then on an import you just add a process to check that you're not overwriting a newer version and if so send a warning. Perhaps also give an option to save a copy of the overwritten script first, preferably to a special folder with a naming convention something like-
. You could then have a cleanup routine in the workbench startup routine to clean up these backed-up scripts if they are older than x days. A few settings in the workbench preferences should make this behaviour easy for users to customise to their taste. If you wanted to get tricky you could have a trashcan and a backup folder: if the user selects 'save a copy on overwrite', the script is moved to the backups sub-folder; if they don't, it goes to the trashcan, because there's always the occasional screwup and it's nice to have one last option. The trashcan is auto-pruned after x days; backups are kept indefinitely...
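To make the import check concrete, here is a minimal sketch in Java. Everything in it is hypothetical: `ScriptRecord`, `ImportCheck` and the field names are made up for illustration and are not part of the workbench. It assumes each script carries the two timestamps suggested above, and it only decides whether to warn; backup and trashcan handling are left out.

```java
import java.time.Instant;

// Hypothetical model of a script with the two proposed attributes.
final class ScriptRecord {
    final String name;
    final Instant lastModified; // when the script was last edited
    final Instant lastExported; // when it was last written to an export file

    ScriptRecord(String name, Instant lastModified, Instant lastExported) {
        this.name = name;
        this.lastModified = lastModified;
        this.lastExported = lastExported;
    }
}

final class ImportCheck {
    enum Decision { IMPORT, WARN_OVERWRITES_NEWER }

    // 'existing' is the copy already in the workbench (null if there is none);
    // 'incoming' is the copy found in the export file being imported.
    static Decision check(ScriptRecord existing, ScriptRecord incoming) {
        if (existing == null) {
            return Decision.IMPORT; // nothing to overwrite
        }
        // The workbench copy was edited more recently than the copy in the
        // export file, so importing would replace newer work with older work.
        if (existing.lastModified.isAfter(incoming.lastModified)) {
            return Decision.WARN_OVERWRITES_NEWER;
        }
        return Decision.IMPORT;
    }
}
```

In the SS1/SS2 situation described earlier, the common script in the workbench would have a later last-modified stamp than the copy inside the old SS1 export, so re-importing SS1 would produce the warning instead of silently overwriting.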
As for the other problem of logically grouping objects, there's one issue to keep in mind, I think: currently objects are not referenced in a hierarchical manner. I.e. if you have a script "script" in folder "folder", you call session.executeScript("script"), not session.executeScript("folder/script")... I think. So if you move to a more hierarchical model you need to make sure you don't break references in existing sessions.
So I'd suggest you need a container object that is like a folder but not a folder, because it needs more attributes than a folder and you still want to be able to use folders. Call it a 'Project'.
To ensure you don't break any existing references, we have two rules:
1/ all objects must exist inside a project object, and would simply have a 'parent project' attribute added to them...
2/ objects within a project are considered to be in one scope and can be referenced without an absolute pathname...
So you'd set up the workbench to automatically create a default project that contains all objects unless the user specifically chooses to use other projects. The structure of objects within the project container wouldn't need to change much. You could still share scripts across multiple sessions within the project (because the overwrite problem is dealt with by the above), and you could still reference everything inside the project without pathnames, for backward compatibility. But if you desperately want to use some script, scrapeable file or session from another project, you can reference it with session.executeScript("project2/script").
The upgrade process for moving to this new model would be easy and pretty much transparent to users: just move the entire contents of the workbench into the default container object. Everything will be in the same scope, so it will all work with the existing flat referencing convention...
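The two scoping rules can be sketched roughly like this in Java. The `Workbench` class, its `add` and `resolve` methods and the project names are all invented for illustration; only the "script" versus "project2/script" reference style comes from the proposal above. The sketch also assumes that existing object names never contain a "/", which would need checking before adopting such a scheme.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: project name -> (object name -> object body).
final class Workbench {
    private final Map<String, Map<String, String>> projects = new HashMap<>();

    void add(String project, String name, String body) {
        projects.computeIfAbsent(project, p -> new HashMap<>()).put(name, body);
    }

    // Resolve a reference as seen from 'currentProject'.
    // "script"          -> looked up in the current project (old flat style)
    // "project2/script" -> looked up in the named project
    String resolve(String currentProject, String reference) {
        int slash = reference.indexOf('/');
        String project = slash < 0 ? currentProject : reference.substring(0, slash);
        String name = slash < 0 ? reference : reference.substring(slash + 1);

        Map<String, String> scope = projects.get(project);
        if (scope == null || !scope.containsKey(name)) {
            throw new IllegalArgumentException("No such object: " + reference);
        }
        return scope.get(name);
    }
}
```

After an upgrade, every existing object would sit in the one default project, so a plain name resolves exactly as it does today; the qualified form only matters once a second project exists.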
Then of course the point of all of this is that you can do an export at the project level. There's no reason why you couldn't leave your existing session and script export processes in place for people who still want to use them.
I like this approach because it doesn't need to change the way existing users operate unless they actively choose to. They can completely ignore the new functionality if they want to; in fact they may not even notice that the root folder has changed to the default project.
Personally I prefer a completely hierarchical system where projects can contain projects etc. If you did that then you wouldn't need to have separate folder and project object types, but that's probably a bigger fish to fry...