Questions Regarding evaluation

Hey. I'm currently evaluating screen-scraper for work. (Basically, I work in a travel company and we want to use it to scrape details of travel packages from the websites of some of our partners who, alas, haven't got any sort of XML or equivalent feeds.)

I have a few questions.

I'd like to be able to get around the slightly goofy upload-to-PHP method of sticking the data in a MySQL database. With that in mind, I'm somewhat interested in the Jython bindings.

1) Do the Jython bindings have access to a MySQL library? If not, how would I go about installing one? Is there some place I can drop .jar files to put them on the radar of the Java stack in screen-scraper?

2) Are there external Python (or Perl, bleaaah!) libs that will work on a Linux box?

What sort of success have folks had running the system on Debian?

These are fairly key questions for me being able to recommend going from the demo to purchasing the product.

Also, the webmaster ought to put a register link more prominently on the forum. I had to play with the URL line to find it :(

Questions Regarding evaluation

Hi Todd!

I replaced the jar with the Jython 2.2 beta2 jar file and it seemed to resolve my issue. I was having problems using open() to open a file to read from.

Thanks!!

I'm looking forward to the new Drupal site! Thanks for an excellent product! It's made my month!!

Skot

Questions Regarding evaluation

Hi,

In answer to your questions...

It shouldn't be any problem to replace the jython.jar file we ship with a more current one. Simply drop it in the "lib" folder, overwriting the existing file.

We actually have a Drupal site in the works. It's not ready to be deployed just yet, but keep an eye out for it. I wish I could hand over lots of tips and tricks specific to working with Python in screen-scraper, but, unfortunately, we don't have them. Once we get the Drupal portion of our site rolled out, hopefully others will contribute to that area.

Kind regards,

Todd Wilson

Questions Regarding evaluation

Hey guys,

I'm just wondering about the answers to a couple of shayne's questions. I'm working with Jython scripts as I know it better than VBScript or Java.

[quote]A couple more questions. How 'complete' is the Jython jar-ball? If it's missing a few features, am I able to get a more up-to-date Jython jar-ball and replace the one in the lib folder, or is it customised somewhat?
[/quote]

Also...

[quote]edit: by the way, you guys should consider a wiki, as once I nail this thing down, I'm happy to provide some Python code to it, to demonstrate a few Python techniques for manipulating scrapes.
[/quote]

If you guys don't have a wiki, could you post tips somewhere in the forums? I'd really appreciate it, as I've been having some issues working with screen-scraper/Jython. Thanks!!

skot

Questions Regarding evaluation

I've done that, but it doesn't do anything.

The output you see there is with "tidy html" turned off.

I've gone to the scraper session for the detail page (the one with the problem), and under "Advanced" unchecked "Tidy HTML", saved it and ran it, and it still scrambles the HTML. Is there perhaps a cache or something I can trash to get it to do what it's told?

I'm using screen-scraper pro v.2.7.2 (evaluation) on a Debian Linux box.

edit: by the way, you guys should consider a wiki, as once I nail this thing down, I'm happy to provide some Python code to it, to demonstrate a few Python techniques for manipulating scrapes.

edit2: I've sent you an email Todd.

Questions Regarding evaluation

Hi,

Fortunately, there's an easy fix for this. For the scrapeable file (not the scraping session) that's "cleaning" up the HTML for you, go under the "Advanced" tab, then un-check the box labeled "Tidy HTML after scraping?" If that box isn't checked it will leave your HTML as is.

Todd

Questions Regarding evaluation

Ok, this damn thing really is a show stopper.

Here's the problem:

I'm just trying to lift a table wholesale off the affiliate's site. Now, the problem is the table has some HTML like this:

(table junk head)                        
                          <tr>
                                <td>
                                MS Poetry</td>
                                <td>70415</td>
                                <td>15 Apr 2007</td>
                                <td>
                                </div>$2393

Questions Regarding evaluation

No probs Todd. As a coder, I do understand it's a bit odd supporting stuff one's not 100% bottle on. I had an opposite problem once. I implemented a CPython embedding for the Citadel groupware server, and fell into a lot of headaches due to my inexperience with C. Still got it to work, but my poor C skills and the Citadel devs' poor Python skills meant we never really did get it humming. It was a bit of a comedy buzz launching the Boa Constructor RAD tool from inside a running server, however. Silly hack.

I did manage to work out some of the Jython issues.

A couple more questions. How 'complete' is the Jython jar-ball? If it's missing a few features, am I able to get a more up-to-date Jython jar-ball and replace the one in the lib folder, or is it customised somewhat?

Here's a *really* important question.

How the heck do I stop it trying to "fix" HTML in scraped code? The "clean html" option doesn't appear to do anything :( , and it's pretty much written off the ability to use the product for me, because it's producing crazy HTML with table closures mid-table and stuff like that. I try to uncheck it, but the HTML produced is clearly not what goes into it.

This is a show-stopper bug and, somewhat distressingly, I've already sold the boss on the product and have to get the job out of the door this arvo. :(

Questions Regarding evaluation

Hi,

Thanks for the posting. I'm glad to hear that someone's taking a stab at using the Jython interface. We don't get too many asking about that.

In answer to your questions

1. Jython will have access to any Java classes that are in screen-scraper's classpath. To add a library, simply drop its jar file (e.g., the MySQL driver) into screen-scraper's "lib\ext" directory. It will add them to its classpath when it starts up.
2. On this one you may need to read up on the Jython documentation. I don't know that we've had anyone wanting to use external Python libraries with it, but I'd guess it's possible. Feel free to post if we can help with anything on it. As to Perl, it actually only runs on Windows via the ActiveState library, and I'm unaware of how to interface it with external Perl modules. Though, like Jython, I'm guessing it's possible.
3. One of our developers runs screen-scraper on Ubuntu, and has good success with it. Please let us know of any snags you run into on Debian.
4. Thanks for the tip on the "Register" link. We're just using a phpBB theme, but it obviously isn't the most user-friendly.
5. See #2 on this one. Unfortunately, we don't have much Python expertise in-house, though we added the Jython interface for developers such as yourself who are more comfortable with it. If you wouldn't mind posting about your experience some other developers would likely benefit (and we'd probably even include some of it in our documentation).
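
To make answer 1 a bit more concrete, here is a rough sketch of what a Jython script might look like once the MySQL driver jar is sitting in "lib\ext". Treat it as an illustration under assumptions rather than tested screen-scraper code: the helper names, the "packages" table, and its columns are made up, and the driver class name assumes the MySQL Connector/J jar. The Java imports are done inside the function because they only resolve when the script runs under Jython.

```python
# Sketch only: jdbc_url and save_package are hypothetical helper names, and
# the "packages" table with its columns is invented for illustration.

def jdbc_url(host, database, port=3306):
    """Build a JDBC connection URL for a MySQL database."""
    return "jdbc:mysql://%s:%d/%s" % (host, port, database)


def save_package(url, user, password, ship, price):
    """Insert one scraped row over plain JDBC (only works under Jython)."""
    # Imported here because java.* packages exist only inside Jython.
    from java.lang import Class
    from java.sql import DriverManager

    # Assumes MySQL Connector/J has been dropped into lib\ext.
    Class.forName("com.mysql.jdbc.Driver")
    conn = DriverManager.getConnection(url, user, password)
    try:
        stmt = conn.prepareStatement(
            "INSERT INTO packages (ship, price) VALUES (?, ?)")
        stmt.setString(1, ship)
        stmt.setString(2, price)
        stmt.executeUpdate()
        stmt.close()
    finally:
        conn.close()
```

Something like save_package(jdbc_url("localhost", "travel"), "user", "secret", "MS Poetry", "2393") would then write a row directly, without the upload-to-PHP step. The host, database name, and credentials there are placeholders.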

Kind regards,

Todd Wilson

Questions Regarding evaluation

Aaaand another one. Simple one.

How do I get to the usual Jython modules, particularly "string"?

'import string' seems to fail badly.
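
A possible stopgap while the import problem is unresolved: since Python 2.0, most of what the string module offers is also available as methods on string objects, so the import may not be needed at all. The sketch below is only an illustration; clean_cell and parse_price are made-up helper names, and the sample values are borrowed from the table fragment posted earlier in this thread.

```python
# No "import string" needed: split, join and isdigit are methods on str.
# clean_cell and parse_price are hypothetical helpers for illustration.

def clean_cell(raw):
    """Collapse runs of whitespace in a scraped table cell to single spaces."""
    return " ".join(raw.split())


def parse_price(raw):
    """Turn a scraped price such as '$2393' into an int, or None if no digits."""
    digits = "".join([ch for ch in raw if ch.isdigit()])
    if not digits:
        return None
    return int(digits)
```

For example, clean_cell applied to the raw "MS Poetry" cell strips the leading newline and indentation, and parse_price("$2393") gives the number without the dollar sign.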