Java SOAP client
I'm wondering if anyone's got any tips for getting the SOAP client for the screen-scraper server to work in NetBeans. I'm not very good with SOAP; it does my head in with all the different ways it looks like you can use it.
I can kind of scrape by just using XML-RPC, but that seems to be missing a lot of the advantages of WSDL.
I can create a new web services client in NetBeans, but when I try to test it I get errors along the lines of:
Runtime exception; nested exception is: unexpected element name: expected=getScrapingSessionNamesReturn, actual={http://scraper.screenscraper.com}getScrapingSessionNamesReturn
I've tried making a JAXB binding using the WSDL option (I don't really even know what JAXB is, so it may not be appropriate for an RPC client?) and I get compile errors:
Compiling file:/C:/dl/NetBeansProjects/ssSOAP/xml-resources/jaxb/anotheJaxB/localhost_8779/axis/services/SOAPInterface.wsdl
[WARNING] Are you trying to compile WSDL? Support for WSDL is experimental. You may enable it by using the -wsdl option.
unknown location
[ERROR] s4s-elt-schema-ns: The namespace of element 'definitions' must be from the schema namespace, 'http://www.w3.org/2001/XMLSchema'.
line 2 of file:/C:/dl/NetBeansProjects/ssSOAP/xml-resources/jaxb/anotheJaxB/localhost_8779/axis/services/SOAPInterface.wsdl
[ERROR] s4s-elt-invalid: Element 'definitions' is not a valid element in a schema document.
line 2 of file:/C:/dl/NetBeansProjects/ssSOAP/xml-resources/jaxb/anotheJaxB/localhost_8779/axis/services/SOAPInterface.wsdl
[ERROR] schema_reference.4: Failed to read schema document 'file:/C:/dl/NetBeansProjects/ssSOAP/xml-resources/jaxb/anotheJaxB/localhost_8779/axis/services/SOAPInterface.wsdl', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not
line 2 of file:/C:/dl/NetBeansProjects/ssSOAP/xml-resources/jaxb/anotheJaxB/localhost_8779/axis/services/SOAPInterface.wsdl
failure in the XJC task. Use the Ant -verbose switch for more details
C:\dl\NetBeansProjects\ssSOAP\nbproject\xml_binding_build.xml:18: unable to parse the schema. Error messages should have been provided
BUILD FAILED (total time: 0 seconds)
Ideally what I'd like to do is have a Java client that has access to all the SOAP methods AND is independent of the scraping server. The method given on the documentation page for building the client works, but the name of the server is riddled throughout the source code it generates. Can I just use that source code and replace all the instances of localhost:8779 with a string variable that I can set when I'm invoking the service locator?
My guess would be no, because the binding stub uses the localhost string a lot inside a static{ } block.
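Something like this is what I have in mind; just a rough sketch, assuming the Axis-generated locator and port are called SOAPInterfaceServiceLocator and SOAPInterface, and that getScrapingSessionNames() returns a String array (I haven't checked the actual generated names):

import java.net.URL;

// Rough sketch only -- the locator/port names are my guess at what Axis
// generates, not verified against the real generated source.
public class ScraperClient {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8779;

        // Build the endpoint from variables instead of relying on the
        // hard-coded localhost:8779 baked into the generated stub.
        URL endpoint = new URL("http://" + host + ":" + port + "/axis/services/SOAPInterface");

        SOAPInterfaceServiceLocator locator = new SOAPInterfaceServiceLocator();
        SOAPInterface scraper = locator.getSOAPInterface(endpoint);

        for (String name : scraper.getScrapingSessionNames()) {
            System.out.println(name);
        }
    }
}

Is that roughly how the generated locator is meant to be used, or does the static{ } block get in the way regardless?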
P.S. If the above is not possible, I could get around it using the Java remote scraping session classes IF I could find a way to kill a remote session without having to stop the whole server.
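For reference, this is roughly the pattern I'm using with the remote classes at the moment (host, port and session name are just placeholders for my setup):

import com.screenscraper.common.RemoteScrapingSession;

// Roughly what I'm doing now with the remote scraping session classes.
// Host, port and session name are placeholders for my setup.
public class LaunchSession {
    public static void main(String[] args) throws Exception {
        RemoteScrapingSession remoteSession =
                new RemoteScrapingSession("My Session", "localhost", 8778);
        remoteSession.setVariable("PROXY_POOL_SIZE", "50");
        remoteSession.scrape();
        remoteSession.disconnect();
        // What I can't find is any call here to kill the session once it
        // hangs on the server side.
    }
}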
The immediate reason I'm trying to use this is that I'm running multiple sessions that each use proxy pools populated from public proxy servers. For some reason these sessions occasionally hang while filtering the pool or repopulating it after the pool has been used up. I can monitor each session by having it write a lastAliveTime to a database, and the controlling script will just assume it's dead if it hasn't updated for more than a couple of minutes. It can then launch a replacement session, but I want to avoid leaving numerous hung sessions running on the server because a) it's messy, and b) they won't release resources and eventually the server will fall over. I'm able to manage underperforming sessions by simply sending them a kill request via the intermediate database, but this won't work on hung sessions because they'll never read the request and get around to self-terminating.
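The watchdog check itself is nothing fancy; a simplified sketch, with made-up table and column names (session_status, session_id, last_alive_time) and a placeholder connection string:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

// Simplified sketch of the controlling script's liveness check.
// Table/column names and the connection string are placeholders.
public class SessionWatchdog {
    private static final long STALE_MS = 2 * 60 * 1000; // "couple of minutes"

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost/scraper", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT session_id, last_alive_time FROM session_status");
             ResultSet rs = ps.executeQuery()) {
            long now = System.currentTimeMillis();
            while (rs.next()) {
                Timestamp lastAlive = rs.getTimestamp("last_alive_time");
                if (now - lastAlive.getTime() > STALE_MS) {
                    // Assume it's hung and launch a replacement session here.
                    // The bit I'm missing is killing the hung one on the server.
                    System.out.println("Session " + rs.getString("session_id") + " looks hung");
                }
            }
        }
    }
}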
Session timeout isn't really an option here. Firstly, it doesn't seem to work very reliably, particularly when a session gets stuck in the middle of opening a TCP socket. Also, I'd want each session to run for as long as it could before falling over, rather than having each session just run for a couple of minutes and then stop, mainly because the overhead of starting up a new session and preparing the proxy pool is quite significant (try starting 50 sessions at once, then try starting 50 sessions staggered at 60-second intervals and watch the CPU usage; you'll see what I mean).
I've had a look at some of the external access APIs, and it looks like they just use simple string-based commands over TCP sockets. Is this an option if I just want to access a couple of functions? I'd need to do two things: access the sessionID when I'm launching a session (if I can access it from inside the session I could just send it back with session.sendDataToClient()), and send the killSession request.
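In other words, something as simple as this (the port and the command text are pure guesses on my part; I don't know the real syntax the external API expects):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Guesswork sketch of sending a string command over a TCP socket --
// the port and command syntax are placeholders, not the real protocol.
public class KillSessionClient {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = 8779; // placeholder for whichever port the external API listens on
        String sessionId = args.length > 1 ? args[1] : "12345";

        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("killSession " + sessionId); // made-up command syntax
            System.out.println("Server replied: " + in.readLine());
        }
    }
}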
Hi,
Regarding an alternative RPC interface, we're actually on the cusp of developing more of a REST-like interface to screen-scraper that should allow you to do what you're describing. It will mirror the current SOAP interface in many respects, and should expand beyond it.
Regarding your difficulty with SOAP, unfortunately, I don't use NetBeans, so I'm not sure what the issue could be there. Part of the reason I don't use an IDE is for this very issue :) They often try to be helpful, but end up interfering. You might try simply editing your code by hand and compiling with a tool like Ant (http://ant.apache.org/).
At this point, it seems like your two best options would be to fight a bit more with the SOAP interface, or wait until we get the REST interface developed. In fact, we'd love it if you could help us alpha test the REST interface, which will likely be available sometime this month. Please let us know if that's of interest.
Kind regards,
Todd