I am trying to get any method of external control working. Does anybody have suggestions?
Scene Setting:-
I am using the latest version of Screen-Scraper Enterprise 5.5.9a (I re-downloaded and reinstalled it yesterday just in case) on Windows Server 2003, fully patched via Windows Update. It is configured to run 10 scrapes at a time, and as far as I can tell the JVM is up to date. My development environment is Visual Studio 2008, MVC 1.0, .NET 3.5. I have a great deal of programming experience and have been writing scrapes for 9 months now; many of them work, provided you run them one at a time in the workbench.
Problem:-
However, in the last two weeks I have experienced serious reliability problems with the scheduling of scrapes: approximately 1 in 5 scrapes silently fails, with no error or output in any of the log files. During a previous support call, the developer I spoke to seemed surprised that I was using the web page to schedule scrapes. With this in mind I have been developing my own multithreaded app to call screen-scraper externally, to try to improve reliability. After 3 days on this, here is what I have tried so far:
SOAP) I have tried the SOAP API and get a very weird error, which leads me to believe the WSDL isn't generated properly; see this post: http://community.screen-scraper.com/node/1858. So it appears I can't get SOAP to work. This looks like the interface I want to use, but I have got nowhere with it so far.
COM/DLL) I have referenced the RemoteScrapingSession DLL in my .NET project, and using the example code provided on this site I have managed to do the scheduling myself. HOWEVER, scrapes randomly just stop working: again about 1 in 5 stops, with no error and no output in stdout.log, error.log, or any of the other log files. I cannot trace what is going on, and the calls themselves return no errors. If I then run the scrape again, it works. So I can't find out what (if anything) I might be doing wrong, and I have no clues to follow.
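As a stopgap for the 1-in-5 silent failures, I have been experimenting with wrapping each remote call in a simple retry. This is only a sketch in Java (the `Callable` is a placeholder for where the actual RemoteScrapingSession call would go; the attempt count is my own choice, not anything from the screen-scraper API):

```java
// Sketch of a retry wrapper for flaky remote scrape calls.
// The Callable body is a placeholder; the real call would invoke
// the RemoteScrapingSession API (assumed here, not shown).
import java.util.concurrent.Callable;

public class RetryScrape {
    // Runs the given call up to maxAttempts times, returning true on
    // the first success; a call "fails" by returning false or throwing.
    public static boolean runWithRetry(Callable<Boolean> call, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(call.call())) {
                    return true;
                }
            } catch (Exception e) {
                // Log the failure and fall through to the next attempt.
                System.err.println("Attempt " + attempt + " failed: " + e);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Demo: a call that fails twice, then succeeds on the third try.
        final int[] calls = {0};
        boolean ok = runWithRetry(() -> {
            calls[0]++;
            return calls[0] >= 3;
        }, 5);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

This doesn't fix the underlying problem, of course; it just papers over it until someone can tell me why the calls fail silently in the first place.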
REST) I tried http://localhost:8779/ss/rest?action=run_scraping_session&scraping_session_name=Comet-LG-All, which does indeed start a scrape. However, the response is supposed to contain an ID so that I can poll and see when the scrape has finished (my thread needs to wait for the scrape to complete before it reads the file). The result I get back is {"response":{"status":0,"data":{"scrapeable_session_id":-1}}}, so I have no ID with which to call the other REST methods; -1 appears to be an error.
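For reference, this is roughly how I am pulling the ID out of the REST response (a minimal Java sketch; the regex-based parsing is my own choice rather than anything from the product, and the JSON below is the literal response I get back):

```java
// Sketch: extract scrapeable_session_id from the REST response and
// treat -1 as "no usable ID". Plain regex parsing, no JSON library.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestResponse {
    private static final Pattern ID_PATTERN =
        Pattern.compile("\"scrapeable_session_id\"\\s*:\\s*(-?\\d+)");

    // Returns the session ID, or -1 when the server did not supply one.
    public static long parseSessionId(String json) {
        Matcher m = ID_PATTERN.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        // The exact response body I receive from run_scraping_session.
        String json =
            "{\"response\":{\"status\":0,\"data\":{\"scrapeable_session_id\":-1}}}";
        long id = parseSessionId(json);
        if (id < 0) {
            System.out.println("No usable ID returned; cannot poll for completion");
        } else {
            System.out.println("Got session ID " + id);
        }
    }
}
```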
STUMPED:
I am now at a loss, as I can't seem to use SOAP, COM/DLL, or REST. I was thinking I could use a DOS command line and spawn a process, so that I can at least see what it is doing and get an idea of when it has finished. If anybody can give me the syntax for that I will try that route: I can wait for the process to complete, watch it in the DOS window, and automatically capture stdout and stderr to track any errors.
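This is roughly what I had in mind for the spawn-a-process route, sketched in Java. The screen-scraper command line shown in the comment is only my guess at the syntax (I have not confirmed it); the parts I actually need are the stdout/stderr capture and the wait for the exit code, so the demo runs a harmless echo instead:

```java
// Sketch: spawn a child process, capture its stdout/stderr, and wait
// for it to finish. The actual screen-scraper invocation is assumed,
// something along the lines of:
//   jre\bin\java -jar screen-scraper.jar --scrape="Comet-LG-All"
// (syntax unconfirmed); the demo below runs a harmless echo instead.
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SpawnScrape {
    // Runs the command, prints each output line, and returns the exit code.
    public static int runAndCapture(String... command) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println("[scrape] " + line);
            }
        }
        return p.waitFor(); // block until the process finishes
    }

    public static void main(String[] args) throws Exception {
        int exit = runAndCapture("echo", "scrape finished");
        System.out.println("exit code: " + exit);
    }
}
```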
Alternatively, if anybody can fix the issues with the SOAP, COM, or REST APIs I would be grateful. I am again at the point where I am going to have to give up on this product for its poor reliability and write my own, because I cannot seem to make it do what I need.
I hope somebody responds very quickly, as this is business critical for us.
[email protected]
07512 248909
Re: REST
I don't need to include any logs, because the call to the REST interface does two things: it starts the scrape, which works perfectly and is nice, and it returns a scraping ID of -1, which isn't nice at all, since I can do nothing with it. The bug in the REST interface is that it does not return a scraping ID for me to use. If that worked, I might be able to carry on developing the REST side.
Let me have you test this quick scrape I whipped up. Here is the code for the scrape:
<scraping-session use-strict-mode="true"><script-instances><script-instances when-to-run="10" sequence="1" enabled="false"><script><script-text>session.setv("HOST", "cloak");
session.setv("PORT", "10779");</script-text><name>Detect debug</name><language>Interpreted Java</language></script></script-instances><script-instances when-to-run="10" sequence="2" enabled="true"><script><script-text>import java.util.*;
// If SOAP port is not passed in, assume default
if (session.getv("PORT")==null)
{
session.setv("PORT", "8779");
session.logInfo("Using default port");
}
if (session.getv("HOST")==null)
{
session.setv("HOST", "http://localhost");
session.logInfo("Host not explicitly set. Defaulting to localhost");
}
else
{
if (!session.getv("HOST").toLowerCase().startsWith("http://"))
{
host = "http://" + session.getv("HOST");
session.setv("HOST", host);
}
}</script-text><name>Run all init</name><language>Interpreted Java</language></script></script-instances><owner-type>ScrapingSession</owner-type><owner-name>Detect scrapes</owner-name></script-instances><name>Detect scrapes</name><notes></notes><cookiePolicy>0</cookiePolicy><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>2</originator_edition><logging_level>1</logging_level><date_exported>June 22, 2011 08:57:40</date_exported><character_set>ISO-8859-1</character_set><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>~#HOST#~:~#PORT#~/ss/rest?action=get_runnable_scraping_sessions</URL><last-request></last-request><name>All scrapes</name><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>{"Name":"~@NAME@~","ScrapingSessionID":"~@ID@~"}</pattern-text><identifier>Scraping sessions</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[^"]*</regular-expression><identifier>NAME</identifier></extractor-pattern-tokens><extractor-pattern-tokens 
optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="2"><regular-expression>[^"]*</regular-expression><identifier>ID</identifier></extractor-pattern-tokens><script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Scraping sessions</owner-name></script-instances></extractor-patterns><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>All scrapes</owner-name></script-instances></scrapeable-files></scraping-session>
I'm not aware of any bugs in the APIs, and I use them every day. However, I do often have Windows 2003 give me problems. When you start the server, can you start it in compatibility mode? Does it make any difference?
If you clear the screen-scraper\log directory and start a scrape from REST, are any logs generated? Could you post them so I can see if there's a clue in there?