I am trying to get any method of external control working. Does anybody have suggestions?
Scene Setting:-
I am using the latest version of Screen-Scraper Enterprise 5.5.9a (I re-downloaded and reinstalled it yesterday just in case) on Windows Server 2003, fully patched via Windows Update. It is configured to run 10 scrapes at a time, and as far as I can tell the JVM is up to date. My development environment is Visual Studio 2008, MVC 1.0, .NET 3.5. I have a great deal of programming experience and have been writing scrapes for 9 months now; many of them work, provided you run them one at a time in the workbench.
Problem:-
However, in the last two weeks I have experienced serious reliability problems with the scheduling of scrapes: approximately 1 in 5 scrapes silently fails, with no error or output in any of the log files. During a previous support call, the developer I spoke to seemed surprised that I was using the web page to schedule scrapes. With this in mind I have been developing my own multithreaded app to call screen-scraper externally, to try to improve reliability. After 3 days on this, here is what I have tried so far:
SOAP) I have tried the SOAP API and get a very weird error, which leads me to believe the WSDL isn't generated properly; see this post: http://community.screen-scraper.com/node/1858. So it appears I can't get SOAP to work. This looks like the interface I want to use, but I have got nowhere with it so far.
COM/DLL) I have referenced the RemoteScrapingSession DLL in my .NET project, and using the example code provided on this site I have managed to do the scheduling myself. HOWEVER, scrapes randomly just stop working: again about 1 in 5 stops, with no error and no output in stdout.log, error.log, or any of the other log files. I cannot trace what is going on, and the calls themselves return no errors. If I then run the scrape again, it works. So I can't find out what (if anything) I might be doing wrong, and I have no clues to follow.
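As a stopgap for the 1-in-5 silent failures, I have been experimenting with wrapping each remote call in a simple retry. This is only a sketch in Java (the `Callable` is a placeholder for where the actual RemoteScrapingSession call would go; the attempt count is my own choice, not anything from the screen-scraper API):

```java
// Sketch of a retry wrapper for flaky remote scrape calls.
// The Callable body is a placeholder; the real call would invoke
// the RemoteScrapingSession API (assumed here, not shown).
import java.util.concurrent.Callable;

public class RetryScrape {
    // Runs the given call up to maxAttempts times, returning true on
    // the first success; a call "fails" by returning false or throwing.
    public static boolean runWithRetry(Callable<Boolean> call, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(call.call())) {
                    return true;
                }
            } catch (Exception e) {
                // Log the failure and fall through to the next attempt.
                System.err.println("Attempt " + attempt + " failed: " + e);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Demo: a call that fails twice, then succeeds on the third try.
        final int[] calls = {0};
        boolean ok = runWithRetry(() -> {
            calls[0]++;
            return calls[0] >= 3;
        }, 5);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

This doesn't fix the underlying problem, of course; it just papers over it until someone can tell me why the calls fail silently in the first place.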
REST) I tried http://localhost:8779/ss/rest?action=run_scraping_session&scraping_session_name=Comet-LG-All, which does indeed start a scrape. However, the response is supposed to contain an ID so that I can poll and see when the scrape has finished (my thread needs to wait for the scrape to complete before it reads the file). The result I get back is {"response":{"status":0,"data":{"scrapeable_session_id":-1}}}, so I have no ID with which to call the other REST methods; -1 appears to be an error.
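For reference, this is roughly how I am pulling the ID out of the REST response (a minimal Java sketch; the regex-based parsing is my own choice rather than anything from the product, and the JSON below is the literal response I get back):

```java
// Sketch: extract scrapeable_session_id from the REST response and
// treat -1 as "no usable ID". Plain regex parsing, no JSON library.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestResponse {
    private static final Pattern ID_PATTERN =
        Pattern.compile("\"scrapeable_session_id\"\\s*:\\s*(-?\\d+)");

    // Returns the session ID, or -1 when the server did not supply one.
    public static long parseSessionId(String json) {
        Matcher m = ID_PATTERN.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        // The exact response body I receive from run_scraping_session.
        String json =
            "{\"response\":{\"status\":0,\"data\":{\"scrapeable_session_id\":-1}}}";
        long id = parseSessionId(json);
        if (id < 0) {
            System.out.println("No usable ID returned; cannot poll for completion");
        } else {
            System.out.println("Got session ID " + id);
        }
    }
}
```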
STUMPED:
I am now at a loss, as I can't seem to use SOAP, COM/DLL, or REST. I was thinking I could use a DOS command line and spawn a process, so that I can at least see what it is doing and get an idea of when it has finished. If anybody can give me the syntax for that I will try that route: I can wait for the process to complete, watch it in the DOS window, and automatically capture stdout and stderr to track any errors.
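This is roughly what I had in mind for the spawn-a-process route, sketched in Java. The screen-scraper command line shown in the comment is only my guess at the syntax (I have not confirmed it); the parts I actually need are the stdout/stderr capture and the wait for the exit code, so the demo runs a harmless echo instead:

```java
// Sketch: spawn a child process, capture its stdout/stderr, and wait
// for it to finish. The actual screen-scraper invocation is assumed,
// something along the lines of:
//   jre\bin\java -jar screen-scraper.jar --scrape="Comet-LG-All"
// (syntax unconfirmed); the demo below runs a harmless echo instead.
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SpawnScrape {
    // Runs the command, prints each output line, and returns the exit code.
    public static int runAndCapture(String... command) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println("[scrape] " + line);
            }
        }
        return p.waitFor(); // block until the process finishes
    }

    public static void main(String[] args) throws Exception {
        int exit = runAndCapture("echo", "scrape finished");
        System.out.println("exit code: " + exit);
    }
}
```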
Alternatively, if anybody can fix the issues with the SOAP, COM, or REST APIs I would be grateful. I am again at the point where I am going to have to give up on this product for its poor reliability and write my own, because I cannot seem to make it do what I need.
I hope somebody responds very quickly, as this is business critical for us.
[email protected]
07512 248909
Re: REST
I don't need to include any logs, because the call to the REST interface does two things: it starts the scrape, which works perfectly and is nice, and it returns a scraping ID of -1, which isn't nice at all, since I can do nothing with it. The bug in the REST interface is that it does not return a scraping ID for me to use. If that worked, I might be able to carry on developing the REST side.
Let me have you test this quick scrape I whipped up. Here is the code for the scrape:
<scraping-session use-strict-mode="true"><script-instances><script-instances when-to-run="10" sequence="1" enabled="false"><script><script-text>session.setv("HOST", "cloak");
session.setv("PORT", "10779");</script-text><name>Detect debug</name><language>Interpreted Java</language></script></script-instances><script-instances when-to-run="10" sequence="2" enabled="true"><script><script-text>import java.util.*;
// If SOAP port is not passed in, assume default
if (session.getv("PORT")==null)
{
session.setv("PORT", "8779");
session.logInfo("Using default port");
}
if (session.getv("HOST")==null)
{
session.setv("HOST", "http://localhost");
session.logInfo("Host not explicitly set. Defaulting to localhost");
}
else
{
if (!session.getv("HOST").toLowerCase().startsWith("http://"))
{
host = "http://" + session.getv("HOST");
session.setv("HOST", host);
}
}</script-text><name>Run all init</name><language>Interpreted Java</language></script></script-instances><owner-type>ScrapingSession</owner-type><owner-name>Detect scrapes</owner-name></script-instances><name>Detect scrapes</name><notes></notes><cookiePolicy>0</cookiePolicy><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>2</originator_edition><logging_level>1</logging_level><date_exported>June 22, 2011 08:57:40</date_exported><character_set>ISO-8859-1</character_set><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>~#HOST#~:~#PORT#~/ss/rest?action=get_runnable_scraping_sessions</URL><last-request></last-request><name>All scrapes</name><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>{"Name":"~@NAME@~","ScrapingSessionID":"~@ID@~"}</pattern-text><identifier>Scraping sessions</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[^"]*</regular-expression><identifier>NAME</identifier></extractor-pattern-tokens><extractor-pattern-tokens 
optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="2"><regular-expression>[^"]*</regular-expression><identifier>ID</identifier></extractor-pattern-tokens><script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Scraping sessions</owner-name></script-instances></extractor-patterns><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>All scrapes</owner-name></script-instances></scrapeable-files></scraping-session>
I'm not aware of any bugs in the APIs, and I use them every day. However, I do often have Windows 2003 give me problems. When you start the server, can you start it in compatibility mode? Does it make any difference?
If you clear the screen-scraper\log directory and start a scrape from REST, are any logs generated? Could you post them so I can see if there's a clue in there?