screen-scraper support for licensed users
Is it possible to include a script in another script that isn't attached in a scraping session?
I'm building a set of functions into a script and would like to include it in other scripts so I don't replicate code.
Is it possible to do this without building a JAR file to import?
The "Master Script" file would NOT be included within a session's scripts, nor would it be listed in the session's before or after list of scripts.
What would be the best method to accomplish this?
Excel VBA RemoteScraping Session
I am having difficulty running a remote scraping session from Excel VBA: it generates "Error 429 (ActiveX component can't create object)".
Dim objSession As Screenscraper.RemoteScrapingSession
If objSession Is Nothing Then Set objSession = New Screenscraper.RemoteScrapingSession
Please advise.
matekus
RunnableScrapingSession
Is there a way to know when a RunnableScrapingSession is finished? I tried searching everywhere and couldn't find anything on it. I want to be able to spawn new RunnableScrapingSession objects once a given number of them have completed. Would there be another way of handling this? I want to run around 50 RunnableScrapingSession objects simultaneously, but I'm not sure if screen-scraper can handle it.
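One general way to cap how many sessions run at once, and to learn when each one finishes, is a fixed-size worker pool. The sketch below is plain Python and does not use screen-scraper's API: `run_session` is a hypothetical stand-in for whatever starts one scraping session and blocks until it ends, and the pool size of 5 and the 20 session names are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_session(name):
    """Hypothetical stand-in for one scraping session; blocks until done."""
    time.sleep(0.01)  # simulate the work of a session
    return name


def run_all(names, max_parallel=5):
    """Run every session, never more than max_parallel at the same time."""
    finished = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(run_session, n) for n in names]
        # as_completed yields each future as soon as its session ends,
        # which serves as the "finished" signal. The pool starts the next
        # queued session automatically when a worker frees up.
        for future in as_completed(futures):
            finished.append(future.result())
    return finished


if __name__ == "__main__":
    done = run_all(["session-%d" % i for i in range(20)])
    print(len(done), "sessions finished")
```

With this shape there is no need to poll for completion: the pool refills itself, so the "spawn more once x have completed" bookkeeping goes away.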
Save button disabled
I have just encountered a problem where the save button and the File - Save menu have become permanently disabled. Exiting screen-scraper and starting it again has not helped. This is not specific to one template. I am running version 5.5.34a, which I upgraded to recently. Any help with this would be greatly appreciated.
Scraping AJAX
I'm trying to figure out how to use screen-scraper to scrape pages that use Ajax, and I haven't found anything very helpful beyond references to some methods that should be used (e.g. setRequestEntity, addHTTPHeader, etc.).
One of the sites I'm trying to scrape is www.harvardpilgrim.org, specifically the doctor lookup. When I look at the pages captured in the proxy session, instead of the normal parameters, I see the following type of request:
Scraping using .bat file
Can anyone help me? I am invoking a scraping session via a .bat file. The scraping session invokes a number of scripts (written in VBScript). The .bat file calls the session and processes the first scrapeable file, but then hits an error (see the text below):
An error occurred while processing the script: START_SEARCH
The error message was: UnsatisfiedLinkError loading library:bsfactivescriptengine no bsfactivescriptengine in java.library.path
Processing scripts after scraping session has ended.
Scraping session "Update" finished.
Guide to use threads for scraping - load x (and only x) pages at a time, and write result to db for each thread
old thread with question removed :-)
Instead I thought I would add something to the community.
Finally my threaded project is working correctly! :-)
This is for everyone else who needs to run worker threads (e.g. because the pages load slowly but you have plenty of processing power for the regex work).
I am currently spawning 5 threads at a time, but you could easily go with 10 or more.
I am using public proxies. These need to be set in every thread spawned. I have set them so the same pool is accessible in all threads.
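The general pattern described here (a fixed number of pages in flight, one proxy pool shared by all threads, and a database write per page) can be sketched as follows. This is plain Python rather than the poster's actual code: the proxy addresses, the `fetch` stub, and the `results` table are all hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import itertools
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical proxy addresses; replace with a real pool.
PROXIES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

_proxy_cycle = itertools.cycle(PROXIES)
_proxy_lock = threading.Lock()
_db_lock = threading.Lock()


def next_proxy():
    """Hand out proxies round-robin from the pool shared by all threads."""
    with _proxy_lock:
        return next(_proxy_cycle)


def fetch(url, proxy):
    """Hypothetical stand-in for loading one page through a proxy."""
    return "content of %s via %s" % (url, proxy)


def worker(db, url):
    """Load one page and write its result to the database."""
    proxy = next_proxy()
    body = fetch(url, proxy)
    with _db_lock:  # serialize writes on the shared connection
        db.execute(
            "INSERT INTO results (url, proxy, body) VALUES (?, ?, ?)",
            (url, proxy, body),
        )
        db.commit()


def scrape(urls, max_threads=5):
    """Process all urls with at most max_threads pages loading at once."""
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE results (url TEXT, proxy TEXT, body TEXT)")
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # list() forces iteration so any worker exception is raised here.
        list(pool.map(lambda u: worker(db, u), urls))
    return db
```

Raising `max_threads` from 5 to 10 or more only changes the pool size; the locks keep the proxy pool and the database writes consistent regardless of the thread count.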
Receiving two different contents on same url
I have come across a cfm site where if you request a page, the server returns different contents for the same url.
I have looked in screen-scraper and in Charles Proxy, and there aren't any POST parameters. All that is being passed is a simple GET parameter. For example:
www.example.cfm?param=123 would come up with a page with the contents "abc".
Then, if you refresh or request it again, it comes up with different contents, such as "xyz".
This is completely random, meaning when refreshed it could come up as "abc" again or as "xyz".
Complicated Site with Frames Looks Good in Proxy Session but not in Scraping Session
I am working on setting up a scrape of a site that requires an external proxy (which I have set up) and that brings the user to a license agreement page first and then to the main search page. From there the user can search multiple types of law journals. Using a screen-scraper proxy session, I am able to capture the information that I need. The final search results are listed in a frame, and I have been able to discern which of the many entries in the Proxy Session Progress tab is the frame that I need. I've looked at the response in the proxy session and the data that I want is there.
Ability to change URL dynamically
Is it possible to modify the current url dynamically? I see that I can get the value of the current url, but I don't see a way to set it or modify it. I need to add a token at runtime in some cases before a scrape is run. The token is part of the url, not a parameter. Is there anything that I can do?
Thanks.
UPDATE: Ah. I see that I can only get the URL after the file has been scraped. Is there a way to intercept and modify the current URL *before* scraping?
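Whatever hook is used to run code before the request goes out, the URL rewrite itself is simple string work. The sketch below is plain Python, not screen-scraper's API, and it assumes the token belongs at the start of the path; the function name `add_path_token`, the example URL, and the token value are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def add_path_token(url, token):
    """Return url with token inserted as the first path segment."""
    parts = urlsplit(url)
    # Make sure the original path starts with a slash before prefixing.
    rest = parts.path if parts.path.startswith("/") else "/" + parts.path
    path = "/" + token + rest
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(add_path_token("http://www.example.com/search/results?page=2", "abc123"))
    # prints http://www.example.com/abc123/search/results?page=2
```

The query string is left untouched, which matches the case described here where the token is part of the URL path and not a parameter.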