screen-scraper support for licensed users
Error/Extractor Logging
I am currently working on scripting some error logging into my scraping projects, and one thing I am struggling with is logging the misses of specific extractor patterns. I want a log file that lists the names of the extractors and the respective URLs that produced the misses, as well as a statistic for each extractor pattern showing how many hits and misses it produced.
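One way to picture the bookkeeping described above is a small tally object that is told, for each page, whether a given extractor matched. The sketch below is plain Java and does not use the screen-scraper API: the class `ExtractorStats` and its methods are hypothetical names, and how the "matched" flag is obtained inside a scraping session is left as an assumption. It would likely need adapting (for example, removing generics and lambdas) to run inside an Interpreted Java script.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of screen-scraper: tallies hits and misses
// per extractor pattern and remembers which URLs produced the misses.
class ExtractorStats {

    // Counters and miss URLs for a single extractor pattern.
    private static final class Counts {
        int hits;
        int misses;
        final List<String> missUrls = new ArrayList<>();
    }

    // Insertion-ordered so the report lists extractors in first-seen order.
    private final Map<String, Counts> byExtractor = new LinkedHashMap<>();

    // Record one application of an extractor pattern to one URL.
    void record(String extractorName, String url, boolean matched) {
        Counts c = byExtractor.computeIfAbsent(extractorName, k -> new Counts());
        if (matched) {
            c.hits++;
        } else {
            c.misses++;
            c.missUrls.add(url);
        }
    }

    int hits(String extractorName) {
        Counts c = byExtractor.get(extractorName);
        return c == null ? 0 : c.hits;
    }

    int misses(String extractorName) {
        Counts c = byExtractor.get(extractorName);
        return c == null ? 0 : c.misses;
    }

    // Build a plain-text report: one summary line per extractor,
    // followed by one indented line per URL that missed.
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Counts> e : byExtractor.entrySet()) {
            Counts c = e.getValue();
            sb.append(e.getKey())
              .append(": hits=").append(c.hits)
              .append(", misses=").append(c.misses)
              .append('\n');
            for (String url : c.missUrls) {
                sb.append("  miss: ").append(url).append('\n');
            }
        }
        return sb.toString();
    }

    // Write the report to a log file, replacing any previous contents.
    void writeTo(Path logFile) throws IOException {
        Files.write(logFile, report().getBytes(StandardCharsets.UTF_8));
    }
}
```

A script that runs after each pattern application could call `record(...)` with the extractor name and the current URL, and a script at the end of the session could call `writeTo(...)` once.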
Loop all session variables.
Three questions.
1) Is it possible to loop through all current session variables? I wish to loop through them and delete or clear some.
2) Is it possible to alter screen-scraper so that it returns an empty string instead of a null value if a variable does not exist?
3) Is it possible to add custom methods to the session scope? E.g. session.myMethod();
Kind Regards,
Nebu
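The behaviour asked about in questions 1 and 2 can be illustrated with a thin wrapper around a variable map. This is only a sketch in plain Java: `VariableStore`, `getOrEmpty` and `clearByPrefix` are hypothetical names, and the `Map` merely stands in for the session's variables. Whether screen-scraper exposes its variables for iteration is not something this example assumes.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a session's variables; not the screen-scraper API.
class VariableStore {

    private final Map<String, Object> vars = new LinkedHashMap<>();

    void set(String name, Object value) {
        vars.put(name, value);
    }

    // Null-safe lookup: a missing (or null) variable yields "" instead of null.
    String getOrEmpty(String name) {
        Object value = vars.get(name);
        return value == null ? "" : value.toString();
    }

    // Loop over all variables and remove those whose name starts with the
    // given prefix. Uses the iterator's remove() so the map can be modified
    // safely while it is being traversed. Returns the number removed.
    int clearByPrefix(String prefix) {
        int removed = 0;
        Iterator<String> names = vars.keySet().iterator();
        while (names.hasNext()) {
            if (names.next().startsWith(prefix)) {
                names.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return vars.size();
    }
}
```

Wrapping lookups in a helper like this is also one possible answer to the spirit of question 3: custom behaviour lives in a separate object that scripts call, rather than in new methods on the session itself.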
Running Scripts from manually invoked patterns
Hi,
When I invoke a pattern manually (from a script), the underlying scripts do not get executed. The pattern itself is executed. To make sure, I added the following code to the calling script:
scrapeableFile.extractData( session.getVariable("Countries"), "GetCountries" );
session.logInfo(session.getVariable("CountryName"));
the log reports:
null /* correct, session.CountryName not set yet */
Countries: Extracting data for pattern "GetCountries"
Connecting to Mysql
Hi,
I am trying to connect to a MySQL database on a different server (but in the same subnet). I have the following code:
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;
// Set Variables
host = "*********";
database = "*********";
username = "*********";
password = "*********";
parameters = "autoReconnect=true&useCompression=true";
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
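The excerpt above stops before the connection URL is set. As a hedged illustration of how the `host`, `database` and `parameters` values would typically be combined, the sketch below builds a MySQL JDBC URL of the usual form `jdbc:mysql://host:port/database?parameters`. The class `MySqlUrl` and its `build` method are hypothetical, port 3306 is only the MySQL default, and the host and database names in the usage note are placeholders, not values from the original post.

```java
// Hypothetical helper: assembles a MySQL JDBC URL from its parts.
class MySqlUrl {

    // Returns jdbc:mysql://host:port/database, followed by ?parameters
    // when a non-empty parameter string is supplied.
    static String build(String host, int port, String database, String parameters) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(host)
                .append(':')
                .append(port)
                .append('/')
                .append(database);
        if (parameters != null && !parameters.isEmpty()) {
            url.append('?').append(parameters);
        }
        return url.toString();
    }
}
```

For example, `MySqlUrl.build("db.example.local", 3306, "scrapes", "autoReconnect=true&useCompression=true")` produces a string that could then be passed to `BasicDataSource.setUrl(...)`. When the database is on another machine, the MySQL server must also accept remote connections for that user and host.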
Losing Sessions/ScrapeableFiles on
So apparently I've run into the next problem. It is not something that immediately stops me from working, but it is nonetheless a major annoyance.
Whenever I close the workbench, screen-scraper seems to delete or lose all my scraping sessions and the associated scrapeable files. When I open the workbench again after closing it, only the scripts remain. I tried restoring some of the created database backups, but they seem to be affected as well. After opening screen-scraper again, the scripts are the only things showing up.
SSH connection issues/peer not authenticated
Notes on the various HTTPS issues are posted on the blog.
ssl_error_rx_record_too_long
I am currently trying to set up a scrape on a site to automate the download of order responses that they host as PDF files. But I've run into two problems with it.
1. I tried to set up a proxy session via screen-scraper like I have always done, but after starting the proxy and changing the settings in my respective browser, whenever I navigate to this very site it gives me the error message "ssl_error_rx_record_too_long" (http://imgur.com/qIMR4uQ), and I have absolutely no clue how to fix it. Furthermore, it happens with all browsers I tried, and it seems to happen only on this machine.
Reject all Cookies not working when called from command line?
Hi
I am using the Professional version of Screen Scraper.
I have written a scrape for a site that works perfectly until it arrives at the seventh record. I looked into the issue and the first thing I did was to change the settings in the scrape session to reject all cookies.
This worked perfectly until I called the scrape via the command line (as I do for all my scrapes). The scrape then stopped again at the seventh record. I would upload the scrape here, but I can't see how.
I believe my issue is that the command line needs a switch or something to reject cookies?
Startup error encountered
I had a crash recently while screen-scraper was running, so there may be some corruption.
Each time I restart, I get the below...
java.lang.NullPointerException
at com.screenscraper.Settings.getEdition(Settings.java:1778)
at com.screenscraper.controller.ControllerMain.main(ControllerMain.java:385)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
Distil blocking my scrapes
The website I scrape has added Distil (http://www.distilnetworks.com/) and I now get redirected to a CAPTCHA page. Any idea how to get around this?