screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users, please contact support with your registered email address for access. This forum is monitored closely by screen-scraper staff. Posts are generally answered within one business day.

Error/Extractor Logging

I am currently scripting some error logging into my scraping projects, and one thing I am struggling with is logging the misses of specific extractor patterns. I want a log file that lists the names of the extractors and the respective URLs that produced the misses, as well as a statistic for each extractor pattern showing how many hits and misses it produced.
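One possible shape for the bookkeeping described above is a small counter class that tallies hits and misses per pattern name and remembers the URLs that produced misses. This is only a sketch in plain Java: `ExtractorStats` and its methods are hypothetical names, not part of the screen-scraper API, and how a script learns whether a pattern matched is left to the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the screen-scraper API.
class ExtractorStats {
    // Per-pattern counters: index 0 = hits, index 1 = misses.
    private final Map<String, int[]> counts = new LinkedHashMap<>();
    // URLs that produced a miss, grouped by pattern name.
    private final Map<String, List<String>> missUrls = new LinkedHashMap<>();

    // Record one application of a pattern against a URL.
    // The caller decides what counts as a match (assumption).
    void record(String patternName, String url, boolean matched) {
        int[] c = counts.computeIfAbsent(patternName, k -> new int[2]);
        if (matched) {
            c[0]++;
        } else {
            c[1]++;
            missUrls.computeIfAbsent(patternName, k -> new ArrayList<>()).add(url);
        }
    }

    int hits(String patternName) {
        int[] c = counts.get(patternName);
        return c == null ? 0 : c[0];
    }

    int misses(String patternName) {
        int[] c = counts.get(patternName);
        return c == null ? 0 : c[1];
    }

    List<String> missedUrls(String patternName) {
        List<String> urls = missUrls.get(patternName);
        return urls == null ? Collections.<String>emptyList() : urls;
    }

    // Build the text that would be written to the log file.
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            sb.append(e.getKey())
              .append(": hits=").append(e.getValue()[0])
              .append(", misses=").append(e.getValue()[1])
              .append('\n');
            for (String url : missedUrls(e.getKey())) {
                sb.append("  miss: ").append(url).append('\n');
            }
        }
        return sb.toString();
    }
}
```

The string from `report()` could then be written out with a standard `java.io.FileWriter`. Since screen-scraper scripts run in an interpreter, the generics and lambda used here may need to be simplified depending on the interpreter version.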

Loop all session variables.

Three questions.

1) Is it possible to loop through all current session variables? I wish to loop through them and delete / clear some.
2) Is it possible to alter screen-scraper so that it returns an empty string instead of a null value if a variable does not exist?
3) Is it possible to add custom methods to the session scope, e.g. session.myMethod();?

Kind Regards,
Nebu
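For questions 1 and 2, a wrapper approach might be enough without altering screen-scraper itself. The sketch below is plain Java and uses a `Map` as a stand-in for the variable store; whether the session exposes its variables in that form is an assumption here, and `SessionVarHelper`, `getOrEmpty`, and `clearByPrefix` are hypothetical names.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical helper; the Map stands in for the session's variables (assumption).
class SessionVarHelper {
    // Return the variable as a string, or "" when it is missing or null.
    static String getOrEmpty(Map<String, Object> vars, String name) {
        Object value = vars.get(name);
        return value == null ? "" : value.toString();
    }

    // Loop over all variables and remove those whose name starts with the prefix.
    // Returns the number of variables removed.
    static int clearByPrefix(Map<String, Object> vars, String prefix) {
        int removed = 0;
        Iterator<String> it = vars.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().startsWith(prefix)) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }
}
```

In a script, the same null check could be applied to the result of `session.getVariable(...)` before using it, which avoids the null value without changing how the session behaves.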

Running Scripts from manually invoked patterns

Hi,

When I invoke a pattern manually (from a script), the underlying scripts do not get executed, although the pattern itself is applied. To verify this, I added the following code to the calling script:

session.logInfo(session.getVariable("CountryName"));
scrapeableFile.extractData( session.getVariable("Countries"), "GetCountries" );
session.logInfo(session.getVariable("CountryName"));

The log reports:

Processing script: "BookingCountries"
null  /* correct, session.CountryName not set yet */
Countries: Extracting data for pattern "GetCountries"

Connecting to MySQL

Hi,

I am trying to connect to a MySQL database on a different server (but in the same subnet). I have the following code:

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "*********";
database = "*********";
username = "*********";
password = "*********";
parameters = "autoReconnect=true&useCompression=true";

BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
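
The excerpt above stops before the connection URL is set. As a rough sketch only, the variables it defines could be combined into a standard MySQL JDBC URL as shown below. `MysqlUrl` is a hypothetical helper, and port 3306 in the usage is simply the MySQL default; the actual port of the remote server is not given in the post.

```java
// Hypothetical helper for assembling a MySQL JDBC URL from its parts.
class MysqlUrl {
    static String build(String host, int port, String database, String parameters) {
        String url = "jdbc:mysql://" + host + ":" + port + "/" + database;
        // Append the query string only when parameters were supplied.
        if (parameters != null && !parameters.isEmpty()) {
            url += "?" + parameters;
        }
        return url;
    }
}
```

With the post's variables, `MysqlUrl.build(host, 3306, database, parameters)` would yield a string that could be passed to `ds.setUrl(...)`, alongside `ds.setPassword( password )`. When the database sits on another machine, the MySQL account also has to be allowed to connect from the client's address, which is worth checking if the URL itself looks correct.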

Losing Sessions/ScrapeableFiles on

So apparently I've run into the next problem. It is not something that immediately stops me from working, but it is nonetheless a major annoyance.

Whenever I close the workbench, screen-scraper seems to delete/lose all my scraping sessions and the associated scrapeable files. When I open the workbench again, only the scripts remain. I tried restoring from some of the database backups that were created, but they seem to be affected as well: after opening screen-scraper again, the scripts are the only things showing up.

SSH connection issues/peer not authenticated

Notes on the various HTTPS issues are posted on the blog.

ssl_error_rx_record_too_long

I am currently trying to set up a scrape on a site to automate the download of order responses that they host as PDF files. But I've run into two problems with it.

1. I tried to set up a proxy session via screen-scraper like I have always done, but after starting the proxy and changing the proxy settings in my browser, whenever I navigate to this particular site it gives me the error message "ssl_error_rx_record_too_long" (http://imgur.com/qIMR4uQ), and I have absolutely no clue how to fix it. Furthermore, it happens with all the browsers I tried, and it seems to happen only on this machine.

Reject all Cookies not working when called from command line?

Hi,
I am using the Professional version of screen-scraper.

I have written a scrape for a site that works perfectly until it arrives at the seventh record. I looked into the issue and the first thing I did was to change the settings in the scrape session to reject all cookies.

This worked perfectly until I called the scrape from the command line (as I do for all my scrapes). The scrape then stopped again at the seventh record. I would upload the scrape here but I can't see how.

I believe my issue is that the command line needs a switch or something to reject cookies?

Startup error encountered

I had a crash recently while screen-scraper was running, so there may be some corruption.

Each time I restart, I get the below...

java.lang.NullPointerException
at com.screenscraper.Settings.getEdition(Settings.java:1778)
at com.screenscraper.controller.ControllerMain.main(ControllerMain.java:385)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)

Distil blocking my scrapes

The website I scrape has added Distil (http://www.distilnetworks.com/) and I now get redirected to a Captcha page. Any idea how to get around this?