screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users should contact support with their registered email address to get access. This forum is monitored closely by screen-scraper staff, and posts are generally answered within one business day.

Special char problem between SS versions

I am working with Portuguese-language sites that contain a number of special characters. This causes many problems, and the only solution I've found that works reasonably well is to uncheck "Tidy HTML after scraping". That way most of the characters are extracted the way I want them. I use the character set declared by the site I am scraping, usually ISO-8859-1, both in the scraping session and as the general setting under "Settings". Surprisingly, that works better than UTF-8.
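
The ISO-8859-1 result may be less surprising than it looks: if a site really serves ISO-8859-1, each accented character is a single byte that is not valid UTF-8 on its own, so decoding with UTF-8 turns it into replacement characters. A minimal plain-Java illustration, independent of screen-scraper and assuming the pages are served in the charset they declare (the sample word is made up):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode raw response bytes with an explicitly chosen charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // "informação", written with escapes so the source file's encoding doesn't matter
        String original = "informa\u00e7\u00e3o";
        // Simulate a page served as ISO-8859-1: one byte per accented character
        byte[] body = original.getBytes(StandardCharsets.ISO_8859_1);

        String asLatin1 = decode(body, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(body, StandardCharsets.UTF_8);

        System.out.println(asLatin1.equals(original)); // true
        // false: bytes 0xE7 and 0xE3 are not valid UTF-8 here and become U+FFFD
        System.out.println(asUtf8.equals(original));
    }
}
```

In other words, the charset setting should match what the site actually sends rather than being UTF-8 by default.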

An error occurred while preparing to issue the HTTP request: null

Guess I'm in for the weekly Null award at Screen-scraper...

I'm working on a new session and ran into a suspicious error that I just can't seem to get around. I'm sure the solution is staring me in the face, but I cannot see it.

Extracting data from current URL

Scrapers,

I've been trying to find a way to scrape data from the URL currently being processed, such as http://www.thesite.com/~@GET_THIS_DATA@~/detailspage/detail.html

Is there a way?

/Johan
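
One way to approach this is to treat the current URL as ordinary text and apply the same pattern an extractor would, with a capture group standing in for the GET_THIS_DATA token. A plain-Java sketch, assuming the script already has the current URL as a string (how it is obtained from screen-scraper is not shown here); the host and path come from the post and the sample value `item-4711` is made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlSegment {
    // Mirrors http://www.thesite.com/~@GET_THIS_DATA@~/detailspage/detail.html:
    // group 1 plays the role of the GET_THIS_DATA token.
    private static final Pattern DETAIL_URL =
            Pattern.compile("^https?://www\\.thesite\\.com/([^/]+)/detailspage/detail\\.html$");

    // Returns the captured path segment, or null when the URL has a different shape.
    static String extract(String url) {
        if (url == null) {
            return null;
        }
        Matcher m = DETAIL_URL.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("http://www.thesite.com/item-4711/detailspage/detail.html")); // item-4711
        System.out.println(extract("http://www.thesite.com/about.html")); // null
    }
}
```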

Replacement get error: java.lang.NullPointerException BSF info: null at line: 0 column: columnNo

Fellow scrapers,

To get one common format in my database, I try to replace the different expressions used by each site with a single one, using:

trueListingtype = session.getVariable("LISTING_TYPE");
trueListingtype = trueListingtype.replaceAll("WrongExpression1", "RightExpression1");
session.setVariable("LISTING_TYPE", trueListingtype);

trueListingtype2 = session.getVariable("LISTING_TYPE");
trueListingtype2 = trueListingtype2.replaceAll("WrongExpression2", "RightExpression2");
session.setVariable("LISTING_TYPE", trueListingtype2);
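
The post is cut off here, but a NullPointerException at this point is consistent with LISTING_TYPE being null when `replaceAll` is called, for example on a page where the extractor pattern did not match and the variable was never set. That is an assumption. The plain-Java sketch below guards against null and applies all replacements in one pass; a `HashMap` stands in for the session variables and is not the screen-scraper API. It also uses `String.replace`, which matches literally, whereas `replaceAll` treats its first argument as a regular expression:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ListingTypeNormalizer {
    // Site-specific wording -> common wording (placeholder pairs from the post).
    // LinkedHashMap keeps the replacements in insertion order.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("WrongExpression1", "RightExpression1");
        REPLACEMENTS.put("WrongExpression2", "RightExpression2");
    }

    // Returns the normalized value, or null when nothing was extracted.
    static String normalize(String value) {
        if (value == null) {
            return null; // calling replace/replaceAll on null is what throws the NPE
        }
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            // Literal replacement; no regex metacharacters to worry about
            value = value.replace(e.getKey(), e.getValue());
        }
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for session variables
        Map<String, String> vars = new HashMap<>();
        vars.put("LISTING_TYPE", "WrongExpression2");
        String normalized = normalize(vars.get("LISTING_TYPE"));
        if (normalized != null) {
            vars.put("LISTING_TYPE", normalized); // only write back a real value
        }
        System.out.println(vars.get("LISTING_TYPE")); // RightExpression2
        System.out.println(normalize(null));           // null
    }
}
```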

Bug Importing Scripts v4.5

I exported three scraping sessions from Pro 4.5 running on Windows XP to import them into Basic 4.5 running on Ubuntu Server (no GUI). When I place the scraping sessions and scripts in the import directory and run one of the scraping sessions, none of my scripts run.

Error while invoking screen-scraper via PHP

1. We tried to invoke the scrapes via PHP, and it throws the following error:
Error: Scraping session was either invalid or has not been set.

We are using the following code to invoke the scrapes via PHP:

require_once('remote_scraping_session.php');
$object = new RemoteScrapingSession;
$object->initialize("scrapping session name");
$object->scrape();

if ($object->isError())
{
    echo "Error: " . $object->getErrorMessage() . "\n";
    die;
}

Write to log file from Java library?

I'm calling an external Java library from one of my scripts, and I would like to do a little debugging with it. Is it possible to write to my current session's log file from that library? Maybe if I could pass the session object to the library? Any help is appreciated.
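
Passing the session object would probably work, but it ties the library to screen-scraper's classes. An alternative sketch: the library accepts a generic `Consumer<String>`, and the calling script supplies one that forwards each message to the session's logging method (for example `session.log`, assuming that is what your version exposes). The class and method names below are hypothetical, and a list collects the messages in place of the real log. If the script interpreter does not support lambda syntax, an anonymous class implementing `Consumer<String>` is the equivalent:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ParserLibrary {
    // The library depends only on a plain callback, not on screen-scraper classes.
    private final Consumer<String> log;

    ParserLibrary(Consumer<String> log) {
        this.log = log;
    }

    // Example library method that reports what it is doing through the callback.
    int countWords(String text) {
        log.accept("countWords called with " + text.length() + " chars");
        String trimmed = text.trim();
        int count = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        log.accept("countWords returning " + count);
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the session log
        List<String> lines = new ArrayList<>();
        ParserLibrary lib = new ParserLibrary(lines::add);
        System.out.println(lib.countWords("one two  three")); // 3
        System.out.println(lines.size());                     // 2
    }
}
```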

dynamic proxy change

I know you can designate multiple proxies with a text file in the SS directory, but I would imagine this file is read when SS starts and isn't refreshed while it's running. Is there a method (probably undocumented) for setting the external proxy? This would be really handy because I could then switch the proxy on the fly. There are a couple of scenarios where you might not know the proxy IP until after a session has started, e.g. scanning for new proxies or spawning new ones yourself on EC2.
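
Whatever call screen-scraper offers for setting the external proxy (not shown here; check the API documentation for your version), the on-the-fly part can be kept separate: a pool whose contents can be replaced while the scrape runs and which hands out the next proxy on demand. A minimal Java sketch; the class name and `host:port` values are placeholders:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class ProxyPool {
    // Current list; replaced wholesale so a reader never sees a half-updated list.
    private final AtomicReference<List<String>> proxies = new AtomicReference<>(List.of());
    private final AtomicInteger cursor = new AtomicInteger();

    // Swap in a new list at runtime, e.g. after discovering or launching proxies.
    void update(List<String> hostPorts) {
        proxies.set(List.copyOf(hostPorts));
    }

    // Round-robin over whatever list is current; null when the pool is empty.
    String next() {
        List<String> current = proxies.get();
        if (current.isEmpty()) {
            return null;
        }
        // floorMod keeps the index valid even if the counter overflows
        int i = Math.floorMod(cursor.getAndIncrement(), current.size());
        return current.get(i);
    }

    public static void main(String[] args) {
        ProxyPool pool = new ProxyPool();
        pool.update(List.of("10.0.0.1:3128", "10.0.0.2:3128"));
        System.out.println(pool.next()); // 10.0.0.1:3128
        System.out.println(pool.next()); // 10.0.0.2:3128
        pool.update(List.of("10.0.0.3:8080")); // proxy found while running
        System.out.println(pool.next()); // 10.0.0.3:8080
    }
}
```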

Saving the URL Redirect in a CSV File

How would I save a redirect URL to my CSV file? For example, the link to the agent's website via Trulia is
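
The post's example URL is missing, so the sketch below uses `example.com` placeholders. It shows one way to capture a redirect target in plain Java, outside screen-scraper: request the link without following redirects, read the `Location` header, resolve it against the request URL (it may be relative), and quote it as a CSV field. The class and method names are hypothetical, and the quoting follows the common double-the-quotes convention:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectToCsv {
    // Request the URL without following redirects; return the absolute target,
    // or null when the response is not a redirect.
    static String fetchRedirect(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (status < 300 || status >= 400 || location == null) {
                return null;
            }
            return resolve(url, location);
        } finally {
            conn.disconnect();
        }
    }

    // Location may be relative, so resolve it against the requested URL.
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    // Wrap a field in quotes and double any embedded quotes.
    static String csvField(String value) {
        return "\"" + (value == null ? "" : value.replace("\"", "\"\"")) + "\"";
    }

    public static void main(String[] args) {
        String target = resolve("http://example.com/a/b", "/agent?id=1");
        System.out.println(csvField("Jane Agent") + "," + csvField(target));
        // "Jane Agent","http://example.com/agent?id=1"
    }
}
```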

Embedded Parameters In URL

I need to iterate through a number of web pages with the following URL format:

http://www.company.com/info/0,,12345~9876,00.html

with the value 9876 changing for each new page.

Please advise.

matekus
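
In screen-scraper this is typically handled by embedding a session variable in the scrapeable file's URL (e.g. a hypothetical `~#PAGE_ID#~` in place of 9876) and setting it before each scrape; check the documentation for the exact syntax. The sketch below shows only the URL-building logic in plain Java, assuming the ids are consecutive. If they come from another page instead, `pageUrl` would be applied to each extracted id:

```java
import java.util.ArrayList;
import java.util.List;

public class PageUrls {
    // Only the 9876 part of http://www.company.com/info/0,,12345~9876,00.html varies.
    static String pageUrl(int pageId) {
        // Plain concatenation avoids locale-dependent number formatting
        return "http://www.company.com/info/0,,12345~" + pageId + ",00.html";
    }

    // One URL per id in an inclusive range (the range itself is an assumption).
    static List<String> pageUrls(int firstId, int lastId) {
        List<String> urls = new ArrayList<>();
        for (int id = firstId; id <= lastId; id++) {
            urls.add(pageUrl(id));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : pageUrls(9876, 9878)) {
            System.out.println(url);
        }
    }
}
```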