screen-scraper support for licensed users
Special char problem between SS versions
I am working with sites in Portuguese, which contain a number of special characters. This is a source of many problems, and the only solution I've come up with that works OK is to uncheck "Tidy HTML after scraping". This way I get most of the characters extracted the way I want them. I use the character set of the site I am scraping, which is usually ISO-8859-1, both in the scraping session and as the general setting under "settings". Surprisingly, that works better than UTF-8.
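For anyone hitting the same thing, the effect can be reproduced outside screen-scraper. The following is only a sketch in plain Java, not anything from screen-scraper itself: the class name `CharsetDemo`, the helper `decode`, and the sample word are made up for illustration. It shows why bytes from an ISO-8859-1 page survive when decoded as ISO-8859-1 but get damaged when decoded as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of screen-scraper.
class CharsetDemo {
    // Decode raw page bytes using the given character set.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Sample Portuguese word written with unicode escapes so the
        // source file encoding does not matter.
        String original = "Descri\u00e7\u00e3o";

        // Pretend these are the bytes a site declaring ISO-8859-1 sends.
        byte[] raw = original.getBytes(StandardCharsets.ISO_8859_1);

        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 round trip intact: " + asLatin1.equals(original));
        // The accented bytes are not valid UTF-8, so Java substitutes U+FFFD.
        System.out.println("UTF-8 decode damaged: " + (asUtf8.indexOf('\uFFFD') >= 0));
    }
}
```

So if the site really serves ISO-8859-1, matching that setting is the expected fix rather than a surprise; UTF-8 would only be right for pages that are actually encoded as UTF-8.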
An error occurred while preparing to issue the HTTP request: null
Guess I'm in for the weekly Null award at Screen-scraper...
I'm working on a new session and I ran into a suspicious error that I just can't seem to get around. I'm sure the solution is right there in front of me, but I cannot see it.
Extracting data from current URL
Scrapers,
I've been trying to find a way to scrape data from the URL currently being processed, such as http://www.thesite.com/~@GET_THIS_DATA@~/detailspage/detail.html
Is there a way?
/Johan
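One possible approach, sketched in plain Java: once the current URL is available as a string, a regular expression can pull out the path segment. Everything here is an assumption for illustration: the class name `UrlSegmentDemo`, the method `extractSegment`, and the idea that the wanted value is always the segment right before `/detailspage/`. How the URL string is obtained inside a script is left out; I am assuming something like a `getCurrentURL()` call on `scrapeableFile`, which should be checked against the API documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of screen-scraper.
class UrlSegmentDemo {
    // Captures the single path segment directly before "/detailspage/".
    private static final Pattern SEGMENT =
            Pattern.compile("^https?://[^/]+/([^/]+)/detailspage/");

    // Returns the captured segment, or null if the URL does not match.
    static String extractSegment(String url) {
        if (url == null) {
            return null;
        }
        Matcher matcher = SEGMENT.matcher(url);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // "some-value" stands in for the part marked ~@GET_THIS_DATA@~.
        String url = "http://www.thesite.com/some-value/detailspage/detail.html";
        System.out.println(extractSegment(url)); // prints "some-value"
    }
}
```

The result could then be stored in a session variable the same way as any other extracted value.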
Replacement get error: java.lang.NullPointerException BSF info: null at line: 0 column: columnNo
Fellow scrapers,
In order to get one common format in my DB, I try to replace the different expressions that sites use with a single one, using:
trueListingtype = session.getVariable("LISTING_TYPE");
trueListingtype = trueListingtype.replaceAll("WrongExpression1", "RightExpression1");
session.setVariable("LISTING_TYPE", trueListingtype);
trueListingtype2 = session.getVariable("LISTING_TYPE");
trueListingtype2 = trueListingtype2.replaceAll("WrongExpression2", "RightExpression2");
session.setVariable("LISTING_TYPE", trueListingtype2);
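A guess at the cause, offered as a sketch rather than a diagnosis: a NullPointerException on `replaceAll` usually means the variable was null when the script ran, for example because LISTING_TYPE had not been set yet. The plain Java below guards against that. The class `ListingTypeNormalizer` and its `normalize` method are hypothetical, and it swaps `replaceAll` (which treats its first argument as a regular expression) for `replace`, which does literal matching.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of screen-scraper.
class ListingTypeNormalizer {
    // Wrong expression -> right expression, applied in insertion order.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("WrongExpression1", "RightExpression1");
        REPLACEMENTS.put("WrongExpression2", "RightExpression2");
    }

    // Accepts Object because session variables are not guaranteed to be Strings.
    // Returns null for null input instead of throwing.
    static String normalize(Object raw) {
        if (raw == null) {
            return null;
        }
        String value = raw.toString();
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // Literal replacement, not a regular expression.
            value = value.replace(entry.getKey(), entry.getValue());
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(normalize("WrongExpression2 listing")); // prints "RightExpression2 listing"
        System.out.println(normalize(null)); // prints "null"
    }
}
```

In a script, the idea would be to call `normalize` on the result of `session.getVariable("LISTING_TYPE")` and only call `session.setVariable` when the result is not null. Depending on the interpreter version, the generics may need to be removed.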
Bug Importing Scripts v4.5
I exported 3 scraping sessions from Pro 4.5 running on Windows XP to be imported into Basic 4.5 running on Ubuntu server (no GUI). When I place the scraping sessions and scripts into the import directory and run one of the scraping sessions none of my scripts run.
Error while invoking screen-scraper via PHP
1. We tried to invoke the scrapes via PHP, and it throws the following error:
Error: Scraping session was either invalid or has not been set.
We are using the following code to invoke the scrapes via PHP:
require_once('remote_scraping_session.php');
$return1 = $object = new RemoteScrapingSession;
$return2 = $object->initialize("scrapping session name");
$return3 = $object->scrape();
if ($object->isError())
{
    echo "Error: " . $object->getErrorMessage() . "\n";
    die;
}
Write to log file from Java library?
I'm calling an external java library from one of my scripts, and I would like to do a little debugging with it. Is it possible to write to my current session's log file from that library? Maybe if I could pass the session object to the library? Any help is appreciated.
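One way this might be done without tying the library to screen-scraper classes is to hand the library a small logging callback instead of the session object itself. The sketch below is plain Java under stated assumptions: `ListingLibrary` and `parseCount` are made-up names standing in for the external library, and the wiring of the callback to the session's log (presumably via `session.log`) is assumed, not shown.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for the external library.
class ListingLibrary {
    private final Consumer<String> log;

    // The caller supplies where log messages should go.
    // A null callback is replaced with one that discards messages.
    ListingLibrary(Consumer<String> log) {
        this.log = (log != null) ? log : message -> { };
    }

    // Example library method that reports what it is doing.
    int parseCount(String text) {
        log.accept("parseCount input: " + text);
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) {
        // Outside screen-scraper, messages simply go to standard output.
        ListingLibrary library = new ListingLibrary(System.out::println);
        System.out.println(library.parseCount(" 42 ")); // prints the log line, then 42
    }
}
```

From a script, the callback would forward each message to the session log. Older script interpreters may not accept lambdas or method references, in which case an anonymous class implementing the callback would be needed.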
dynamic proxy change
I know you can designate multiple proxies with a text file in the SS directory, but I would imagine this file is read when SS starts and isn't refreshed while it's running. Is there a method (probably undocumented) for setting the external proxy? This would be really handy because then I could switch the proxy on the fly. There are a couple of scenarios where you might not know the proxy IP until after a session has started, e.g. scanning for new proxies, spawning new ones yourself on EC2, etc.
Saving the URL Redirect in a CSV File
How would I save a redirect URL to my CSV file? For example, the link to the Agent's website via Trulia is
Embedded Parameters In URL
I need to iterate through a number of web pages with the following URL format:
http://www.company.com/info/0,,12345~9876,00.html
with the value 9876 changing for each new page.
Please advise.
matekus
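A minimal sketch of one way to generate such URLs, in plain Java. It assumes the list of changing values is already known; the class `DetailUrlBuilder`, its methods, and the ids other than 9876 are hypothetical. Inside screen-scraper the more usual route would probably be a session variable embedded in the URL, but that part is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of screen-scraper.
class DetailUrlBuilder {
    // Fixed parts of the URL taken from the example in the post.
    private static final String PREFIX = "http://www.company.com/info/0,,12345~";
    private static final String SUFFIX = ",00.html";

    // Build the URL for a single page id.
    static String buildUrl(int pageId) {
        return PREFIX + pageId + SUFFIX;
    }

    // Build URLs for every id, preserving order.
    static List<String> buildUrls(int[] pageIds) {
        List<String> urls = new ArrayList<>();
        for (int pageId : pageIds) {
            urls.add(buildUrl(pageId));
        }
        return urls;
    }

    public static void main(String[] args) {
        // 9876 comes from the post; the other ids are made up.
        for (String url : buildUrls(new int[] {9876, 9877, 9878})) {
            System.out.println(url);
        }
    }
}
```

Since only the id changes, the commas and the tilde can stay in the fixed prefix and suffix without any special handling.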