screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users, please contact support with your registered email address to get access. This forum is monitored closely by screen-scraper staff. Posts are generally answered within one business day.

execute page javascript before file is scraped

Is there a way to tell screen-scraper to execute all of a page's JavaScript before scraping the file, much as a browser would?

The problem is that a number of the sites I scrape use the same framework, Taleo, which builds the page with a JavaScript function called at the end of the page. My instinct tells me that this is purely an anti-scraping countermeasure. I've tried to find an "accessible" or script-free version of the same page, to no avail.

Trying to get to grips with SOAP and it's not working as expected

I have followed the example on the site showing how to call screen-scraper using the DLL. That worked fine and proved that I can call the scrapes.
However, I wanted to do more, i.e. take the log file of the scrape and check it for errors so that I am automatically warned of problems...

So I took the SOAP example. It warns that the WSDL code is not generated quite right and that [] needs to be changed to [][], but I am not sure how to cast that in a response...

Anyway, that warning didn't seem to affect me, as I don't want the data set, only the log file... Now, these are the lines of code

Skip Output if File Exists and Looping

Hello Support:

I'm scraping a dynamic web page to which real estate properties are added throughout the day, and I would like to output the data for new properties as they are uploaded. The name of each output file will be the address of the property. I don't want to overwrite an existing file if a property has already been written. Since properties will be uploaded to the web page over the course of the day, I want screen-scraper to loop throughout the day.
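The skip-if-exists part of this can be sketched in plain Java. This is a minimal, hypothetical example, not screen-scraper's own output mechanism: the class `PropertyOutput` and its methods are illustrative names, and it assumes one CSV file per property, named after the address, in an existing output directory. It uses `File.createNewFile()`, which creates the file only if it does not already exist, so a property already written on an earlier loop is skipped rather than overwritten.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical helper: one CSV file per property, named after its address,
// never overwriting a file that was written earlier in the day.
class PropertyOutput {

    // Replaces characters that Windows does not allow in file names.
    static String fileNameFor(String address) {
        return address.trim().replaceAll("[\\\\/:*?\"<>|]", "_") + ".csv";
    }

    // Returns true if a new file was written, false if it already existed.
    static boolean writeIfNew(File outputDir, String address, String csvLine)
            throws IOException {
        File out = new File(outputDir, fileNameFor(address));
        // createNewFile() atomically creates the file only when it is absent,
        // so a property seen on an earlier pass is skipped.
        if (!out.createNewFile()) {
            return false;
        }
        FileWriter writer = new FileWriter(out);
        try {
            writer.write(csvLine);
            writer.write(System.getProperty("line.separator"));
        } finally {
            writer.close();
        }
        return true;
    }
}
```

Calling `writeIfNew` for every property on each loop is then safe: only addresses not seen before produce a new file.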

The sequence is as follows:

Time Stamp in Name of the CSV Output File

Hello Support:

I'm trying to include the date in the name of the CSV output file. I added this to the top of my script:

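import java.util.Date;
import java.text.DateFormat;
import java.text.SimpleDateFormat;

// Returns the current date and time as text, e.g. 2010-01-05_09-07-03.
// Colons are not allowed in Windows file names, so the pattern uses '-' and '_'.
String getDateTime()
{
    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss");
    Date date = new Date();
    return dateFormat.format(date);
}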

Then, in the output file path, I added the date as follows:

outputFile = "C:/Documents and Settings/Me/Desktop/XXX Search/outwrite( getDateTime()).csv";
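As written, `getDateTime()` sits inside the string literal, so Java treats it as plain text and never calls it. A minimal sketch of the usual fix is to build the path by concatenation. The class `OutputNames` and the `outwrite_` prefix are hypothetical (the prefix is only a guess at what "outwrite" was meant to be), and the timestamp pattern avoids colons because Windows does not allow them in file names.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper for timestamped output file names.
class OutputNames {

    // File-name-safe timestamp: no ':' characters (invalid on Windows).
    static String timestamp(Date date) {
        return new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss").format(date);
    }

    // The method call goes outside the quotes and is joined with '+';
    // inside a string literal it would just be text.
    static String outputFile(String dir, Date date) {
        return dir + "outwrite_" + timestamp(date) + ".csv";
    }
}
```

In a script this would be used as `outputFile = OutputNames.outputFile("C:/Documents and Settings/Me/Desktop/XXX Search/", new Date());`, or equivalently `"C:/Documents and Settings/Me/Desktop/XXX Search/outwrite_" + getDateTime() + ".csv"` with the script's own method.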

screen-scraper and linux-vserver

We're trying to run screen-scraper Enterprise Edition 5.5 (latest update) on a linux-vserver guest, and we're getting errors, apparently because the screen-scraper server (or HSQLDB, or both) tries to bind and connect to 127.0.0.1. Is there a configuration directive to control this? Can I tell it to use the interface accessible to the vserver guest? We're open to any approach that solves this.

This page: http://linux-vserver.org/Problematic_Programs talks about several applications that use hardcoded references to localhost as 127.0.0.1 (which is what we think is happening with screen-scraper).

update to the newest version

Hey Guys,

Maybe my question is stupid, but I searched the forum and couldn't find an answer.
When I upgrade a GUI-less screen-scraper installation using the link on the website, and I haven't upgraded recently, do I get all the changes made in every update since my version, or do I have to download and replace the files for each update?
Not sure if I explained my question clearly enough :)

Cheers,

Radek

IndexOutOfBoundsException Error

Hi there,

I have a script that writes the record set to a database using a for( i = 0; i < dataSet.getNumDataRecords(); i++ ) loop.

Everything was working fine until I introduced these changes in the script:

1. At the start of the database write script (before the for loop), I store the first data record:
CurrentDataRecord = dataSet.getDataRecord( 0 );

2. I store one of the values of that first data record (a date) in a session variable.

3. I call a JavaScript script that uses that session variable to check the difference between that date and today's date.
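One plausible cause of this exception is calling `getDataRecord( 0 )` when the data set is empty, since indexing the first element of an empty list throws `IndexOutOfBoundsException`. In the script, the corresponding guard would be to check `dataSet.getNumDataRecords() > 0` before reading record 0. The sketch below shows the same guard in plain, self-contained Java; it uses a `List` of `Map`s as a stand-in for the data set, which is an assumption for illustration, and `FirstRecord` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: records are modeled as a List of Maps,
// not screen-scraper's own DataSet class.
class FirstRecord {

    // Returns the first record, or null when there are no records.
    static Map<String, String> firstOrNull(List<Map<String, String>> records) {
        if (records == null || records.isEmpty()) {
            // records.get(0) here would throw IndexOutOfBoundsException
            return null;
        }
        return records.get(0);
    }
}
```

The caller then skips the session-variable and date steps when `null` comes back, instead of letting the exception stop the script.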

Recommendation for a source of anonymous proxies?

Hi,

I am using the manual proxy pool approach to anonymization, as shown here:

http://community.screen-scraper.com/anonymization_via_manual_proxy_pools

And I am getting a list of proxies from here:

http://www.textproxylists.com/proxy.php?anonymous

However, when I filter them with a 7-second connection timeout as in the example, I end up with only around 30 usable proxies out of a list of around 900 servers :-S
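For reference, a connection-timeout filter like the one described can be sketched with `java.net.Socket.connect(SocketAddress, int)`. This is a minimal, hypothetical example (`ProxyFilter` and its methods are illustrative names, not the code from the linked page), assuming the list is one `host:port` entry per line. Note that a successful TCP connection only shows the port is open; it does not prove the server is a working anonymous HTTP proxy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical proxy-list filter based on TCP connect time.
class ProxyFilter {

    static final int TIMEOUT_MS = 7000; // the 7-second cutoff from the post

    // True if a TCP connection to host:port opens within timeoutMs.
    static boolean accepts(String host, int port, int timeoutMs) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unknown host, unreachable, or timed out
        } catch (IllegalArgumentException e) {
            return false; // e.g. port number out of range
        } finally {
            try { socket.close(); } catch (IOException ignored) { }
        }
    }

    // Keeps only "host:port" entries that accept a connection in time.
    static List<String> usable(List<String> hostPorts, int timeoutMs) {
        List<String> ok = new ArrayList<String>();
        for (String hp : hostPorts) {
            int colon = hp.lastIndexOf(':');
            if (colon < 0) continue; // skip malformed lines
            int port;
            try {
                port = Integer.parseInt(hp.substring(colon + 1).trim());
            } catch (NumberFormatException e) {
                continue;
            }
            if (accepts(hp.substring(0, colon), port, timeoutMs)) ok.add(hp);
        }
        return ok;
    }
}
```

Checked one at a time, slow or dead servers can each cost up to the full timeout, so a list of around 900 entries can take a long time to filter.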

I am wondering if anybody can suggest a better source of good anonymous proxies.

Many thanks,
boga

Confused about why to use Java...

Hi,

I have programmed in other languages before (Visual Basic, PHP, SQL...), and all of them had extensive function libraries, so you could do almost anything you wanted with strings, dates, etc.

If I understand correctly, in Java, if I want to compare two dates and work out the difference in days between them, I have to write the function myself because there isn't one already?

I am wondering whether I should use JavaScript instead for screen-scraper scripting. In which cases would I want to use one or the other?
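Java does include this in its standard library, though the convenient API depends on the Java version: `java.time` (added in Java 8) computes day differences directly, while older Java relied on `Calendar`/`Date` arithmetic or third-party libraries such as Joda-Time. Whether `java.time` is available inside screen-scraper depends on the JVM it runs on, which is an assumption to check. A minimal sketch, with `DateDiff` as a hypothetical class name and dates assumed to be in ISO `yyyy-MM-dd` form:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Requires Java 8 or later (java.time).
class DateDiff {

    // Whole days from 'from' to 'to'; negative if 'to' is earlier.
    static long daysBetween(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }

    // Days from an ISO date string such as "2010-01-05" until today.
    static long daysUntilToday(String isoDate) {
        return daysBetween(LocalDate.parse(isoDate), LocalDate.now());
    }
}
```

For example, `DateDiff.daysBetween(LocalDate.of(2010, 1, 5), LocalDate.of(2010, 3, 1))` returns 55.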

Cookie defaults changed? OUCH?

A scrape stopped working. I changed the cookie dropdown from "According to cookie spec" to "Accept all cookies" and it suddenly began working again. This has happened on multiple scrapes hitting multiple sites. None of these scrapes had changed recently, and the problem started after our update from 5 to 5.5.

Did the default selection for that change, or did the behavior of "according to cookie spec" change?

I can't check what the setting was previously without reverting to an old version of screen-scraper and re-importing an old version of the file.

This impacts a ton of my scrapes. :(