screen-scraper public support

Questions and answers regarding the use of screen-scraper. Anyone can post. Monitored occasionally by screen-scraper staff.

Trouble with scraping JSP and jQuery site

OK, I realize that this might be a dumb question, but here goes...

I'm scraping a site that looks like it uses JSP and jQuery as the UI for a database. When I set up a Proxy Server, I am able to grab the first page of the site (...index.jsp) with its associated .jsp files, but subsequent pages captured by the Proxy show (I think) only the data sent back by the database in a ...select.jsp page.

The data sent back in the ...select.jsp file for subsequent queries looks like this:
 

Extractor pattern for varying HTML code

Hi,

I'm struggling to get an extractor pattern that can extract data from a table that can vary between pages.

I have 3 fields, in columns 2, 3 & 5 of the table (rows 06, 09 & 15 in the code) that I wish to extract the data from and have represented these as ~@DATA1@~, ~@DATA2@~ & ~@DATA3@~ in the example below: -

01<tr>
02<td valign="top" width="36">
03<p dir="RTL" align="center">1</p>
04</td>
05<td width="110">
06<p dir="RTL" align="center">~@DATA1@~</p>
07</td>
08<td colspan="2" width="66">
09<p dir="RTL" align="center">~@DATA2@~</p>

Webpages on Local Drive, How to Extract Data?

Hi, I want to ask this question before spending a day learning how to use screen-scraper.

I have 9000 webpages from a website that I have saved to my local disk.

I want to extract all the data from them in CSV format to put into Excel.

QUESTION:

Can I use screen-scraper to scrape these 9000 webpages on my local drive?
If so, where may I find a tutorial to help me get started?

Thanks

Any way to suppress Basic edition-related method errors?

Hi guys,

I'm currently working on a scrape that requires use of a manual proxy pool, which I've set up according to your clear instructions and the public proxy list provided by hidemyass.com (http://hidemyass.com/proxy-list/).

My scrape is working fine, but my log is getting filled with "The "currentProxyServerIsBad" method is not available in this edition of screen-scraper" errors, which I'm guessing is a result of the fact that I'm using the Basic edition (no thanks to the cheapas...err...financially prudent managers in my firm).

SS script backup & import: unicode characters aren't handled properly?

Hi guys,

I'm not sure if this is the right forum for this, but here goes: I have a script that I use to parse through firm addresses for the U.S. and several countries in Europe, some of which (e.g., France, Germany and Sweden) contain words with Unicode characters. Although I'm developing this script for one scrape, I'd like to implement it elsewhere to provide other scrapes with the same address-parsing functionality.

Looping a script

I have two questions:

For some reason my script stopped writing data to the CSV file after 1565 iterations. The log still showed everything running as it was supposed to. Do you know why that could be?

I wanted to write a script that would let me run the scrape in batches of 1000 at a time. I found this code in another posting, but it does not seem to do anything. What am I doing wrong?

for( int i = 1; i < 1000; i++ )
{
    session.scrapeFile( "6 Get Parts Details" );
}

Any help is much appreciated. Thanks, rudy
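One way to picture the batching asked about above is the plain-Java sketch below. It is not a screen-scraper script: the `Scraper` interface is a hypothetical stand-in for the `session.scrapeFile( ... )` call quoted in the post, and `runBatch`, the total of 2500 items, and the batch size are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BatchLoopDemo {
    /** Hypothetical stand-in for the session.scrapeFile(name) call quoted in the post. */
    interface Scraper {
        void scrapeFile(String name);
    }

    /**
     * Runs one batch covering items [start, start + batchSize), capped at total.
     * Returns the index at which the next batch should start.
     */
    static int runBatch(Scraper scraper, String file, int start, int batchSize, int total) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int end = Math.min(start + batchSize, total);
        for (int i = start; i < end; i++) {
            scraper.scrapeFile(file);
        }
        return end;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();   // records each simulated scrape call
        int total = 2500;                       // assumed number of items to process
        int next = 0;
        while (next < total) {
            next = runBatch(name -> log.add(name), "6 Get Parts Details", next, 1000, total);
            System.out.println("Finished batch ending at item " + next);
        }
        System.out.println("Total calls: " + log.size());
    }
}
```

Keeping the start index outside the loop makes it possible to stop after any batch and resume later from the returned value.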

"Peer Not Authenticated" Error - what to tell IT?

Hi all,

I'm developing a pretty simple scrape at work (SS 5.5) that harvests an application number from a first webpage, processes the number's format slightly to adjust it for a second website, and then loads a page from the second website using the adjusted application number.

However, when I try to access the second website with the adjusted application number, I get the following error:

An input/output error occurred while connecting to 'https:// ... blah blah ...'. The message was peer not authenticated.

Access to java jar files

The following line fails

pagenum = Integer.parseInt(pagenum).toString();

with

The error message was: class bsh.EvalError (line 5): Integer .parseInt ( pagenum ) -- Error in method invocation: Static method parseInt( java.lang.Integer ) not found in class'java.lang.Integer'

I thought the standard Java classes were available without me doing anything. Or do I need to copy the jar file to a particular location and/or include an import statement?
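A note on the error text: it reports that no `parseInt( java.lang.Integer )` method exists, which suggests that `java.lang.Integer` is being found and that `pagenum` already holds an Integer rather than a String. The sketch below is plain Java written under that assumption; the class and method names are hypothetical.

```java
class PageNumDemo {
    /** Normalises a page number that may arrive as an Integer or as a String. */
    static String normalize(Object pagenum) {
        // String.valueOf accepts any Object, so Integer.parseInt always receives a String.
        int parsed = Integer.parseInt(String.valueOf(pagenum).trim());
        // Integer.toString(int) is used because a primitive int has no toString() method.
        return Integer.toString(parsed);
    }

    public static void main(String[] args) {
        System.out.println(normalize(Integer.valueOf(7))); // value already an Integer
        System.out.println(normalize("007"));              // value held as a String
    }
}
```

Both calls in `main` print `7`. If the value is already an Integer, converting it with `String.valueOf` alone may be all that is needed.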

Writing output straight to a text file

Hi Everyone.

I am scraping a website that's just a .txt file, so all I want to do is scrape the site with no tidying and send whatever comes in straight to a text file. Is there a way for me to write a script and attach it to run after the scrape? If so, what JavaScript do I use? I am just starting to learn JavaScript. Thanks.
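The file-writing half of this can be sketched in plain Java (not JavaScript) using only the standard library. How the downloaded text is obtained inside a screen-scraper script is not shown here and is left as an assumption; `RawTextWriter`, `write`, `read`, and the output file name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class RawTextWriter {
    /** Writes the text unchanged to the target file, replacing any existing content. */
    static void write(String body, Path target) {
        try {
            Files.write(target, body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file back as UTF-8 text, to check what was written. */
    static String read(Path source) {
        try {
            return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed location: the system temp directory; any writable path would do.
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "scraped_output.txt");
        write("line one\nline two\n", target);
        System.out.print(read(target));
    }
}
```

Writing bytes with an explicit charset avoids any reformatting of the text on the way to disk.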

using this - being logged in on standard browser

Hi

Usually I'm already logged in to a site when I open it in my standard Firefox browser.

Can you tell screen-scraper to act as this Firefox browser when running a scrape, so that I need not provide a login procedure in the script?

Thanks

Ben