screen-scraper public support

Questions and answers regarding the use of screen-scraper. Anyone can post. Monitored occasionally by screen-scraper staff.

Newbie - importing URLs

I have a txt file of thousands of urls that I want to scrape the same data from.

My simple question is: how do I get sss to read from this file,

scrape the data (I can create the scrapable file and extractor patterns),

write the scraped info to a txt file or database,

Then start all over again with the next URL on the list,

repeat until at the bottom of the list.

Regards,
Joseph.

PS - I am using the free version of sss
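The loop being asked for (read a URL, scrape it, write the results, move to the next line) is plain file iteration. A minimal sketch in ordinary Java follows; the file name `urls.txt` and the `readUrls` helper are illustrative, not part of screen-scraper's API, and the comment marks where a screen-scraper script would hand each URL to the session:

```java
import java.io.*;
import java.util.*;

public class UrlListLoop {
    // Read one URL per line from a text file, skipping blank lines.
    static List<String> readUrls(String path) throws IOException {
        List<String> urls = new ArrayList<>();
        BufferedReader in = new BufferedReader(new FileReader(path));
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) urls.add(line);
        }
        in.close();
        return urls;
    }

    public static void main(String[] args) throws IOException {
        File list = new File("urls.txt");
        if (!list.exists()) {
            System.out.println("urls.txt not found; nothing to do");
            return;
        }
        for (String url : readUrls(list.getPath())) {
            // In a screen-scraper script, this is where the current URL
            // would be handed to the session and the scrapeable file run,
            // with the extracted data appended to an output file before
            // moving on to the next URL in the list.
            System.out.println("would scrape: " + url);
        }
    }
}
```

In screen-scraper itself this is typically driven from a script that stores the current URL in a session variable and then requests the scrapeable file; the file-reading half of the loop is the same either way.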

Scraping GIF digits

I'm trying to scrape prices from a site, but they've replaced the text digits with GIF images of the text, and it's not always the same file for the same digit. I presume this is to prevent scraping.

Any tips to get around this?

Cheers
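When a site draws its digit images from a fixed pool, one workaround is to fingerprint each image's raw bytes and build a hand-labelled map from fingerprint to digit; if the bytes genuinely differ on every request, OCR is the usual fallback. A sketch of the fingerprinting half (the sample bytes and the '7' label are invented for illustration):

```java
import java.security.MessageDigest;
import java.util.*;

public class GifDigitLookup {
    // Hex MD5 of an image's raw bytes, used as a stable fingerprint.
    static String fingerprint(byte[] imageBytes) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(imageBytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hand-built table: fingerprint -> digit. Populate it by saving
        // each previously unseen GIF to disk and labelling it once by hand.
        Map<String, Character> digits = new HashMap<>();
        byte[] sampleGif = {0x47, 0x49, 0x46}; // placeholder bytes, not a real GIF
        digits.put(fingerprint(sampleGif), '7'); // assumed label
        System.out.println("lookup: " + digits.get(fingerprint(sampleGif)));
    }
}
```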

extractData ???

I have a scraping session that uses one extractor pattern to scrape topics (the main pattern), and a second extractor pattern that matches however many records there are within the first dataSet. That works, but I need to get the topic alongside each detail record's variables.

I.e.

The first main extractor has the following pattern text; the second main extractor is called manually from a script.

1st MAIN:
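The pattern text itself is cut off above, but the underlying problem, attaching the outer topic to every record of the inner match, is ordinary nested iteration. A generic sketch in plain Java (the names `topic` and `detail` are illustrative; in screen-scraper the same effect is usually achieved by saving the outer value to a session variable before the inner pattern runs):

```java
import java.util.*;

public class NestedRecords {
    // Pair each inner detail with the outer topic it belongs to,
    // producing one {topic, detail} row per detail record.
    static List<String[]> combine(Map<String, List<String>> topicToDetails) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : topicToDetails.entrySet()) {
            for (String detail : e.getValue()) {
                rows.add(new String[] { e.getKey(), detail });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("Topic A", Arrays.asList("detail 1", "detail 2"));
        data.put("Topic B", Arrays.asList("detail 3"));
        for (String[] row : combine(data)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```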

Not able to start screen scraper as a server

Hi,

I am using Windows 2003 and having issues starting screen-scraper as a server. I have followed this tutorial: http://community.screen-scraper.com/running_screen-scraper_as_a_server.

When I click on Start Server from the Start Menu, a DOS-like window appears for a split second and disappears. Nothing else happens. No icon appears in the system tray.

SS not compatible with Ubuntu Karmic

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00fd969d, pid=14835, tid=3077860208
#
# JRE version: 6.0-b16
# Java VM: OpenJDK Client VM (14.0-b16 mixed mode, sharing linux-x86 )
# Distribution: Ubuntu karmic (development branch), package 6b16-1.6.1-1ubuntu3
# Problematic frame:
# V [libjvm.so+0x1fb69d]
#
# An error report file with more information is saved as:
# /hs_err_pid14835.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:

Iterations, variables, and my lack of programming skills

Hi all,

Firstly, thanks for reading. Just from looking through it over the last couple of weeks, this seems like a really well cared-for forum.

My problem is that my programming background isn't good enough to put the logic into practice. Essentially I'm trying to extract horse racing information where each meeting has multiple races, and each race has multiple runners.

To illustrate what I'm trying to do, I've put it into a code box as follows:
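The poster's code box did not survive, but meeting, race, and runner form a three-level nested loop, flattening to one output row per runner. A generic sketch under that assumption (all names and sample data here are invented):

```java
import java.util.*;

public class RacingWalk {
    // Flatten meetings -> races -> runners into one CSV-style row per runner.
    // Input shape: meeting name -> (race name -> list of runner names).
    static List<String> flatten(Map<String, Map<String, List<String>>> meetings) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, List<String>>> m : meetings.entrySet()) {
            for (Map.Entry<String, List<String>> r : m.getValue().entrySet()) {
                for (String runner : r.getValue()) {
                    rows.add(m.getKey() + "," + r.getKey() + "," + runner);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> meetings = new LinkedHashMap<>();
        Map<String, List<String>> races = new LinkedHashMap<>();
        races.put("14:30", Arrays.asList("Dancer", "Comet"));
        meetings.put("Ascot", races);
        for (String row : flatten(meetings)) System.out.println(row);
    }
}
```

In a scraping session the same shape usually appears as one extractor pattern per level, with each level's matched values carried down to the next via session variables.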

Extractor Pattern Returns Multiple Values - How do I get each value

Here is my extractor pattern:

; >Note:
;
;
;

~@NOTE@~


However, in the scraped file >Note: occurs multiple times and there is nothing distinguishable about each occurrence.

When I "apply pattern to last scraped data", it displays all the found values: Sequence 0 with its information, Sequence 1 with its information, and so on.

My question is: how do I out.write each of the sequence values?
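A plain-Java sketch of the write-each-match loop (the `writeAll` helper is illustrative; in a screen-scraper script the matched values would come from the pattern's dataSet rather than a hard-coded list):

```java
import java.io.*;
import java.util.*;

public class WriteMatches {
    // Append each matched value to the output on its own line.
    static void writeAll(List<String> matches, Writer out) throws IOException {
        for (String note : matches) {
            out.write(note);
            out.write(System.lineSeparator());
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the values the extractor pattern matched.
        List<String> matches = Arrays.asList("note one", "note two");
        StringWriter out = new StringWriter(); // a FileWriter works the same way
        writeAll(matches, out);
        System.out.print(out);
    }
}
```

The same loop works against a FileWriter opened in append mode, so every sequence's value ends up in the file rather than only the last one.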

Is it possible to save the entire HTTP response as a file?

Is it possible to save the entire HTTP response returned by a scrape as a file? I don't see any session methods in the API that do this, but I could be overlooking something. Alternatively, is there some extractor pattern that would grab the whole response?
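Whatever hook the API provides for reaching the raw response, the file-saving half is ordinary stream copying. A sketch (`saveTo` is an illustrative helper, not a screen-scraper method, and the sample bytes stand in for a real response body):

```java
import java.io.*;

public class SaveResponse {
    // Copy an entire stream (e.g. an HTTP response body) to a file,
    // returning the number of bytes written.
    static long saveTo(InputStream in, File dest) throws IOException {
        long total = 0;
        try (OutputStream out = new FileOutputStream(dest)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] fake = "<html>whole response</html>".getBytes();
        File f = File.createTempFile("response", ".html");
        long n = saveTo(new ByteArrayInputStream(fake), f);
        System.out.println(n + " bytes -> " + f);
    }
}
```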