Help with scraping white text

Hey all, great little program...

I am currently handling the scraped HTML below manually, but if anyone has an idea to make life easier, that would be great. I receive around 25 results each week for the area I am interested in, and I then type these property prices by hand into a database against the advertised property (which is scraped from another website).

Scrape request:

GET /propertydata/vic/BORONIA/index.html HTTP/1.1
Cookie: PASSPORT=c3bbb7fb199ef30a319c8e0ef139002c
Host: realestateview.com.au

Snippet of two records as scraped (white text included):

PScBOakRAvqTWBQ8rml1066sqmO$400,000UBarryIPlantF

PSPDRosellaQAvVtBVe7rmF150sqmf$253,5008BarryNPlantu

I am only interested in the text that is not white: I just need the address and the price each property sold for.
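If the white text is applied with inline colour markup, one option is to drop it at parse time, before any text is extracted. This is only a sketch under assumptions: the real page markup isn't shown above, so it assumes the hidden characters sit inside `<font color="...">` or an inline `style="color: ..."` set to white, and the names `visible_text`, `VisibleTextParser` and `WHITE_VALUES` are hypothetical. Colours set through a CSS class or as `rgb(...)` would not be caught without extending `is_white`.

```python
import re
from html.parser import HTMLParser

# Assumption: hidden text uses one of these colour values (extend as needed).
WHITE_VALUES = {"white", "#fff", "#ffffff"}
# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def is_white(attrs):
    """True if a tag's color attribute or inline style sets a white colour."""
    attrs = dict(attrs)
    color = (attrs.get("color") or "").strip().lower()
    if color in WHITE_VALUES:
        return True
    style = (attrs.get("style") or "").lower()
    # Match "color:" but not "background-color:".
    m = re.search(r"(?:^|;)\s*color\s*:\s*([^;]+)", style)
    return bool(m) and m.group(1).strip() in WHITE_VALUES


class VisibleTextParser(HTMLParser):
    """Collects text chunks that are not inside a white-coloured element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, is_white) for each open element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, is_white(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip() and not any(white for _, white in self.stack):
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical markup modelled on the first record above.
sample = ('<td><font color="#FFFFFF">PScB</font>Oak'
          '<font color="#FFFFFF">R</font>Av'
          '<span style="color: #fff">qT</span>WB'
          '<font color="white">Q</font>8rm</td>')
print(visible_text(sample))  # ['Oak', 'Av', 'WB', '8rm']
```

Joining the first two chunks with a space gives the street address ("Oak Av").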

The results I am getting so far:

Oak
Av

WB
8rm
1066sqm
$400,000
Barry
Plant

Rosella
Av

BV
7rm
150sqm
$253,500
Barry
Plant
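Given visible text like the results above, a single regular expression can pull out just the address and price for each record. A minimal sketch, assuming each record starts with a street-name line followed by a street-type line, and that the first line consisting only of a dollar amount is the sale price; the `STREET_TYPES` list and the `extract` helper are hypothetical. If a record ever has no price, the lazy `.*?` would run on into the next record, so that case would need a check.

```python
import re

# The visible text from the results above, one token per line.
results = """Oak
Av

WB
8rm
1066sqm
$400,000
Barry
Plant

Rosella
Av

BV
7rm
150sqm
$253,500
Barry
Plant"""

# Hypothetical street-type abbreviations; extend for the suburbs you track.
STREET_TYPES = r"(?:Av|St|Rd|Ct|Dr|Cr|Pl|Gr|Tce)"

RECORD = re.compile(
    rf"^(?P<street>[A-Z][A-Za-z' -]*)\n(?P<type>{STREET_TYPES})\n"  # name line, type line
    r".*?"                                                        # rooms, land size, ...
    r"^(?P<price>\$[\d,]+)$",                                     # first price line
    re.MULTILINE | re.DOTALL,
)


def extract(text):
    """Return (address, price) pairs, one per record."""
    return [(f"{m['street']} {m['type']}", m["price"]) for m in RECORD.finditer(text)]


print(extract(results))
# [('Oak Av', '$400,000'), ('Rosella Av', '$253,500')]
```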

Thanks. Once again, I am currently typing this in manually once a week for around 20 records.

Shaun

lazyhorse on 03/31/2009 at 4:56 am


regular expressions

I'd suggest you proxy the HTTP request you send out and see if you can build a unique regular expression that matches each record and breaks on the white sections. This would be easier if screen-scraper did grouping, but you can do the grouping yourself by saving the value as a session variable and then manipulating it in a script. It looks like you are already replacing the white sections with hard returns, which I guess is another option. Screen-scraper Enterprise supports value mapping, which does a lot of this on the fly.
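Once the values are grouped, the "manipulating it in a script" step could write them straight to the database instead of retyping them each week. A minimal sketch using Python's built-in `sqlite3`; the `sales` table, its columns and the `store_sales` and `price_to_int` helpers are hypothetical stand-ins for whatever database you actually use. Matching each sale to the advertised property would still need a shared key, such as the full street address.

```python
import sqlite3


def price_to_int(price):
    """Convert a scraped price such as '$253,500' to 253500."""
    return int(price.lstrip("$").replace(",", ""))


def store_sales(conn, records):
    """Insert (address, price) pairs into a hypothetical 'sales' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (address TEXT, price INTEGER)")
    conn.executemany(
        "INSERT INTO sales (address, price) VALUES (?, ?)",
        [(address, price_to_int(price)) for address, price in records],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for a real database
store_sales(conn, [("Oak Av", "$400,000"), ("Rosella Av", "$253,500")])
rows = conn.execute("SELECT address, price FROM sales ORDER BY address").fetchall()
print(rows)  # [('Oak Av', 400000), ('Rosella Av', 253500)]
```

Storing the price as an integer rather than the scraped string keeps it sortable and comparable in queries.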

Just a couple of thoughts.

scraper

scraper on 04/03/2009 at 2:47 pm
