Help with scraping white text
Hey all, great little program...
I am currently dealing with the below bit of scraped HTML manually, but if anyone has an idea to make life easier, that would be great. I receive around 25 results each week for the area I am interested in, and then type these property prices manually into a database against the advertised property (scraped from another website).
Scrape request:
GET /propertydata/vic/BORONIA/index.html HTTP/1.1
Cookie: PASSPORT=c3bbb7fb199ef30a319c8e0ef139002c
Host: realestateview.com.au
Snip of two records:
PScBOakRAvqTWBQ8rml1066sqmO$400,000UBarryIPlantF
PSPDRosellaQAvVtBVe7rmF150sqmf$253,5008BarryNPlantu
I am interested in the text that is not white. I just need the address and price it sold for.
The results I am getting so far:
Oak
Av
WB
8rm
1066sqm
$400,000
Barry
Plant
Rosella
Av
BV
7rm
150sqm
$253,500
Barry
Plant
Thanks, but once again, I am currently typing this in manually once a week for around 20 records.
Shaun
regular expressions
I'd suggest you proxy the HTTP request you send out and see if you can build a unique regular expression to match and break on the white spaces. This might be more helpful if screen-scraper did grouping, but you could group the results yourself by saving them as a session variable and then manipulating them in a script. It looks like you are actually replacing white spaces with hard returns, which is another option, I guess. Screen-scraper Enterprise enables value mapping, which does a lot of this on the fly.
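To give a rough idea of the grouping step, here is a minimal sketch in plain Python (not screen-scraper's own scripting, so treat it as an illustration only). It assumes the tokens come out exactly as in the results posted above: eight per record, with the street name and street type first and the price sixth. The names `group_records` and `FIELDS_PER_RECORD` are made up for the example, and a two-word street name or a three-word agency would break the fixed count, so the price is checked with a regular expression before it is trusted.

```python
import re

# Assumption: every record yields exactly 8 tokens, in the order seen in the
# posted results (street, street type, code, rooms, land size, price, agent x2).
FIELDS_PER_RECORD = 8

# A price token such as "$400,000" or "$253,500".
PRICE = re.compile(r"^\$\d{1,3}(?:,\d{3})*$")


def group_records(tokens, size=FIELDS_PER_RECORD):
    """Chunk a flat token list into records and keep only address and price."""
    records = []
    for start in range(0, len(tokens) - size + 1, size):
        chunk = tokens[start:start + size]
        street, street_type, _code, _rooms, _land, price = chunk[:6]
        if not PRICE.match(price):
            # The fixed layout did not hold for this record; stop rather than guess.
            raise ValueError(f"expected a price at position 6, got {price!r}")
        records.append({
            "address": f"{street} {street_type}",
            "price": int(price[1:].replace(",", "")),
        })
    return records


# The tokens from the two records posted above, one per line.
tokens = """
Oak
Av
WB
8rm
1066sqm
$400,000
Barry
Plant
Rosella
Av
BV
7rm
150sqm
$253,500
Barry
Plant
""".split()

records = group_records(tokens)
for record in records:
    print(record["address"], record["price"])
```

If the token count per record turns out to vary, anchoring on the price token and the "rm" token instead of fixed positions would probably be the safer route.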
Just a couple of thoughts.
scraper