Problem with Java interpretation in screenscraper?
Hi,
I want to scrape websites and search them for some given words (criteria). I need the full source of each page so that I can save the page with the words found highlighted in a special colour. Therefore I created one extractor pattern to get the full source, and I search for the words in a script. I use a normal java.lang.String and its method indexOf(String str, int fromIndex). For the first occurrence of a word, that works. But then I set fromIndex to the position after that occurrence and call the method again, and I still get the index of the first occurrence. I put exactly the same code in a simple Java test program, and there the result was what I expected: the index of the second occurrence.
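For comparison, here is a minimal plain-Java sketch (run outside screenscraper) of the search loop described above. The class name, method names and the highlighting markup are hypothetical choices for illustration, not screenscraper API. The key step is that the result of each indexOf call is fed back in as the next fromIndex; if the loop variable is not reassigned, every call returns the first occurrence again. Whether screenscraper's script interpreter behaves identically is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

class HighlightWords {

    // Returns the start index of every occurrence of word in text, using
    // String.indexOf(String, int) and advancing fromIndex past each match.
    static List<Integer> findAll(String text, String word) {
        List<Integer> positions = new ArrayList<>();
        if (word.isEmpty()) {
            return positions; // an empty search word would otherwise loop forever
        }
        int index = text.indexOf(word, 0);
        while (index != -1) {
            positions.add(index);
            // Reassign index so the next search starts after the current match.
            index = text.indexOf(word, index + word.length());
        }
        return positions;
    }

    // Wraps every occurrence of word in a (hypothetical) highlighting span.
    static String highlight(String html, String word) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int pos : findAll(html, word)) {
            out.append(html, last, pos); // copy the text before the match
            out.append("<span style=\"background:yellow\">")
               .append(word)
               .append("</span>");
            last = pos + word.length();
        }
        out.append(html.substring(last)); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<p>apple pie and apple juice</p>";
        System.out.println(findAll(page, "apple"));
        System.out.println(highlight(page, "apple"));
    }
}
```

With the sample page, findAll prints [3, 17], i.e. both occurrences, and highlight wraps each "apple" in its own span.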
I remember an earlier issue where screenscraper's interpretation of Java also looked suspicious to me.
Am I the only one with this problem?
Kind regards,
Tamara
Problem with Java interpretation in screenscraper?
Hi,
My screenscraper setup consists of:
- a scraping session that runs a startup script before the session and an end script after it
- one scrapeable file that represents an arbitrary page of a website, with two extractor patterns: one to get the entire content of the page and one to get all links on the page. After each pattern match I run a Java script to process the data found, because the regular expressions for the sub-extractor patterns became much too complex. The problems occurred in the script that runs after the extractor pattern for the full page content.
I don't use dataSets; I just put all data in session variables and retrieve it wherever I need it.
Bye,
Tamara
Problem with Java interpretation in screenscraper?
When is the script run? Are you using a main extractor with sub-extractors, or grabbing the entire page in a dataSet and parsing through that? Are you using session variables?
SS does indeed do some funky things with Java; you are not the only one. :)
Problem with Java interpretation in screenscraper?
I made a stupid mistake here. But I still suspect the interpreter. Has anyone else experienced similar problems?
Tamara