screen-scraper public support
doubts regarding Memory management
Hi,
I have a Java program that calls many scraping sessions one after the other. My question is, are session variables that get stored in memory during execution of a scraping session present even after that session has completed and the next scraping session has begun? Or are they cleared at the end of a scraping session?
Along the same lines, if I call session.stopScraping() in one script, will it stop only that scraping session and move on to the next one, or will it break the whole flow so that no further scraping sessions are invoked?
Environment to execute shell script to scrape?
I have two essentially identical SS-Pro installations, one running on XP and one on CentOS 4.3.
On XP a traditional DOS batch file runs a scraping session without fail.
So I installed SS-Pro on the CentOS 4.3 box using the X-windows installer.
Workbench and the modified batch file from DOS work OK so long as you revise the paths to explicitly call the SS/jre/java instead of relying on the system's java (rel 1.5.x).
SS is installed in /data/APPS/ss on the CentOS box.
If I run a terminal session, cd /data/APPS/ss and then execute my shell script, all is well.
screen-scraper stops execution if run for a long time
Hi all,
Thanks in advance.
I have to scrape "UK for sale" properties from some sites using postcodes.
My postcode db has 1.7 million records.
Our PHP script calls screen-scraper to fetch the data.
But the problem is that screen-scraper stops executing after it has scraped data for around 10,000 postcodes.
I've increased the PHP execution time to 10000, the input time to 60 and the memory size to 500M in my php.ini file.
I've also changed setBuffersize in remote_scraping_session to 64000.
Can anyone help me solve this issue?
Is this doing any good, or not?
OK, I finally got around to rewriting the VBScript I was using to write data out to .txt files. I used Interpreted Java, instead of VBScript, for the new code.
While perusing some Java docs, I hit upon an idea that (I hoped) would make writing out to the files faster. I did this because one of the pages I scrape once a day has over 1000 items that are extracted, and it took a good bit of time (and CPU cycles) to write the data out to the .txt file.
So, here's the code I came up with:
Posting data to a web form?
Ok, I got the scraper session that I needed working great... the problem is, it's finding so many spammer domains that I'm having problems keeping up when manually submitting them to SiteAdvisor.com!
So, I wanted to set it up so that I just drop the list of spammer domains into a .txt file, fire up Screen-Scraper, and let it submit them for me (or, more elegantly, have it read the .txt file that Screen-Scraper is saving the discovered spammer domains into, and remove each domain from the list as it submits comments for them).
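The "read the list, submit each domain, drop it from the list" idea could be sketched roughly as follows. This is only an illustration under assumptions: the class name DomainQueue, the method drain, and the submit callback are hypothetical, and the callback is a stand-in for whatever actually posts to the web form (no real SiteAdvisor.com request is made here). For simplicity the file is rewritten once at the end rather than after every submission.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DomainQueue {
    /**
     * Reads one domain per line from listFile, hands each one to submit,
     * and rewrites the file so that only the domains that were NOT
     * submitted successfully remain. Returns the submitted domains.
     */
    public static List<String> drain(Path listFile, Predicate<String> submit) throws IOException {
        List<String> submitted = new ArrayList<>();
        List<String> remaining = new ArrayList<>();
        for (String line : Files.readAllLines(listFile)) {
            String domain = line.trim();
            if (domain.isEmpty()) {
                continue; // skip blank lines
            }
            if (submit.test(domain)) {
                submitted.add(domain);
            } else {
                remaining.add(domain); // keep it for the next run
            }
        }
        Files.write(listFile, remaining);
        return submitted;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("spammer-domains", ".txt");
        Files.write(file, List.of("example.com", "example.net"));
        // Stand-in for the real form submission: pretend every domain succeeds.
        List<String> done = drain(file, domain -> true);
        System.out.println("Submitted: " + done);
        Files.delete(file);
    }
}
```

Keeping failed domains in the file means a later run can retry them without any extra bookkeeping.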
Require help/suggestions in scraping following sites
Hi,
Is it possible to scrape the following sites?
www.bananarepublic.com
www.gap.com
www.oldnavy.com
Here I need to fetch the store information.
I will read the required zip parameters from a file and supply them.
The location which generates the store information is
screen-scraper script (2.7.2) hangs on Fedora
Hi,
I have downloaded the 30-day trial of Screen Scraper. I've installed it on an FC3 running under VMware Workstation. Release details are:
kernel-2.4.22-1.2115.nptl
fedora-release-1-3
The install process seemed to complete without a hitch. I then tried to start Screen Scraper using the screen-scraper script from a command line. The script just hangs and if I show the running processes I see the following repeated 50 to 100 times.
POST Parameter extracted before
Hello,
I want to scrape a File using a POST Parameter I extracted in the previous file.
What I did:
1. in "File1" I extract a variable from a Scraped File: ~@VARIABLE1@~
(The Pattern occurs only once)
2. "After each Pattern application" I start a Script
3. The Script - besides defining some constant Variables - starts the next scrapeable file:
session.scrapeFile( "File2" );
4. "File2" contains some POST Parameters (~#VARIABLE1#~ and some constant variables defined in the script)
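To make step 4 concrete, here is a small stand-alone simulation of how a ~#NAME#~ token in a POST parameter might be filled in from a map of session variables. It is not screen-scraper's own code: the class TokenDemo, the method fill, and the sample values are hypothetical, and the behaviour of leaving unknown tokens untouched is an assumption chosen to make a missing variable easy to see.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenDemo {
    // Matches tokens of the form ~#NAME#~ and captures NAME.
    private static final Pattern TOKEN = Pattern.compile("~#(\\w+)#~");

    /** Replaces each ~#NAME#~ token with vars.get(NAME), if present. */
    public static String fill(String template, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // Assumption: unknown tokens are left as-is rather than blanked.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> sessionVars = new HashMap<>();
        sessionVars.put("VARIABLE1", "abc123"); // hypothetical value extracted from File1
        System.out.println(fill("id=~#VARIABLE1#~&mode=~#MODE#~", sessionVars));
    }
}
```

In this sketch a token that stays unreplaced in the output points to a variable that was never stored under that name.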