screen-scraper support for licensed users
Scraping a site multiple times - remotely
Hi -
I am running a scrape that must retrieve several pages of data, one page at a time. Each page includes links to articles or PDF files hosted on the site being scraped, and the user must be able to open them. The links often require a session ID. How can I keep the session ID active? Every time I run a scrape, it starts from scratch and a new session ID is created. I am using RemoteScrapingSession from Java.
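A common pattern, not specific to screen-scraper, is to capture the session ID once and append it to each later link, so that every request does not start a new session. The Java sketch below shows only that link-rewriting step. It assumes the site takes the ID as a query parameter. The parameter name sid, the sample URLs and the class name are hypothetical, and capturing the ID through RemoteScrapingSession is not shown.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: add a previously captured session ID to a link so later requests
// reuse the same server-side session.
// ASSUMPTION: the site expects the ID as a query parameter; the name "sid"
// used in main() is hypothetical.
class SessionLinks {

    /** Returns the link with paramName=sessionId added, unless the parameter is already present. */
    static String withSessionId(String link, String paramName, String sessionId) {
        // Split off any fragment (#...) so it stays at the end of the URL.
        String fragment = "";
        int hash = link.indexOf('#');
        if (hash >= 0) {
            fragment = link.substring(hash);
            link = link.substring(0, hash);
        }
        int q = link.indexOf('?');
        String query = q >= 0 ? link.substring(q + 1) : "";
        // Leave links that already carry the parameter unchanged.
        for (String pair : query.split("&")) {
            if (pair.startsWith(paramName + "=")) {
                return link + fragment;
            }
        }
        String separator = q >= 0 ? "&" : "?";
        return link + separator + paramName + "="
                + URLEncoder.encode(sessionId, StandardCharsets.UTF_8) + fragment;
    }

    public static void main(String[] args) {
        String sid = "A1B2C3"; // captured once, e.g. from the first page of results
        System.out.println(withSessionId("http://example.com/article.pdf", "sid", sid));
        System.out.println(withSessionId("http://example.com/view?id=7", "sid", sid));
    }
}
```

A link that already carries the parameter is returned unchanged, so the helper can be applied to every extracted link.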
Robin
OutOfMemory Error when scraping page with many results
Hi,
I am trying to scrape a file that returns a variable number of records, depending on the parameter sent to the site. If the number of records is relatively low, it works fine. If there are around 1,000 records or more, the command file stops running and writes this error to the log file:
First Page: Sending request.
An error occurred while processing the script: Delta Dental CA - Get Input
The error message was: OutOfMemoryError (line 32): Java heap space-- Method Invocation session.scrapeFile
Scraping session "Delta Dental CA" finished.
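"OutOfMemoryError: Java heap space" means the JVM running the scrape reached its maximum heap size while session.scrapeFile was processing the large page. As a first diagnostic, this minimal Java sketch prints the heap limits of whatever JVM runs it; the class name is hypothetical. For a plain java launch the limit is raised with the standard -Xmx option (for example -Xmx1024m); where screen-scraper exposes that setting is not covered here.

```java
// Sketch: report the heap limits of the current JVM, in megabytes, to compare
// against the memory a large scrape needs.
class HeapReport {

    static long toMegabytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (controlled by -Xmx).
        System.out.println("max heap:   " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved; freeMemory(): unused part of it.
        System.out.println("total heap: " + toMegabytes(rt.totalMemory()) + " MB");
        System.out.println("free heap:  " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

Running it as java -Xmx512m HeapReport should report a max heap close to 512 MB.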
Foreign Characters work locally, but not on EC2 server
Several of our scrapes pull from sites with international characters. When we build them, they work fine in the workbench and in the server interface locally. The problem is that when we upload them to our EC2 servers, the foreign characters come back as question marks. I've tried numerous settings and variations on the server's screen-scraper instance, including DefaultCharSet, DefaultFont, etc., but to no avail.
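Question marks in place of accented characters are the usual symptom of text being encoded with a charset that cannot represent them: in Java, String.getBytes(Charset) replaces each unmappable character with the charset's replacement byte, which is '?' for US-ASCII. A frequent cause is the JVM's default charset, which on JDKs before 18 follows the operating system locale and can therefore differ between a workstation and a Linux server. Whether that is what happens on the EC2 instances is an assumption to check, for example by printing Charset.defaultCharset() there. This minimal sketch (hypothetical class name) reproduces the effect and shows the fix of naming the charset explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: why text turns into '?' when it passes through the wrong charset.
class CharsetDemo {

    // Encodes and decodes text with the given charset. Characters the charset
    // cannot represent are replaced by its replacement byte ('?' for US-ASCII).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "Z\u00fcrich, S\u00e3o Paulo";
        // On JDKs before 18 the default follows the OS locale, so it can differ per machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("US-ASCII: " + roundTrip(text, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + roundTrip(text, StandardCharsets.UTF_8));
    }
}
```

The US-ASCII line prints "Z?rich, S?o Paulo", while the UTF-8 round trip preserves the text.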
3rd party calendar
Is screen-scraper capable of running a third-party control that renders directly to the browser window?
I'm scraping this page: http://www.aptnewyork.com/04B100424
On the page, if you click Availability to open the calendar, it loads a third-party (Yahoo) calendar. It looks like it's writing directly to the browser window using:
YAHOO.example.calendar.cal1.render();
The full script looks like this:
YAHOO.namespace("example.calendar");
YAHOO.example.calendar.init = function() {
Scrape needs to be periodically restarted
I've built a scrape that runs for a while, but usually after about 1,000 queries the site starts returning 404 errors. It's not blocking my IP address, because all I have to do is stop the scrape and restart it where I left off, and it runs for a while again. I've run into this before and have usually been able to fix it by adjusting the cookies or referrers in some combination, but this time nothing works. The proxy transactions look the same, so I don't think it's the headers.
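Because a manual stop and restart clears the errors, one option is to automate that restart: count requests and begin a fresh session every N requests, carrying on from the current position. The Java sketch below models the session state as a JDK CookieManager, outside screen-scraper's own cookie handling. That is only part of what a real restart resets (open connections, for instance, are not handled). The threshold of 900 and the class name are hypothetical values to tune.

```java
import java.net.CookieManager;

// Sketch: discard session state every N requests so one long run behaves
// like a series of short ones. The threshold is a hypothetical value.
class RotatingSession {
    private final int requestsPerSession;
    private CookieManager cookies = new CookieManager();
    private int requestsInSession = 0;
    private int sessionsStarted = 1;

    RotatingSession(int requestsPerSession) {
        this.requestsPerSession = requestsPerSession;
    }

    /** Call before each request; swaps in an empty cookie jar once the limit is reached. */
    CookieManager beforeRequest() {
        if (requestsInSession == requestsPerSession) {
            cookies = new CookieManager(); // fresh, empty cookie store
            requestsInSession = 0;
            sessionsStarted++;
        }
        requestsInSession++;
        return cookies;
    }

    int sessionsStarted() {
        return sessionsStarted;
    }

    public static void main(String[] args) {
        RotatingSession session = new RotatingSession(900);
        for (int query = 0; query < 2000; query++) {
            CookieManager jar = session.beforeRequest();
            // Issue request number `query` here, e.g. after CookieHandler.setDefault(jar)
            // when using HttpURLConnection.
        }
        System.out.println("sessions used: " + session.sessionsStarted()); // 3
    }
}
```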
Missing Scrape Files from Scraping Session **URGENT**
For the last couple of days, Screen Scraper has been randomly deleting whole rafts of scrape files from my scraping sessions. Every day I find more that have been deleted and have failed to run overnight. Is there a maximum number of scrape files or scraping sessions that can be saved in this program? I'm running Enterprise and have around 40-50 scraping sessions that run daily on my server.
This is causing massive issues for me, as it is stopping the program from working as it should. This has only happened in the past few days; before that it had been running fine for around 6 months.
allow access for domain
As we know, in server mode you have to specify the allowed IP addresses. Is there any way to allow access for a domain instead?
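The server sees only the connecting IP address, so allowing a domain amounts to resolving that name to its current addresses and comparing them with the client's. This minimal Java sketch shows that check with InetAddress. It is not a screen-scraper setting, and the host names and class name are hypothetical. A name with a changing IP is also only as current as its last DNS lookup, which the JVM caches for a while.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

// Sketch: allow a client if its IP matches any address that an allowed
// host name (or IP literal) currently resolves to.
class DomainAllowList {

    static boolean isAllowed(String clientIp, List<String> allowedHosts) {
        InetAddress client;
        try {
            client = InetAddress.getByName(clientIp); // an IP literal is parsed, not looked up
        } catch (UnknownHostException e) {
            return false;
        }
        for (String host : allowedHosts) {
            try {
                // Names are resolved via DNS; IP literals are parsed directly.
                for (InetAddress addr : InetAddress.getAllByName(host)) {
                    if (addr.equals(client)) {
                        return true;
                    }
                }
            } catch (UnknownHostException e) {
                // Unresolvable entry: skip it instead of failing the whole check.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> allowed = List.of("203.0.113.7", "office.example.com");
        System.out.println(isAllowed("203.0.113.7", allowed)); // true
    }
}
```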
Debian: problem with server start command
Hi Guys,
I don't know what happened. Everything was working correctly until now, but when I try to restart the server I receive this error:
./server: line 364: /home/sscraper/ScreenScraper/ss01/jre/bin/java: No such file or directory
Any idea why? I tried replacing the folder from a backup, but I still get the same error.
cheers,
radek
P.S. Guys, maybe it's just a coincidence, but since the last update I've been getting weird errors: on Debian, the one above, and on Windows, "unable to load main class". Is this happening only on my computer?
Regards
Scraping the data from the Page
Hi All,
I have created a session to scrape the propertyRoom site, with 4 scrape files for scraping the details, as follows:
1] All Categories: URL http://www.propertyroom.com/all-categories.aspx
2] CategorySearch: URL http://www.propertyroom.com/c/bikes_beach-bikes
3] ItemDetails: URL http://www.propertyroom.com/l/panama-jack-beach-bike/8104884
While creating a scrape file through a proxy session, I found that the page was repeatedly calling the URL http://www.propertyroom.com/ajax/ajax.svc/GetClientListings.
Updating Screen Scraper
Hi,
We are trying to update our Enterprise edition from version 4.5 to 5.0. We are not able to use the updater script, so we are doing it manually. There is a runonce.script file; how do we run it so that the tables get updated?
Regards,
Diptarmaya