screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users, please contact support with your registered email address for access. This forum is monitored closely by screen-scraper staff. Posts generally receive a response within one business day.

Scraping a site multiple times - remotely

Hi -

I am running a scrape that must retrieve several pages of data, one page at a time. The data on each page includes a link to an article or PDF file that exists on the site being scraped, and the user must have access to these. The links often require a session id. How can I keep the session id active? Every time I run a scrape it starts from scratch and a new session id is created. I am using RemoteScrapingSession from Java.

Robin

robind on 11/21/2011 at 6:05 pm
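
The question above turns on how a session id survives between requests. On many sites the id lives in a cookie that must be sent back with every later request. The following is a minimal sketch of that general mechanism using only the standard java.net.CookieManager; it does not use the screen-scraper API, and the host example.com, the JSESSIONID cookie name, and the class and method names are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SessionCookieDemo {

    // Store a Set-Cookie header as if it had come back from loginUrl.
    static CookieManager managerWithSession(String loginUrl, String setCookie) {
        CookieManager manager = new CookieManager();
        Map<String, List<String>> responseHeaders = new HashMap<>();
        responseHeaders.put("Set-Cookie", Collections.singletonList(setCookie));
        try {
            manager.put(URI.create(loginUrl), responseHeaders);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return manager;
    }

    // Build the Cookie header value the manager would attach to a request for url.
    static String cookieHeaderFor(CookieManager manager, String url) {
        try {
            Map<String, List<String>> headers =
                    manager.get(URI.create(url), new HashMap<>());
            List<String> cookies =
                    headers.getOrDefault("Cookie", Collections.emptyList());
            return String.join("; ", cookies);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical login response that sets a session cookie.
        CookieManager manager = managerWithSession(
                "http://example.com/login", "JSESSIONID=abc123; Path=/");

        // A later request to the same host carries the same session id.
        System.out.println(cookieHeaderFor(manager, "http://example.com/articles/1.pdf"));
    }
}
```

As long as the same cookie store is reused, later requests to that host carry the same session id. A fresh store, like a fresh scraping session, starts without one.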

1 comment

OutOfMemory Error when scraping page with many results

Hi,

I am trying to scrape a file that has a variable number of records, based on the parameter sent to the site. If the number of records is relatively low, it works fine. If the number of records is around 1000 or more, the command file stops running. It gives an error in the log file:

First Page: Sending request.
An error occurred while processing the script: Delta Dental CA - Get Input
The error message was: OutOfMemoryError (line 32): Java heap space-- Method Invocation session.scrapeFile
Scraping session "Delta Dental CA" finished.

avivag on 11/21/2011 at 12:35 pm
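
The "Java heap space" message in the log above means the JVM ran out of heap. As a first diagnostic, it can help to see how much heap the JVM is actually allowed to use. The sketch below uses only the standard Runtime API; the class and method names are hypothetical, and it reports the limits of whichever JVM it runs in, which is not necessarily the one running screen-scraper.

```java
class HeapCheck {

    // Upper bound on heap the JVM will try to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Heap currently in use (allocated minus free), in megabytes.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMb());
        System.out.println("used heap (MB): " + usedHeapMb());
    }
}
```

For a plain JVM the ceiling is set with the -Xmx option, for example java -Xmx1024m HeapCheck. Whether and how that option reaches the screen-scraper process depends on how it is launched.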

Foreign Characters work locally, but not on EC2 server

Several of our scrapes pull from sites with international characters, and when we build them, they work just fine in the workbench, and on the server interface locally. The problem is that when we upload them to our EC2 servers, they just come back with question marks instead of the foreign characters. I've tried numerous settings and variations on the server's screen-scraper instance, including DefaultCharSet, DefaultFont, etc, but to no avail.

chrishathaway on 11/11/2011 at 5:41 pm
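
Question marks in place of international characters are a typical symptom of text being encoded with a charset that cannot represent them. One possible cause, and only an assumption here, is that the server JVM's default charset differs from the local one. The sketch below prints the default charset and shows the effect with standard Java APIs; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Encode text to bytes with the given charset, then decode it again.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String sample = "Z\u00fcrich"; // "Zürich", written with an escape to avoid source-encoding issues

        // Compare this value between the local machine and the server.
        System.out.println("default charset: " + Charset.defaultCharset());

        // US-ASCII has no u-umlaut, so it is replaced with '?'.
        System.out.println("US-ASCII round trip: " + roundTrip(sample, StandardCharsets.US_ASCII));

        // UTF-8 can represent it, so the text survives unchanged.
        System.out.println("UTF-8 round trip:    " + roundTrip(sample, StandardCharsets.UTF_8));
    }
}
```

If the two machines report different default charsets, starting the server JVM with -Dfile.encoding=UTF-8 is one thing to try.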

3rd party calendar

Is screen-scraper capable of running a 3rd party control that renders directly to the browser window?

I'm scraping this page: http://www.aptnewyork.com/04B100424

On the page, if you click Availability to access the calendar, it will load a 3rd party (Yahoo) calendar. It looks like it's writing directly to the browser window using:

YAHOO.example.calendar.cal1.render();

The full script looks like this:

YAHOO.namespace("example.calendar");
YAHOO.example.calendar.init = function() {

exdap on 10/31/2011 at 3:56 pm

Scrape needs to be periodically restarted

I've built a scrape that runs for a while, but usually after about 1000 queries, the site starts giving 404 errors. It's not blocking my IP address, because all I have to do is stop the scrape and restart it where I left off, and it will go for a while again. I've run into this before, and usually I'm able to get it to work by adjusting the cookies or referrers in some combination, but this time no combination works. The proxy transactions look the same, so I don't think it's the headers.

chrishathaway on 10/21/2011 at 1:49 pm

Missing Scrape Files from Scraping Session URGENT

Over the last couple of days screen-scraper has been randomly deleting whole rafts of scrape files from my scraping sessions. Every day I see that more have been deleted and that sessions have failed to run overnight. Is there a maximum number of scrape files or scraping sessions that can be saved in this program? I'm running Enterprise and probably have around 40-50 scraping sessions that get run daily on my server.

This is causing massive issues for me, as it is stopping this from working as it should. This has only happened in the past few days; it had been running fine for around 6 months.

Webcore on 10/19/2011 at 2:54 am

Allow access for domain

As we know, in server mode you have to specify the allowed IP addresses. Is there any way to allow access for a domain instead?

Radek on 10/03/2011 at 6:14 am

3 comments

Debian, problem with server start command

Hi Guys,

I don't know what happened. Everything was working correctly until now, but when I try to restart the server I receive this error:

./server: line 364: /home/sscraper/ScreenScraper/ss01/jre/bin/java: No such file or directory

Any idea why? I tried replacing the folder from a backup, but the result is the same.

Cheers,

Radek

P.S. Guys, maybe it's just a coincidence, but since the last update I have been getting weird errors. I described the Debian one above; on Windows I got: unable to load main class. Is that happening only on my computer?

Regards

Radek on 09/21/2011 at 7:22 am

13 comments

Scraping the data from the Page

Hi All,

I have created one session to scrape the PropertyRoom site. I have created 4 scrape files for scraping the details, which are as follows.
1] All Categories : URL http://www.propertyroom.com/all-categories.aspx
2] CategorySearch : URL http://www.propertyroom.com/c/bikes_beach-bikes
3] ItemDetails : http://www.propertyroom.com/l/panama-jack-beach-bike/8104884

While creating a scrape file through a proxy session, I found that it was calling the URL http://www.propertyroom.com/ajax/ajax.svc/GetClientListings repeatedly.

Clarion on 09/16/2011 at 10:38 am

Updating screen-scraper

Hi,

We are trying to update our Enterprise Edition from version 4.5 to 5.0. We are not able to use the updater script, so we are doing it manually. There is a runonce.script; how do we run it so that the tables get updated?

Regards,

Diptarmaya

diptirmaya on 09/09/2011 at 8:02 am

1 comment
