screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users please contact support with your registered email address for access. This forum is monitored closely by screen-scraper staff. Posts are generally responded to in one business day.

Nested error: java.lang.NullPointerException while importing an sss file

Hi,

I'm on version 5.0.21a and noticed that I can no longer import .sss files: every import fails with a "Nested error: java.lang.NullPointerException" message. This is a serious problem because it means I can't deploy my changed scrapes.

Could you please reply to this as soon as possible? Thank you in advance.

Kind regards,

Edgar

edgar on 10/14/2010 at 9:57 am

3 comments

Multithreading: optimal sessions/threads mix

I have a setup that should allow multiple users to scrape simultaneously, so I was wondering what the best approach is.
Should I install more instances running fewer threads each, or fewer instances running more threads each? Does the answer relate to the number of CPU cores?
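One general rule of thumb for sizing a thread pool, from Brian Goetz's Java Concurrency in Practice, is threads ≈ cores × target CPU utilization × (1 + wait time / compute time). A minimal Python sketch of that heuristic follows. It is not screen-scraper-specific and does not settle instances versus threads by itself; the wait and compute times are assumptions you would have to measure for your own scrapes, and suggested_thread_count is a hypothetical helper.

```python
import os
from typing import Optional


def suggested_thread_count(wait_time_s: float,
                           compute_time_s: float,
                           cpu_cores: Optional[int] = None,
                           target_utilization: float = 1.0) -> int:
    """Rule-of-thumb pool size for I/O-bound work:
    cores * utilization * (1 + wait/compute).

    wait_time_s:    seconds a scrape spends waiting on the network (measured)
    compute_time_s: seconds a scrape spends on CPU work such as pattern matching
    """
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1  # cpu_count() may return None
    if compute_time_s <= 0:
        raise ValueError("compute_time_s must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    ratio = wait_time_s / compute_time_s
    return max(1, round(cpu_cores * target_utilization * (1 + ratio)))
```

For example, with 4 cores and scrapes that wait 9 s on the network for every 1 s of CPU work (hypothetical numbers), suggested_thread_count(9, 1, cpu_cores=4) gives 40. That figure is a budget for the whole machine, since every instance running on it shares the same cores.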

ando__ on 10/11/2010 at 10:23 am

1 comment

Improve scrape speed

Hi,

I set up a scraping session that parses a relatively simple web page and returns about 100 rows of XML content: 20 nodes of 4 elements each, from roughly 20 pattern matches. I call the scrape from C# using a RemoteScrapingSession object. While debugging, I noticed that running the scrape command on this object takes about 10-12 s. The machine is a 4 quad, 3 GHz machine. I'm in the EU and scraping a US site.

What can I do to improve the scrape time and return the data to the user faster?
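Before tuning anything, it helps to know whether the 10-12 s goes into the network round trip (EU to US) or into extraction. Below is a minimal timing sketch in Python, assuming you can fetch the same page outside screen-scraper for comparison. The stopwatch and fetch_page names are illustrative and are not part of screen-scraper or its RemoteScrapingSession API.

```python
import time
import urllib.request
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def stopwatch(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def fetch_page(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw response body only (network time, no extraction)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Usage: create timings = {}, then run body = fetch_page(url) inside with stopwatch("fetch", timings):, and compare timings["fetch"] with the total time measured on the C# side. If the plain fetch accounts for most of the 10-12 s, faster hardware is unlikely to help much. In that case the latency and the number of requests the scrape makes matter more.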

ando__ on 10/11/2010 at 10:13 am

1 comment

Removing a session on a GUI-less OS

Hey guys,

One more question about Linux: how can I remove a session on a GUI-less OS? Is SOAP the only way?

Radek on 10/08/2010 at 6:58 pm

1 comment

response content type

Hello,

I have a scraper that spiders links discovered on arbitrary web sites. I'm trying to filter out obvious URLs that I don't want to spider to, e.g. ones that end in .doc, .pdf, etc. However, sometimes it's unavoidable: I still hit the occasional binary file, and screen-scraper tries to scrape it.

Is there a way to tell screen-scraper to fail fast if the content type of the response is something other than "text/xxx"?
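A sketch of a two-stage check, written in Python and run outside screen-scraper: a cheap URL-extension filter, then a HEAD request to read Content-Type before committing to a full fetch. This is an assumed approach, not a screen-scraper setting (whether screen-scraper can abort on content type by itself is exactly what is being asked), and the helper names and extension list are hypothetical.

```python
import posixpath
import urllib.request
from typing import Optional
from urllib.parse import urlparse

# Illustrative skip list; extend it for the file types you actually meet.
BINARY_EXTENSIONS = {".doc", ".docx", ".pdf", ".xls", ".zip", ".exe", ".jpg", ".png", ".gif"}


def looks_binary_by_url(url: str) -> bool:
    """Cheap pre-check: does the URL path end in a known binary extension?"""
    path = urlparse(url).path  # ignores the query string and fragment
    return posixpath.splitext(path)[1].lower() in BINARY_EXTENSIONS


def is_text_content_type(content_type: Optional[str]) -> bool:
    """True for text/* (and XHTML); a missing header counts as 'do not scrape'."""
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/") or media_type == "application/xhtml+xml"


def head_content_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Request headers only; returns None if the server rejects HEAD or the request fails."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.headers.get("Content-Type")
    except (OSError, ValueError):  # URLError/HTTPError and timeouts are OSError subclasses
        return None
```

A spider would skip a link if looks_binary_by_url(url) is true, and otherwise scrape it only when is_text_content_type(head_content_type(url)) holds. Treating a missing or failed header as "skip" is the fail-fast choice. It will also drop some pages whose servers mishandle HEAD.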

byoung on 10/06/2010 at 2:24 pm

5.0.19a bug?

I just updated my Professional Edition installation to 5.0.19a and I'm having a strange problem. In some instances, when I double-click an extractor pattern token, it highlights part of the token plus random characters to the right of it instead of bringing up the Editing Token screen.

Am I the only one having this issue?

Andrew

andrewbkillen on 10/05/2010 at 12:51 pm

2 comments

Enabling javascript in a scrape

The issue we face when scraping a website is that when we capture
the session through Firefox, we get the correct, expected response.

However, when we replay the captured scrapeable file, we get an HTML page in response
stating:

System has detected that Javascript is not enabled. Please click here to continue.

How can we enable JavaScript in screen-scraper, just as we enable JS in IE or FF?

diptirmaya on 10/05/2010 at 8:10 am

Sub-Sub Extractors?

I am having some issues figuring out the best way to do this. I am scraping content that is essentially a table laid out like this:

Circuit City (Level 1)
  Plasma (Level 2)
    50" (Level 3)
    46" (Level 3)
    42" (Level 3)

  LCD (Level 2)
    55" (Level 3)
    37" (Level 3)

Best Buy (Level 1)
  Plasma (Level 2)
    42" (Level 3)
    50" (Level 3)
    58" (Level 3)

I can get a pattern to match Level 1 and then the first Level 2 under that section, but that's it.
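However the patterns end up being set up, the grouping itself amounts to remembering the current Level 1 and Level 2 parent while walking the matches in document order. Below is a Python sketch of that idea. It assumes each match has already been reduced to a (level, text) pair, and build_hierarchy is a hypothetical helper, not a screen-scraper feature.

```python
from typing import Dict, List, Optional, Tuple

Hierarchy = Dict[str, Dict[str, List[str]]]


def build_hierarchy(rows: List[Tuple[int, str]]) -> Hierarchy:
    """Group (level, text) matches, in document order, into store -> category -> sizes."""
    tree: Hierarchy = {}
    store: Optional[str] = None
    category: Optional[str] = None
    for level, text in rows:
        if level == 1:
            # A new store resets the current category.
            store, category = text, None
            tree.setdefault(store, {})
        elif level == 2:
            if store is None:
                raise ValueError(f"Level 2 row {text!r} has no Level 1 parent")
            category = text
            tree[store].setdefault(category, [])
        elif level == 3:
            if store is None or category is None:
                raise ValueError(f"Level 3 row {text!r} has no Level 2 parent")
            tree[store][category].append(text)
        else:
            raise ValueError(f"unexpected level {level}")
    return tree
```

Fed the rows from the example above, this yields {"Circuit City": {"Plasma": [...], "LCD": [...]}, "Best Buy": {"Plasma": [...]}}. Every Level 3 row lands under the most recent Level 2 row, not just the first one.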

andrewbkillen on 09/30/2010 at 3:20 pm

linux multi instances

Hi Guys,

While I'm trying to figure out multithreading, I decided to install screen-scraper on a Linux machine.
After a good few hours, I got screen-scraper installed on Ubuntu 10.04, but...
I'm totally new to Linux, and I'm wondering how to start more than one screen-scraper server. Do I have to install screen-scraper separately for each one? On Windows I had 10 instances and was able to configure each server's ports. Can someone tell me how to do that on Linux?

Regards,

Radek

Radek on 09/30/2010 at 11:28 am

2 comments

Multithreading same scraping session

Hi,

We have problems with multithreaded runs of the same scraping session.
We have a script that creates an xmlWriter and stores it in the session. After this initialization script, a scrape runs that gathers data from a site.
After the scrape runs, another script gets data from the scrapeSession object and writes it to a file using the xmlWriter.

What I did was run the same scraping session multithreaded (10 to 20 threads). The result files often contained text indicating that the
data had been written by several threads.
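Interleaved output like this is the usual symptom of several threads writing to one shared writer without synchronization. That may be what happens here if the writer ends up shared between runs, though this is an assumption, since the post doesn't say how the session variable is scoped. Two common fixes are to give each scraping session run its own writer and output file, or to use a lock so each record's writes happen as a unit. Below is a Python sketch of the lock approach. LockedRecordWriter is a hypothetical name and the XML shape is illustrative.

```python
import threading
from typing import Iterable, TextIO
from xml.sax.saxutils import escape


class LockedRecordWriter:
    """Serializes whole records onto a text stream shared by many threads."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream
        self._lock = threading.Lock()

    def write_record(self, fields: Iterable[str]) -> None:
        # Build the record outside the lock so the critical section stays short.
        parts = ["<record>"]
        parts.extend(f"<field>{escape(value)}</field>" for value in fields)
        parts.append("</record>\n")
        # Every write for this record happens while holding the lock,
        # so no other thread's output can land in the middle of it.
        with self._lock:
            for part in parts:
                self._stream.write(part)
```

The per-run-writer alternative avoids locking entirely and is usually simpler when each thread can write its own file. The lock is the fallback when all threads must append to one file.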

ando__ on 09/30/2010 at 9:04 am
