screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users, please contact support with your registered email address for access. This forum is monitored closely by screen-scraper staff. Posts generally receive a response within one business day.

Scraped document not displaying well

I tried scraping a site that has a link to a file I want to download. It's a .txt file, but by the time I wrote it into my .csv file, it had lost its formatting; it does not display well when opened in an Excel spreadsheet. I don't know what I am doing wrong or what I should be doing... Any ideas please?

Substring to shrink result to 7000 characters

Ok I have a field I return that often exceeds 8000 characters and thus won't import into MySQL.

In order to capture just the first 7000 characters I tried this:

void shrinklength(String varName)
(
if (session.getVariable(varName) != null)
session.setVariable(varName, session.getVariable(varName).substring(1,7000));
)

shrinklength("variable");

But I get this error:

The error message was: Encountered "shrinklength ( String varName ) (" at line 13, column 6...

when I run the scraping session.
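
A note on the snippet above: in Java, and in the interpreted Java used for scripts, a method body is delimited by braces rather than parentheses, which appears to be what the parser is complaining about. Two further points are worth checking: `substring` is zero-based, so starting at index 1 drops the first character, and it throws an exception when the end index is past the end of the string. The following is a minimal sketch of a safe truncation helper, not a tested screen-scraper script. The class and method names are made up for illustration, and the wiring to `session.getVariable` and `session.setVariable` is shown only as a comment because it assumes the variable holds a String.

```java
final class FieldTruncator {
    private FieldTruncator() {}

    /**
     * Returns at most maxLength leading characters of value.
     * A null value is passed through unchanged.
     */
    static String truncate(String value, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        if (value == null) {
            return null;
        }
        // substring is zero-based with an exclusive end index; Math.min keeps
        // the end index inside the string when the value is already short.
        return value.substring(0, Math.min(value.length(), maxLength));
    }

    // Assumed usage inside a script (not verified here):
    //   Object raw = session.getVariable("variable");
    //   if (raw != null) {
    //       session.setVariable("variable", truncate(raw.toString(), 7000));
    //   }
}
```

With this helper, a value shorter than 7000 characters is returned unchanged instead of raising a `StringIndexOutOfBoundsException`.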

Javascript string oddities

I was trying to scrape a page and wondering why content on the page wasn't in the page source. It turns out that the content was encoded as gibberish and decoded on the fly with javascript.

i.e. document.write(rot_decode('gibberish'))

I downloaded the JS file and found the function. Not having used JS before, I thought I'd just make a JS script with the function in it and call it from my normal interpreted Java script. In my interpreted Java script I've got the following function:

Java SOAP client

I'm wondering if anyone's got any tips for getting the SOAP client for the screen-scraper server to work in NetBeans. I'm not very good with SOAP. It does my head in with all the different ways it looks like you can use it.

I can kind of scrape by just using XML-RPC, but that seems to be missing a lot of the advantages of WSDL.

Difference between Vista and Server 2003

Hi

Has anyone had problems moving from Vista to Windows Server 2003? Some of my extractor patterns from Vista do not match on 2003.

Another question, about the SS web interface: what should the memory usage be? When I start the web interface, it is around 3-4%, but after the first session it is 20-25%. After the session is done, shouldn't it go back to 3-4%?

Hans

Avoid case sensitivity

Hi

I have a problem scraping a page. On some of the pages the HTML tags are lower case and on others they are upper case. In both cases the letters are the same. My problem is that my pattern will only match one of them. One solution would be to make two patterns, but isn't there another solution?

Hans
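
For reference, in plain Java regular expressions a single pattern can be made to ignore case with the `Pattern.CASE_INSENSITIVE` flag (or the inline `(?i)` prefix). The sketch below uses only `java.util.regex`; it does not use screen-scraper's extractor pattern syntax, and whether extractor patterns expose an equivalent option is not confirmed here. The class name, method name, and the `<td>` example are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaseInsensitiveMatch {
    private CaseInsensitiveMatch() {}

    // CASE_INSENSITIVE lets <td> and <TD> match the same pattern;
    // DOTALL lets the captured cell content span line breaks.
    private static final Pattern CELL = Pattern.compile(
            "<td[^>]*>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the content of the first table cell, or null if there is none. */
    static String firstCell(String html) {
        Matcher m = CELL.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

The same pattern then handles both the lower-case and the upper-case pages, so a second pattern is not needed for this case.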

Math Question

I wish to save as a session variable the average of two saved session variables, do the same for a second pair of session variables, and then calculate the average of those two averages plus one more session variable.

For example,
session.getVariable("A")
session.getVariable("B")
out.write the average of AB (AB is the average of A and B)

session.getVariable("C")
session.getVariable("D")
out.write the average of CD (CD is the average of C and D)

session.getVariable("E")
out.write the average of A B, C D, and E
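
The arithmetic described above could be sketched as follows. This assumes each session variable holds numeric text that can be parsed as a double, and it reads the final step as the mean of the three values AB, CD, and E (not the mean of all five inputs); both are assumptions, and the class and method names are made up for illustration.

```java
final class Averages {
    private Averages() {}

    /** Parses a stored value (for example the result of session.getVariable) as a double. */
    static double parse(Object raw) {
        return Double.parseDouble(String.valueOf(raw).trim());
    }

    /** Arithmetic mean of one or more values. */
    static double mean(double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Assumed usage inside a script (not verified here):
    //   double ab = mean(parse(session.getVariable("A")), parse(session.getVariable("B")));
    //   double cd = mean(parse(session.getVariable("C")), parse(session.getVariable("D")));
    //   double all = mean(ab, cd, parse(session.getVariable("E")));
    //   session.setVariable("AB", String.valueOf(ab));
}
```

If the intended final figure is instead the mean of A, B, C, D, and E taken together, `mean` can be called with all five parsed values directly.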

I have the need to speed...

When I scrape a particular site the speed slows down over time so that after just 10 minutes the speed is at a snail's pace. Any suggestions? Thanks!!!

I have no clue...

I'm scraping another site that has a similar problem to the last site. I have no clue how to solve this simple problem. The url contains a number that I have no idea how to get. For example, when I try to find the value of 3352 Maple Ave 90000 the url becomes http://www.website.com/home-values-3352-maple-ave-los-angeles-ca-90000-1... but I have no idea how to get the "183917120" portion of the URL.

Screen Scraper Freezes During Routine Save of Session

I am having some big trouble with my screen-scraper software. I build a package and the scraping runs just fine. However, when I try to save the session, SS freezes. I have to actually kill the job (Task Manager won't do it; only KillProcess works).

It always seems to save the stuff, but this is kind of frustrating.

Any ideas?