screen-scraper public support

Questions and answers regarding the use of screen-scraper. Anyone can post. Monitored occasionally by screen-scraper staff.

Forum search "Page cannot be displayed" after <

In this forum, after I read a post I've picked out from the search hits list, I use the Back button in IE6 to return to the search list. But when I do that, I get the "Page cannot be displayed" error page. Another Back or two gets me to the search input page, but the search terms are now blank.

This is on Windows Server 2003. On an XP SP2 system I get the "Warning: Page has expired" screen, and then have to refresh the page (which comes with its own Windows "are you sure?" nag).

SS 3.0 Strip HTML inserting <cr><lf> in output t

Hi:

The "Strip HTML" option evidently is doing a bit of interpreting as well. The output stream includes a number of cr-lf pairs.

Evidently the option is interpreting <p>- </p>, <tr></tr>, and possibly other tokens as line breaks, and inserting a hard-coded CR-LF pair to account for output spacing.
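If the extra line breaks are unwanted, one workaround is to post-process the stripped value yourself. The sketch below is written in Java (an assumption; use whichever script language your setup supports) and simply collapses every CR/LF run into a single space. The class and method names are hypothetical, and this says nothing about how the Strip HTML option itself is implemented.

```java
import java.util.regex.Pattern;

class StripHtmlSketch {
    // Any run of CR and/or LF characters (a lone LF, CR-LF pairs, blank lines).
    private static final Pattern LINE_BREAKS = Pattern.compile("[\\r\\n]+");
    // Two or more spaces or tabs left behind after joining lines.
    private static final Pattern EXTRA_SPACES = Pattern.compile("[ \\t]{2,}");

    // Replaces each run of line breaks with one space, collapses repeated
    // whitespace, and trims the ends. A null input becomes an empty string.
    static String flatten(String stripped) {
        if (stripped == null) {
            return "";
        }
        String oneLine = LINE_BREAKS.matcher(stripped).replaceAll(" ");
        return EXTRA_SPACES.matcher(oneLine).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(flatten("Price:\r\n\r\n- \r\n$10")); // prints "Price: - $10"
    }
}
```

If some line breaks are meaningful (for example between table rows), replace LINE_BREAKS with a different separator instead of a space.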

SS 3.0 vs 2.7.2 grid display of dataset vs scrape result

Hi:

Having just upgraded to 3.0 from 2.7.2, I can't seem to find a setting that reverts to the older grid display of the search/match results of a scrape.

In 2.7.x, "Apply Pattern to Last Scraped Data" displayed a two-column name-value grid with the session variable name in column 1 and the value in column 2.
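Until the old grid display can be restored, a rough substitute is to print the extracted values yourself. A minimal sketch, assuming you have already copied the session variables into a Map; the class name and the map are hypothetical and not part of screen-scraper:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NameValueGrid {
    // Formats name/value pairs as a two-column text grid, padding the
    // name column to the width of the longest name (or the "Name" header).
    static String format(Map<String, String> vars) {
        int width = "Name".length();
        for (String name : vars.keySet()) {
            width = Math.max(width, name.length());
        }
        String row = "%-" + width + "s | %s\n";
        StringBuilder sb = new StringBuilder(String.format(row, "Name", "Value"));
        for (Map.Entry<String, String> e : vars.entrySet()) {
            sb.append(String.format(row, e.getKey(), e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("TITLE", "Hello");
        vars.put("AUTHOR", "bob");
        System.out.print(format(vars));
    }
}
```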

SS 3.0 Stop Scraping Session not working

Hi:

I'm trying to stop / kill / abort a scraping session that's been running for the past hour or so. The "Stop" button changed to "Start" after I clicked it, but the session continued. Right-clicking the session branch and selecting "Stop Scraping Session" doesn't stop it either: the log continues to roll, and CPU usage is still at 50% (on a two-CPU system).

I'm running v3.0, with the session invoked from the workbench (not from a command-line driver).

I'm reluctant to just kill SS, but there doesn't seem to be any other way to stop this runaway session.
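A possible explanation, not confirmed in this thread, is that stopping is cooperative: the stop request is only checked between steps, so a long request or a script that loops on its own keeps running until it reaches the next check. The general pattern is sketched below in Java with hypothetical names; it illustrates the idea and is not screen-scraper's internal code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

class CooperativeStop {
    // Set by whoever wants the run to end (e.g. a Stop button handler).
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    void requestStop() {
        stopRequested.set(true);
    }

    // Processes pages in order and checks the flag before each one,
    // so a stop request only takes effect at the next page boundary.
    List<String> run(List<String> pages, Consumer<String> process) {
        List<String> done = new ArrayList<>();
        for (String page : pages) {
            if (stopRequested.get()) {
                break;
            }
            process.accept(page);
            done.add(page);
        }
        return done;
    }

    public static void main(String[] args) {
        CooperativeStop session = new CooperativeStop();
        List<String> done = session.run(Arrays.asList("p1", "p2", "p3"), page -> {
            if (page.equals("p2")) {
                session.requestStop(); // stands in for a Stop click during p2
            }
        });
        System.out.println(done); // prints [p1, p2]
    }
}
```

Note that p2 still finishes after the request: work already in progress is not interrupted, which matches the "log continues to roll" symptom only if the running step never reaches a check.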

Crashing/Hanging Workbench

Stumbled across a workaround to a problem I have been having since I started using SS and thought it worthy of sharing, just in case it helps anyone else. Could be just my machine or environment, but also could be a common issue.

The workbench has ALWAYS hung (98% of the time) or crashed (2% of the time) for me when running a scrape, usually within the first hour of scraping. This happened every time, so my original workaround was to write my progress to a file so that, after a crash, I could resume from the last position.
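The write-progress-and-resume workaround can be sketched as follows. This is a minimal Java version, assuming progress is tracked as a page number; the class, the file name progress.txt, and the page count are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class ProgressCheckpoint {
    private final Path file;

    ProgressCheckpoint(Path file) {
        this.file = file;
    }

    // Returns the last page saved, or 0 if there is no checkpoint yet.
    int load() {
        try {
            if (!Files.exists(file)) {
                return 0;
            }
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return text.isEmpty() ? 0 : Integer.parseInt(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Records the last completed page. Writing a temp file and then moving it
    // over the checkpoint reduces the chance of a half-written file after a crash.
    void save(int page) {
        try {
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            Files.write(tmp, Integer.toString(page).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ProgressCheckpoint cp = new ProgressCheckpoint(Paths.get("progress.txt"));
        int lastPage = 5; // hypothetical total page count
        for (int page = cp.load() + 1; page <= lastPage; page++) {
            System.out.println("scraping page " + page); // stand-in for the real scrape
            cp.save(page);
        }
    }
}
```

Saving only after a page is fully processed means a crash mid-page repeats that page on restart rather than skipping it.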

Global Array or Accessing a variable from 2 Scrapeables

Ok... So I'm still trying to scrape that forum. The reason I'd like to do it is that the Yahoo group I am subscribed to isn't set up like phpBB. Namely, all posts are in a big long list, instead of being broken into topics.

So at the bottom of each individual post there are links to the responses in the thread (including a link back to the post itself). I set up one scrapeable file, POST, to grab the main post and the numbers that represent the threads. I then set up another scrapeable file, THREADS, to follow the threads. (This is invoked after the EXTRACTOR PATTERN.)
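One way to share values between the two scrapeable files is to keep them in a session-level variable: POST adds the thread numbers it finds, and THREADS reads them back. screen-scraper's session variables are the usual place for this, but the SessionStandIn class below is a hypothetical stand-in so the sketch runs on its own; check the screen-scraper documentation for the real calls. The variable name THREAD_IDS is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SharedThreadIds {
    // Hypothetical stand-in for a scraping session's variable store.
    static class SessionStandIn {
        private final Map<String, Object> vars = new HashMap<>();
        void setVariable(String name, Object value) { vars.put(name, value); }
        Object getVariable(String name) { return vars.get(name); }
    }

    // Called while processing POST: remember the thread numbers on the page.
    // A LinkedHashSet keeps first-seen order and drops duplicates.
    @SuppressWarnings("unchecked")
    static void rememberThreadIds(SessionStandIn session, String postId, List<String> idsOnPage) {
        Set<String> ids = (Set<String>) session.getVariable("THREAD_IDS");
        if (ids == null) {
            ids = new LinkedHashSet<>();
            session.setVariable("THREAD_IDS", ids);
        }
        for (String id : idsOnPage) {
            if (!id.equals(postId)) { // skip the link back to the post itself
                ids.add(id);
            }
        }
    }

    // Called from THREADS: read back the IDs that POST stored.
    @SuppressWarnings("unchecked")
    static List<String> threadIdsToVisit(SessionStandIn session) {
        Set<String> ids = (Set<String>) session.getVariable("THREAD_IDS");
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        SessionStandIn session = new SessionStandIn();
        // Post 100 links to itself and to two replies.
        rememberThreadIds(session, "100", Arrays.asList("100", "101", "102"));
        System.out.println(threadIdsToVisit(session)); // prints [101, 102]
    }
}
```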

Solution to my Permission Denied problem

I am a new user of screen-scraper. After the initial learning curve, everything was working except I was getting a permission denied error in my Write To File script when looping through multiple scrapes.

I checked everything to do with the file system with no luck. Then when I did a large run I noticed that the error did not occur all of the time. Time was the answer. Here is my final script with a delay coded to eliminate the error.
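The final script itself isn't included in this excerpt. As an alternative to a fixed delay, a write can be retried with a growing delay only when it actually fails. A minimal Java sketch, assuming the error comes from the file being briefly held by something else (the post does not establish the cause); the class name and parameters are hypothetical.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

class RetryingAppend {
    // Appends one line to the file, retrying with a doubling delay if the
    // write fails. Returns true on success, false once attempts run out.
    static boolean appendLine(String path, String line, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // try-with-resources closes the file after every write.
            try (Writer out = new FileWriter(path, true)) {
                out.write(line);
                out.write(System.lineSeparator());
                return true;
            } catch (IOException e) {
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt and give up
                    return false;
                }
                delay *= 2;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean ok = appendLine("results.txt", "row 1", 5, 100);
        System.out.println(ok ? "written" : "gave up");
    }
}
```

Opening and closing the writer for each line, as here, also avoids holding the file open between scrapes.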

Uploading Data Files - CONNECT www.xxxx.com:443 HTTP/1.

I am having problems with the proxy session receiving an error status each time I attempt to upload data files to a website. The website offers a web page interface where you select the location of the data file on your local hard drive, select a couple of optional attributes, and then click Submit to upload the file. The proxy session shows a CONNECT string under the request line, but its status is error.

CONNECT home.netscape.com:443 HTTP/1.0

Server only accessible when user logged in???

I have Screen Scraper running as a service, being used by a PHP application and webserver - all on the same box.

When I access it via a browser from another machine, it works fine as long as the administrator is logged into the server. If he logs out, the application just hangs and times out (on the $session = new RemoteScrapingSession; line in the PHP script).

Also, when the administrator logs in, he has to stop and start the screen-scraper server before it will work (otherwise the application times out on $session = new RemoteScrapingSession; in the PHP script).

Scraping Sequential Pages that present content differently

Hey all,
I am trying to scrape a large forum. However, it appears that the older entries present the username in a different way. When I encounter the change while scraping, screen-scraper no longer writes to my text file. :cry: It gives me a permission denied error in the log.

I think this is because it has no data to write for my AUTHOR token. I tried writing a VBScript that re-scrapes the page with a different scraping session IF AUTHOR = NULL, but so far I've had little success.
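Rather than re-scraping with a second session, another option is to try a second pattern in the same script when the first finds nothing, and to write an empty string instead of a null AUTHOR. A minimal Java sketch; java.util.regex stands in for screen-scraper's own extractor patterns, and both patterns are hypothetical placeholders for the forum's real markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AuthorFallback {
    // Hypothetical patterns for the newer and older page layouts.
    private static final Pattern NEW_LAYOUT =
        Pattern.compile("<span class=\"author\">([^<]+)</span>");
    private static final Pattern OLD_LAYOUT =
        Pattern.compile("Posted by:\\s*<b>([^<]+)</b>");

    // Tries each layout in turn and returns the first author found,
    // or an empty string so the file writer never receives null.
    static String findAuthor(String html) {
        for (Pattern p : new Pattern[] {NEW_LAYOUT, OLD_LAYOUT}) {
            Matcher m = p.matcher(html);
            if (m.find()) {
                return m.group(1).trim();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(findAuthor("Posted by: <b>bob</b>")); // prints bob
    }
}
```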