screen-scraper public support

Questions and answers regarding the use of screen-scraper. Anyone can post. Monitored occasionally by screen-scraper staff.

Sorry, tidying HTML failed. Returning the original HTML

Hello,

I want to scrape myspace.com, but when I try to access any page from MySpace I get the redirect:

Redirecting to: http://www.myspace.com/help/browserunsupported

and the screen-scraper log says:

Sorry, tidying HTML failed. Returning the original HTML

Is there anything you can do here, please?

Thanks in advance
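A /browserunsupported redirect like this is usually triggered by the User-Agent header the client sends, not by anything in the page itself; sending a browser-like User-Agent typically avoids it. A minimal sketch in plain Python (the UA string is just an example — in screen-scraper you would configure the user agent in its own settings rather than in code):

```python
import urllib.request

# A browser-like User-Agent string; sites that bounce clients to a
# "browser unsupported" page often key off this header alone.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/120.0 Safari/537.36")

def make_request(url):
    """Build a request that presents itself as a normal browser."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})

req = make_request("http://www.myspace.com/")
# urllib.request.urlopen(req) would now send the browser-like header.
```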

login to flickr

Hello

I want to log in to Flickr in order to scrape some pictures, but it always fails.

http://www.flickr.com

Has anybody done this before?

So I had the idea that it would be good to be able to log in manually and then continue.

That is, when it's time to log in I want screen-scraper to open a browser, I log in manually, and then screen-scraper should continue with me logged in.

Is that possible?

Regards

Ben
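If opening a real browser mid-session isn't supported, one common workaround is to log in manually in a browser first and copy the resulting session cookies into the scrape. A rough Python illustration of the idea — the cookie names and values here are invented, not real Flickr cookies:

```python
import http.cookiejar
import urllib.request

# Hypothetical cookie values copied out of a browser after logging in
# to Flickr manually; the names are examples only.
manual_cookies = {"cookie_session": "abc123", "cookie_epass": "def456"}

jar = http.cookiejar.CookieJar()
for name, value in manual_cookies.items():
    jar.set_cookie(http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=".flickr.com", domain_specified=True, domain_initial_dot=True,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={}))

opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
# opener.open(...) would now send the manually obtained cookies along,
# so the server sees an already-logged-in session.
```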

execute JavaScript

I have a page that I need to log in to, and I have the login page figured out, but there is a page sitting in between the login page and the secured home page. When I log in with the browser, this in-between page is submitted automatically with JavaScript:

<script language="javascript">window.setTimeout('document.forms[0].submit()', 0);</script>

But if I log in with screen-scraper it never gets past this page. I tried extracting the values of the only two hidden form fields and submitting them myself with screen-scraper, but that does not work either.
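Submitting the in-between page yourself is the right idea; the details that matter are that the POST goes to the form's action URL and carries every hidden field. A rough sketch of the extraction step in plain Python — the page fragment and field names are invented for illustration, and a real page may need a proper HTML parser rather than a regex:

```python
import re
import urllib.parse

# A made-up interstitial page of the kind described: a form with hidden
# fields that a JavaScript timer submits automatically in a browser.
INTERSTITIAL = '''
<form action="/home" method="post">
<input type="hidden" name="token" value="abc123">
<input type="hidden" name="state" value="xyz">
</form>
<script language="javascript">window.setTimeout('document.forms[0].submit()', 0);</script>
'''

def hidden_fields(html):
    """Return {name: value} for every hidden input in the page."""
    pattern = re.compile(
        r'<input[^>]*type="hidden"[^>]*name="([^"]*)"[^>]*value="([^"]*)"',
        re.I)
    return dict(pattern.findall(html))

fields = hidden_fields(INTERSTITIAL)
body = urllib.parse.urlencode(fields)  # the POST body to send to /home
```

In screen-scraper terms this would be a scrapeable file for the form's action URL, with the two extracted values passed as POST parameters.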

MaxConcurrentScrapingSessions and Server.NumTimesRun properties

Hi,

In the screen-scraper properties file there are two properties whose meaning I am not sure of:

MaxConcurrentScrapingSessions - does this control how many simultaneous requests screen-scraper will handle? If I have it set to 5 and 20 requests come in at the same time, will 5 of those requests get handled immediately and the other 15 get queued up?

Server.NumTimesRun - I have no idea what this is, but the example properties file on the screen-scraper site has a value of 2186, and our value is 8. I'm curious what it controls.

thanks,
Erik
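For what it's worth, the "5 handled, 15 queued" behavior described above is exactly how a bounded worker pool normally works — whether the property maps to that is an assumption here, not something taken from the docs. A toy Python sketch of that queueing semantics:

```python
import threading
import time

MAX_CONCURRENT = 5   # playing the role of MaxConcurrentScrapingSessions
TOTAL_REQUESTS = 20

sem = threading.BoundedSemaphore(MAX_CONCURRENT)
lock = threading.Lock()
running = 0
peak = 0

def handle(request_id):
    """Simulate one scraping request; excess requests block at the semaphore."""
    global running, peak
    with sem:                      # only MAX_CONCURRENT get in; the rest wait
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)           # stand-in for the actual scrape
        with lock:
            running -= 1

threads = [threading.Thread(target=handle, args=(i,))
           for i in range(TOTAL_REQUESTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# peak never exceeds MAX_CONCURRENT; the other 15 requests simply waited
# their turn rather than being rejected.
```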

Beginner question

Hi. I only want to scrape example 2 (the one with "sold.gif"). But I cannot manage to do that, since the information I want to scrape appears above the sold indicator (sold.gif) in the code.

EXAMPLE 1:

<div class="heading">
<a href="http://www.test.com/object?code=~@URL@~" >~@IGNORE@~

<div class="primary_info">
<div class="year_heading">~@IGNORE@~ <div class="milage_heading">~@IGNORE@~ <div class="price_heading">~@IGNORE@~ <div class="year">~@IGNORE@~ <div class="milage">~@IGNORE@~
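One way to match only the sold listing is to require the sold.gif marker after the captured fields, while forbidding the match from running past the start of the next listing. A sketch of that idea in plain Python regex terms, using made-up listing HTML (a real screen-scraper pattern would use extractor tokens in place of the capture groups):

```python
import re

# Made-up listing blocks; only the second one is marked sold.
LISTINGS = '''
<div class="heading"><a href="http://www.test.com/object?code=111">Car A</a></div>
<div class="price">9000</div>
<div class="heading"><a href="http://www.test.com/object?code=222">Car B</a></div>
<div class="price">7000</div>
<img src="sold.gif">
'''

# Capture code and price, but only when a sold.gif marker follows; the
# (?!class="heading") lookahead stops the match from spilling into the
# next listing, so unsold listings are skipped.
pattern = re.compile(
    r'code=(\d+)">'
    r'(?:(?!class="heading").)*?'
    r'class="price">(\d+)</div>\s*<img src="sold\.gif"',
    re.S)

sold = pattern.findall(LISTINGS)
# sold contains only the listing that is followed by sold.gif
```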

Memory goes from 64% to 100% and won't go back down.

Should the memory reset once the scrape is finished? My memory starts at 64% and goes up to 80% by the time the scrape ends, and then after the scraper is done it creeps up from 80% to 100%. The memory then sticks at 100%, and the only way to reset it is to close the workbench and restart it. I am using Mac OS X v10.5.8, and I have tried SS Basic, the SS Professional 30-day demo, and the SS Enterprise 30-day demo. They all do the same thing.

viewstate - slows down the software

Recently I noticed that long viewstate values are throwing an error in my scrapeable files and are slowing down the application when I try to view the last response.

http://www.legalnotice.org/pl/azcentral/landing1.aspx

Saving the database

Whenever I save in the program and exit, on re-entering only the scripts and the folders I created show up in the left column. No scraping sessions or proxies survive; they are all deleted. I know I can export scraping sessions manually, but surely this isn't how the program is supposed to work? It's pretty annoying having to remember to export before closing, to re-import every time I use it, and to re-create a proxy every time since proxies can't be exported. What gives?

For loop writing same result for every page

Hi,

I'm trying to scrape some data from a site where every page has the same structure. The page URLs are also numbered in sequential order, for example:

www.football.com/scores001.html
www.football.com/scores002.html
etc

I followed tutorial 1 and set up the extractor patterns etc. so that I could scrape what I needed from page 1. I then tried to add a loop to scrape the data for pages 001-999, writing the results to a text file. As part of this I set the URL on the properties tab to the session variable ~#URL#~, which is changed within the loop.
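For reference, the URL sequence itself can be generated like this — Python used only for illustration; inside screen-scraper the same zero-padding would be done in the script that sets ~#URL#~ each time around the loop. A common pitfall with sequences like these is emitting "scores1.html" instead of "scores001.html":

```python
# Zero-padded page numbers: {:03d} formats 1 as "001", 42 as "042", etc.
BASE = "www.football.com/scores{:03d}.html"

urls = [BASE.format(n) for n in range(1, 1000)]
```

If every page writes the same result, it is also worth checking that the extracted variables are actually refreshed on each iteration rather than the first page's values being written 999 times.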

an extractor token with zero or more characters?

Hi

I thought the extractor tokens would be able to use a "zero or more characters" mask, but I haven't been able to get it to work. I found that most of my errors are because of this.

Here is a simple example

I want a single pattern to match both texts:

text1
-------------

</td>
<td valign="top">
<div style="float:left;"><a class="lbb" href="


text2
-------------

</td>
<td class="odd" valign="top">
<div style="float:left;"><a class="lbb" href="


----

They are both identical except on the second line: class="odd".

So I thought I could put an extractor token in its place, like this

-----

</td>
<td ~@junk@~valign="top">
<div style="float:left;"><a class="lbb" href="

----------

However, screen-scraper is only matching the second text example. So I thought it must be because my extractor token needs a "zero or more times" regex, but I can't get it working.

Any tips would be greatly appreciated.
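The symptom — only the class="odd" text matching — is exactly what a one-or-more pattern produces, since it must consume at least one character where text1 has none. Whether that is screen-scraper's default token regex is an assumption here, but setting the token's regular expression to something that can match empty, such as [^>]*, should fix it. The distinction in plain Python regex terms:

```python
import re

# The two snippets from the question, identical except for class="odd".
text1 = '</td>\n<td valign="top">\n<div style="float:left;"><a class="lbb" href="'
text2 = '</td>\n<td class="odd" valign="top">\n<div style="float:left;"><a class="lbb" href="'

# A one-or-more token (.+?) must consume at least one character, so it
# cannot match text1, where nothing sits between '<td ' and 'valign'.
one_or_more = re.compile(r'<td (.+?)valign="top">')

# A zero-or-more token ([^>]*?) may match the empty string, so it
# matches both texts; [^>] also keeps it from running past the tag.
zero_or_more = re.compile(r'<td ([^>]*?)valign="top">')
```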