screen-scraper public support

Questions and answers regarding the use of screen-scraper. Anyone can post. Monitored occasionally by screen-scraper staff.

[Scrape] Image processor blocking directory listing

Hey guys,

I'm trying to scrape URLs like this one:

http://example.com/prod?set=key[source],value[/environment/2011/B13_0086_125R_0.jpg]&set=key[rotate],value[0.45]&set=key[width],value[2966]&set=key[height],value[3468]&set=key[x],value[1433]&set=key[y],value[247]&call=url[file:/product/large]

All the images are served with a default name (e.g. default.jpg).

Is there any way, other than brute-forcing the directory listing, to get something like this?

http://www.example.com/product/large/environment/2011/B13_0086_125R_0.jpg

Has anyone dealt with this kind of image handling?
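
One possible approach, sketched in plain Python rather than in a screen-scraper script: the processor URL already carries the source path (`key[source]`) and the output folder (`call=url[file:...]`), so the direct URL can be assembled from the query string without touching the directory listing. The function name `direct_image_url` and the `host` parameter are made up for this example, and it assumes the server really does serve the file at folder + source path, as in the target URL above.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Patterns for the two parameter shapes seen in the example URL.
PAIR = re.compile(r"key\[(?P<key>[^\]]*)\],value\[(?P<value>[^\]]*)\]")
CALL = re.compile(r"url\[file:(?P<path>[^\]]*)\]")


def direct_image_url(processor_url, host=None):
    """Build a direct image URL from an image-processor URL.

    Assumption: the image is reachable at <call folder> + <source path>.
    `host` overrides the host name (e.g. to add a "www." prefix).
    """
    parts = urlsplit(processor_url)
    settings = {}
    folder = ""
    for name, raw in parse_qsl(parts.query):
        if name == "set":
            match = PAIR.fullmatch(raw)
            if match:
                settings[match["key"]] = match["value"]
        elif name == "call":
            match = CALL.fullmatch(raw)
            if match:
                folder = match["path"]
    if "source" not in settings:
        raise ValueError("no key[source] parameter in URL")
    return f"{parts.scheme}://{host or parts.netloc}{folder}{settings['source']}"
```

Passing the example URL with `host="www.example.com"` gives the `/product/large/environment/2011/...` address shown above. Whether that address is actually public depends on the site.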

Code problem

I am currently testing SS Basic Edition and I have trouble extracting a page which contains German characters like "äöü". Normally these are represented as entities (&auml; &ouml; etc.), but this page contains them in "raw" format: http://immoads.oe24.at/. So far I have had no luck trying different encodings and tidy/no tidy settings: the German characters are either replaced by question marks or appear as square symbols.

Best regards
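
Question marks and squares usually mean the page bytes are being decoded with the wrong character set. A minimal sketch of the idea in plain Python (not screen-scraper's own API): read the charset from the Content-Type header or the page's meta tag and decode with that. The helper names and the ISO-8859-1 fallback are assumptions for illustration; the actual encoding of the page in question has not been checked.

```python
import re

# Looks for charset=... inside a <meta> tag near the top of the page.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_-]+)", re.I)


def detect_charset(content_type, body, default="ISO-8859-1"):
    """Pick a charset: HTTP header first, then <meta> tag, then a default."""
    match = re.search(r"charset=([^\s;]+)", content_type or "", re.I)
    if match:
        return match.group(1).strip("\"'")
    match = META_CHARSET.search(body[:2048])
    if match:
        return match.group(1).decode("ascii")
    return default  # assumed fallback, not a guarantee


def decode_page(content_type, body):
    """Decode raw page bytes using the detected charset."""
    return body.decode(detect_charset(content_type, body), errors="replace")
```

Decoding ISO-8859-1 bytes as UTF-8 is what produces replacement characters for "äöü", so forcing the right charset is worth trying before changing tidy settings.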

Difference Between Resolved URL and Setting referer & remove parameters

Hi,

I am scraping a site and I notice that there is a difference between the resolved URL and the URL shown in "Setting referer to". I was wondering what this means with regard to the data being submitted to the site?

One other thing. I am using the remove parameter and this seems to be changing the parameters. Do the parameters not go back to the default after the scrape of a page?

Thanks in advance,
Seamus McMahon

How to get to this data

I stumbled upon this site where the search results are somehow loaded outside the main page. If I look at the source code or save the page as HTML, the search result section isn't there.
I was wondering if the professional version of the software can handle this type of page, and what the solution is.

Here is an example url:

 

Start a scraping session from a text file / csv

Hi, I'm trying to find a simple explanation or example of using URLs from a text file as the target URLs for scraping.

I can see from various discussions that it is possible, but I'm unable to find any clear instructions on the forum as to exactly what to enter as the URL under the Properties tab of the scrapeable file.

For example, I have a file called linkstoscrape.txt with the following links to be scraped:
http://www.somesite.com/1.htm
http://www.somesite.com/2.htm
etc.

What is the correct way to get the program to open this file and scrape each link in turn?
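
The general shape of the loop, sketched in plain Python rather than as a screen-scraper script: read the file line by line, keep only lines that look like URLs, and hand each one to whatever does the scraping. `read_urls`, `scrape_each` and the `scrape` callback are hypothetical names for this example; inside screen-scraper the callback would be replaced by whatever invokes the scrapeable file.

```python
from pathlib import Path


def read_urls(path):
    """Return the http(s) URLs in a text file, one per line, in file order."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not a URL (e.g. "etc.").
        if line.startswith(("http://", "https://")):
            urls.append(line)
    return urls


def scrape_each(urls, scrape):
    """Call scrape(url) for each URL and collect the results."""
    return [scrape(url) for url in urls]
```

For example, `scrape_each(read_urls("linkstoscrape.txt"), my_scrape_function)` would visit each link in the file once, in order.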

Export cookie from screenscraper

Hi,

I was wondering if there is any way I could return the data from a previous scrape and pass the cookie to the user, so the user can log into a session with his/her data already entered.

Regards,
Seamus

Problem With Parameters Tab

Hi,

I have a problem with a site I am trying to scrape. The parameters tab is stalling on me. I can't see the parameters for the tab. This is not a problem for any of the other sites I try to scrape but it is causing a problem for this one. I don't know if there is a way to stop this from happening.

By the way, I will be purchasing a copy of the enterprise version in the next couple of weeks. I am still just testing at the minute.

Regards,
Seamus

Scraping all URLs on a site

Hey All,

I have created an SS file that will find and log all files found on a site. I will use this later on with JMeter to do load testing.

My issue is that this process takes a very long time (30 minutes for 750 URLs; the server does 10 req/sec, give or take).

This is the process:

Init -> Scrape homepage -> call "Write URLs" script -> Load next page -> scrape URLs -> call "Write URLs" script, etc.

The problem is that my "Write URLs" script is called for each URL found, so it gets called very often for all the pages found in the menus.
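
One common way to cut this kind of overhead, sketched in plain Python (not the poster's actual script): collect URLs in memory, skip ones already seen, and write the file once at the end instead of once per URL. The class name `UrlCollector` is made up for the example, and it assumes the per-URL write is what dominates the run time, which has not been measured here.

```python
class UrlCollector:
    """Collects unique URLs in memory and writes them out in one pass."""

    def __init__(self):
        self._seen = set()   # fast duplicate check
        self._ordered = []   # preserves discovery order

    def add(self, url):
        """Record a URL; return True if it was new, False if a duplicate."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._ordered.append(url)
        return True

    def write(self, path):
        """Write all collected URLs to `path`, one per line; return the count."""
        with open(path, "w", encoding="utf-8") as handle:
            for url in self._ordered:
                handle.write(url + "\n")
        return len(self._ordered)
```

The return value of `add` can also decide whether a page needs to be requested at all, so menu links that repeat on every page are only followed once.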

Evil AJAX

Hi there.

I've been scraping away merrily and everything has been working fine, until I wanted to scrape the brand names from this site. The problem is that the site lets users view the products either by Type (the default) or by Brand (A - Z).

http://www.mysupermarket.co.uk/shelves/Condiments_in_Tesco.html

The problem is that when you select Brand (A - Z) from the dropdown, the URL doesn't change. I've worked out that it's done using an AJAX call like this:

http://www.mysupermarket.co.uk/Ajax/SaveGroupBy.aspx?GroupBy=2366

This then sets a cookie or something in the session variable.
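
If the AJAX call really does store the grouping in a cookie or server-side session, one thing to try is requesting the AJAX URL first and then the listing page, both through the same cookie-carrying session. A rough sketch in plain Python using only the standard library; the function names are invented for the example, and that `GroupBy=2366` selects Brand (A - Z) is taken from the post, not verified.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_session_fetch(timeout=30):
    """Return fetch(url) -> bytes that shares one cookie jar across calls."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))

    def fetch(url):
        with opener.open(url, timeout=timeout) as response:
            return response.read()

    return fetch


def fetch_after_toggle(fetch, toggle_url, page_url):
    """Hit the AJAX toggle URL first, then fetch the page in the same session."""
    fetch(toggle_url)       # response body is ignored; only the side effect matters
    return fetch(page_url)  # should now reflect the toggled setting
```

Usage would be along the lines of `fetch_after_toggle(make_session_fetch(), ajax_url, shelves_url)` with the two URLs above. The same ordering, AJAX request first and page request second within one session, is the part that would carry over to a scraping session.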

Regarding stack full issue

Hi,
I have made a script to fetch data from a site. The problem is that I need 10 lakh (1,000,000) records from that site, but screen-scraper stops when the script reaches 500 records and shows a message that the stack is full. I think the limit is 50, so I changed it using session.setMaxScriptsOnStack(10000);

That works, but my script now halts once 1000 records have been fetched. It stops without any warning or error.
I made the script using the one provided in tutorial 2 for the shop site with paging.
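
A guess at the cause, offered as an assumption only: if each page's script triggers the scrape of the next page, every page adds another level of nesting, and raising the stack limit only postpones the failure. The usual alternative is to drive the paging from a single loop, so the depth stays constant however many pages there are. A minimal sketch in plain Python; `scrape_all_pages` and the `fetch_page` callback are hypothetical names, not screen-scraper API.

```python
def scrape_all_pages(fetch_page, first_page=1, max_pages=None):
    """Collect records from consecutive pages using a flat loop.

    fetch_page(n) must return (records, has_next) for page n.
    max_pages is an optional safety cap on the number of pages visited.
    """
    records = []
    page = first_page
    visited = 0
    while True:
        rows, has_next = fetch_page(page)
        records.extend(rows)
        visited += 1
        if not has_next:
            break
        if max_pages is not None and visited >= max_pages:
            break
        page += 1  # move on without nesting another call
    return records
```

Because nothing is nested, the number of pages is limited only by memory and time, not by a call or script stack.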