screen-scraper public support
Extracting Data for Pattern
Thanks for this great tool. I am very interested in purchasing it. However, after a three-day effort to pull data from one web site, I have yet to succeed. Everything appears to be functioning as it should, except that the main pattern on the details page reports that it did not find any matches. I simply don't understand why that would not be working. In fact, I manually went to one of the pages that had that error and downloaded it via the proxy server.
Trouble understanding screen scraper!
I am trying to test out the screen-scraping program. I am trying to scrape a site to see exactly how this works. I looked through the tutorial, but it didn't really give me insight into how to write the extractor pattern for a site like this:
http://www.tours.com/tours_vacations/alaska.htm
I am trying to pull all the information for each tour, such as the name, info, location, destination, and website, and then put it into a spreadsheet.
Page incrementing by 20
Hello,
I'm very happy to find this tool and I would like to thank its creators.
I'm following the tutorial to scrape a site.
I've completed almost all of it.
My only problem is that I don't need any search criteria, and the page parameter increments by 20. (It counts products rather than pages: if my search returns 100 products, it shows the first 20, then 20 to 40. So if I put page=20 in the URL, it shows the products between 20 and 40.)
So I've found two approaches: one on the forum and the other in the tutorial.
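For readers with the same paging scheme, here is a minimal sketch of the offset arithmetic in Python rather than in a screen-scraper script. It assumes the `page` parameter holds the index of the first product shown, as the post describes; the base URL and the function name `offset_urls` are hypothetical.

```python
from urllib.parse import urlencode


def offset_urls(base_url, total_items, step=20, param="page"):
    """Build one URL per result page.

    `param` carries the index of the first item on the page (0, 20, 40, ...),
    not a page number. This mirrors the behaviour described in the post.
    """
    urls = []
    for offset in range(0, total_items, step):
        urls.append(f"{base_url}?{urlencode({param: offset})}")
    return urls


# Hypothetical site with 100 products, 20 per page.
urls = offset_urls("http://example.com/products", 100)
for url in urls:
    print(url)
```

In a scraping session the same idea usually becomes a loop that adds the step to a counter variable after each page and stops when a page returns no products.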
Newbie Help with Parent/Child Relationships
Hi,
I've been trying to work out a solution, but haven't been able to solve a parent-child issue. Hope you all can help!
Here is the structure:
Child1->ProductList->Product1_Detail
                     Product2_Detail
Child2->ProductList->Product1_Detail
                     Product2_Detail
Parent1->Child1->ProductList->Product1_Detail
                              Product2_Detail
         Child2->ProductList->Product1_Detail
                              Product2_Detail
At any point there can be a child with a product, as well as a parent with children.
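One way to think about a hierarchy of unknown depth is as a recursive walk: at each node, collect its products, then descend into its children. The sketch below shows this in Python on a hand-built tree. The dictionary layout (`name`, `products`, `children`) is an assumed reading of the structure in the post, not something screen-scraper produces.

```python
def product_paths(node, trail=()):
    """Yield (trail, product) for every product under `node`, at any depth."""
    trail = trail + (node["name"],)
    # A node may list products directly...
    for product in node.get("products", []):
        yield trail, product
    # ...and may also have children that are handled the same way.
    for child in node.get("children", []):
        yield from product_paths(child, trail)


products = ["Product1_Detail", "Product2_Detail"]
tree = [
    {"name": "Child1", "products": products},
    {"name": "Child2", "products": products},
    {"name": "Parent1", "children": [
        {"name": "Child1", "products": products},
        {"name": "Child2", "products": products},
    ]},
]

rows = [(" > ".join(trail), product)
        for top in tree
        for trail, product in product_paths(top)]
for row in rows:
    print(row)
```

In a scraper the same shape usually appears as one scrapeable page that invokes itself for each child link it finds and invokes the product list page whenever products are present.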
Can't see where login info is being sent
Using the basic edition, I'm trying to scrape mint.com (a personal finance site) and I can't figure out where the login information is transmitted.
I've been through the tutorials, and have successfully scraped other sites needing logins, finding the login information in the POST data, and creating the scrapeable page such that the Parameters tab has the user/password automatically filled out.
Start up errors with CentOS 5.2 Linux
Hi, I've been trying to start up screen scraper on CentOS 5.2 Linux 64-bit with no joy.
The fix listed in the forums must be for an old version of screen scraper as the file names and file contents are different. I have tried to use the tar file and the server seems to start ok, however, the screen-scraper GUI bombs with loads of shared library errors. I have tried exporting different LD_LIBRARY_PATHs but all fail:
Sub Extractor patterns for identical HTML list-items
Hi, please can anybody advise how I should make sub-extractor patterns from the following code?
I need a pattern for the content of every list item but how do I distinguish between them?
Many Thanks!
Newbie question about session variable
Hi,
First, thanks for what looks like a pretty amazing tool. I'm trying to get through the "Hello World!" tutorial and have most of it working, but my script is failing the run test. Here's the script, which I think I copied correctly:
Logon sequence on asp.net site
Hi,
I've been using screen-scraper for a while now. Excellent program. I've done all the basics, like scraping a few PHP sites, with no problem.
I'm now wanting to scrape this site: www dot theknowledgeonline dot com
It is built in ASP.NET and requires a logon to access the online db listings. I've tried my best and got as far as I could, but I'm getting 'viewstate MAC cannot be validated' on a server 500 error page when trying to log in.
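One common cause of this error, though not necessarily the one here, is posting a `__VIEWSTATE` value recorded in an earlier session instead of the one the server just issued. The usual remedy is to request the login page first, pull the hidden form fields out of that response, and send them back unchanged with the credentials. Below is a minimal Python sketch of the extraction step. The sample HTML, the field values, and the `txtUser` / `txtPassword` names are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up login form standing in for the page fetched before logging in.
sample = """<form method="post" action="login.aspx">
<input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
<input type="hidden" name="__EVENTVALIDATION" value="/wEWAgK" />
<input type="text" name="txtUser" />
</form>"""

post_data = hidden_fields(sample)
# Hypothetical credential field names; the real ones come from the form.
post_data.update({"txtUser": "me", "txtPassword": "secret"})
print(post_data)
```

In screen-scraper terms this corresponds to an extractor pattern on the login page that saves each hidden field to a session variable, with the POST parameters of the following page referring to those variables instead of hard-coded values.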
Getting a logon sequence with ssl, portal redirects, etc. to be reliably repeatable
The captured authentication sequence from the proxy server for a project I am working on looks roughly as follows: