Regarding extracting data from screen scraper

Sir,
I have created a script using screen-scraper. I want to extract the author name, book title, details, and price from the website www.flipkart.com.

I based my script on the script file provided in Tutorial 2 (http://community.screen-scraper.com/Tutorial_2_Overview), which is used for scraping data from an e-commerce site.
I have created all of the scripts, but there is a problem with the details page scrape: nothing appears in the Last Response, so I cannot make an extractor pattern for it.
I have also downloaded all of the screen-scraper files provided in the e-commerce tutorial, and I did not find a Last Response there either, so I cannot understand how you made the extractor pattern for the details page script. Please check my script and make the modifications required.

I have uploaded my scraper file to the following server. Kindly download it and provide me with the right solution so that I can complete my task.

www.abebooks.in/myscript/myscript.zip

pankaj.singh on 03/07/2011 at 7:58 am


Have you created a proxy session for the site you are trying to scrape? If so, you can start by using only extractor patterns, without any scripts, to follow the natural progression of the website. Once that is working, add your scripts; if something then breaks, you will know the problem is with a script.
Have you looked at the output from the scraping session when it runs?

seamus1982 on 03/07/2011 at 1:05 pm

Pankaj,

The reason there is no content under the Last Response tab is because that scrapeable file is never called. The reason it is never called is because the first Extractor Pattern in your "SearchResults" scrapeable file never matches. The reason your Extractor Pattern never matches is because the HTML you are using does not come from the Last Response of the page.

Here's what I would do...

Run your scraping session.
Go to the Last Response tab of the "SearchResult" scrapeable file.
Click the "Find..." button and search for "search-book-books" (without the quotes).
Click "Find" again in order to find the second match to your search.
Select the HTML line where your match was found. Click and drag with your mouse from left to right, starting at "<h2>" and continuing to the end of the opening <a> tag (just past the href value).

Your selection should look something like this:

<h2><a href="/losing-my-virginity-other-dumb-book-0143415123/search-book-books/11?ref=4fb8f427-1269-4332-9d4b-18c2d8e728d3">

Note: We are not extracting the book title right now because we can extract it on the details page.
Right-click on your selection and choose "Generate extractor pattern from selected text".
You will automatically be sent to the Extractor Patterns tab.
Edit your Extractor Pattern text by highlighting the different pieces that should be passed to the "detail page" scrapeable file as parameters. Be sure to click "Save as session variable" for each of your tokens.

The edited pattern should look something like this:

<h2><a href="/~@BOOK_URL@~/search-book-books/~@PAGING@~?ref=~@ref@~">
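To make the matching idea concrete, here is a rough Python sketch. It is not how screen-scraper works internally; it only approximates the behaviour by turning each ~@TOKEN@~ in the pattern above into a regular-expression group. The helper name pattern_to_regex and the "browser-style" HTML line are hypothetical, invented for this illustration. The first HTML line is the one quoted earlier from the Last Response.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Approximate a ~@TOKEN@~ extractor pattern with a regex (illustrative only)."""
    # Splitting on a pattern with one capture group yields alternating
    # literal text (even indexes) and token names (odd indexes).
    parts = re.split(r"~@(\w+)@~", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal HTML must match exactly
        else:
            # Assumption: a token matches lazily and stops at quotes or angle brackets.
            regex += '(?P<' + part + '>[^"<>]*?)'
    return re.compile(regex)


pattern = '<h2><a href="/~@BOOK_URL@~/search-book-books/~@PAGING@~?ref=~@ref@~">'

# Line copied from the Last Response, as quoted above.
last_response_line = (
    '<h2><a href="/losing-my-virginity-other-dumb-book-0143415123'
    '/search-book-books/11?ref=4fb8f427-1269-4332-9d4b-18c2d8e728d3">'
)

# Hypothetical line taken from somewhere other than the Last Response
# (for example, markup that a browser shows differently).
browser_style_line = (
    '<h2><a class="title" href="/losing-my-virginity-other-dumb-book-0143415123'
    '/search-book-books/11?ref=4fb8f427-1269-4332-9d4b-18c2d8e728d3">'
)

compiled = pattern_to_regex(pattern)
match = compiled.search(last_response_line)
no_match = compiled.search(browser_style_line)

print(match.groupdict() if match else "no match")
print(no_match.groupdict() if no_match else "no match")
```

The second search finds nothing because the literal text around the tokens differs from the pattern. A pattern built from HTML that is not in the Last Response fails in the same way, and when it fails, no session variables are set and the details page is never requested.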

Now, if you go through our tutorials again you should learn that you typically create your scrapeable files from transactions in your proxy session (rather than creating them manually). You should also learn the common way to reference your session variables is under the Parameters tab of your scrapeable file (rather than in the URL field).

I strongly recommend that you go through our tutorials a few more times. There are some basic concepts that you need to review a bit further. I think you may be experiencing the reason it is important to take the time to go through our tutorials step by step and not to rush through them.

In the end you will accomplish your goal much sooner if you'll just take the time to walk slowly through our tutorials.

-Scott

swilsonmc on 03/07/2011 at 2:27 pm
