Trouble understanding screen scraper!

I am trying to test out the screen scraping program by scraping a site to see exactly how this works. I looked through the tutorial, but it didn't really give me insight into how to write the extractor pattern for a site like this:

http://www.tours.com/tours_vacations/alaska.htm

I am trying to pull all the information for each tour, such as the name, info, location, destination, and website, and then put it into a spreadsheet.

My company is pulling all the information that was typed into our pages, and we are trying to get that information into Excel files and then into a database. Can anyone help me understand this, so that I can see whether it will help my company?

Thanks, that worked. How do you export that data to an Excel file to view it?

You'll want to make use of the "DATARECORD" variable. It is a special variable that lets you use "sub-extractor" patterns. The idea is that if you have an HTML table on the website, you could make the following extractor pattern:

<tr>~@DATARECORD@~</tr>

Then, you can click on the "Sub-Extractor Patterns" tab of the extractor pattern. From there, you can click on the "Add Sub-Extractor Pattern" button.

These sub-extractor patterns will only search between the "<tr>" and "</tr>" tags, because that is what you specified in the very first pattern I showed above.
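
To picture the two-step idea, here is a rough Python sketch using plain regular expressions. It is only an analogy, not how screen-scraper itself is implemented, and the HTML snippet, the `<b>` tag around the name, and the field values are all made up for illustration.

```python
import re

# Made-up HTML standing in for a page with one table row per record.
html = """
<table>
<tr><td><b>Glacier Tour</b></td><td>LOCATION: Juneau</td></tr>
<tr><td><b>Whale Watch</b></td><td>LOCATION: Sitka</td></tr>
</table>
"""

# Step 1: the "main" pattern grabs everything between <tr> and </tr>.
row_pattern = re.compile(r"<tr>(.*?)</tr>", re.DOTALL)

# Step 2: the "sub" patterns only ever see the text of a single row.
name_pattern = re.compile(r"<b>([^><]*)</b>")
location_pattern = re.compile(r"LOCATION:([^><]*)")

records = []
for row in row_pattern.findall(html):
    name = name_pattern.search(row)
    location = location_pattern.search(row)
    records.append({
        "ENTRY_NAME": name.group(1).strip() if name else None,
        "LOCATION": location.group(1).strip() if location else None,
    })

for record in records:
    print(record)
```

Because each sub-pattern is applied to one row at a time, a name from one row can never get paired with a location from another.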

So what does this all mean? Here's an example from the site you posted a link for.

The main extractor pattern:

~@DATARECORD@~


~@junk_parameters@~: DON'T save in session variable    Pattern: [^>]*
~@DATARECORD@~: DON'T save in session variable    Pattern: No pattern needed

Sub-extractor patterns:

  • ~@ENTRY_NAME@~
    ~@ENTRY_NAME@~: Save in session variable?    Pattern: [^><]*

  • ~@DESCRIPTION@~
    ~@DESCRIPTION@~: Save in session variable?    Pattern: [^><]*

  • LOCATION:~@LOCATION@~
    ~@LOCATION@~: Save in session variable?    Pattern: [^><]*

  • DESTINATION(S):~@DESTINATIONS@~
    ~@DESTINATIONS@~: Save in session variable?    Pattern: [^><]*

  • ~@WEBSITE@~: Save in session variable?    Pattern: [^"]*
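
In case the two regular expressions in those token settings are unfamiliar: [^><]* matches any run of characters up to the next angle bracket, and [^"]* matches up to the next double quote. Here is a quick way to see that in Python. The snippet of HTML is hypothetical (I am assuming the website sits in an href attribute), so the real markup of the tours page may differ.

```python
import re

# Hypothetical snippet; the real page's markup may differ.
snippet = (
    '<b>Glacier Tour</b> DESTINATION(S): Alaska, Yukon<br>'
    '<a href="http://example.com/tour">site</a>'
)

# [^><]* stops at the first "<" or ">", so it cannot run past a tag.
destinations = re.search(r"DESTINATION\(S\):([^><]*)", snippet).group(1).strip()

# [^"]* stops at the first double quote, which suits an attribute value.
website = re.search(r'href="([^"]*)"', snippet).group(1)

print(destinations)
print(website)
```

That is why the text fields use [^><]* while the website uses [^"]*: the first keeps a match inside one piece of visible text, the second keeps it inside one quoted value.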

Give it a try and see if it works!

Tim

Trouble understanding Screen Scraper

Thanks, that worked! So how do you take the information that I got and put it into an Excel spreadsheet?

Writing to a file

I see you made a new thread for this. We have to approve comments before they show up on the site, so sorry if there's any confusion! Check your other thread for the answer.

http://community.screen-scraper.com/node/1096#comment-1989