3: Generating the XML Feed
Test Run
Let's make sure the scraping session works before we add a few more bells and whistles.
Close the workbench and start up screen-scraper in server mode. Once that's up, assuming you haven't altered the default Web/SOAP Server port (which is also the web server port), and that you're running screen-scraper on your local machine, try entering this URL in to your browser:
http://localhost:8779/ss/xmlfeed?scraping_session=Shopping+Site&SEARCH=bug
If you are running screen-scraper on a machine that is not your local machine or on a different SOAP port then make the required changes to the URL.
If all goes well the browser should take a little bit to load, then you should see an XML document appear containing the extracted information. If you got an error message or the document didn't appear as you expected it to, check screen-scraper's log.
Just as with scraping sessions run remotely, screen-scraper will create a log file in its log folder corresponding to each RSS/Atom scraping session.
RSS/Atom Feed Creation Form
Dealing with the URL directly can be a bit cryptic, what with the encoding and all. As such, let's make use of a little HTML file that will allow us to generate feeds using different search parameters and formats. You can access it at http://www.screen-scraper.com/support/tutorials/tutorial6/xml_feed_generator.htm.
This HTML file assumes that you're running screen-scraper as a server on your local machine on port 8779. If any of that isn't the case you'll want to download the HTML file to your local machine, alter it with your settings, then open it back up in your browser.
Explain, Experiment, Explore
Try experimenting with the form a bit. It gives you control over most all of the features that are available, including the format of the feed. Also take a close look at the URL. screen-scraper simply converts the GET parameters in the URL to session variables in the scraping session. If you'd like, you can even open the feed in your favorite RSS/Atom reader to ensure that the format is valid.
- Printer-friendly version
- Login or register to post comments