Scraping an article site
Hey, I really like this tool and I think it's going to take off. I need help extracting hundreds of articles from a free article-reprint site called articlecity.com.
I also need it to be automated: a bot that follows the article links, extracts the author, title, and body of each article, and adds them to a MySQL articles table.
Is this even possible in your software, and if so, how can I do it?
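For illustration, the per-article step described above (pull the author, title, and body out of one article page) can be sketched in plain Python. This is not screen-scraper's API, which uses its own extractor patterns. The HTML markup assumed here, including the `author` and `article-body` class names, is a hypothetical placeholder because the real articlecity.com page structure isn't shown in this thread.

```python
import re
from html import unescape

# Hypothetical markup patterns: placeholders to adapt to the real article pages.
TITLE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.S | re.I)
AUTHOR_RE = re.compile(r'<span class="author">(.*?)</span>', re.S | re.I)
BODY_RE = re.compile(r'<div class="article-body">(.*?)</div>', re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def _clean(fragment):
    """Strip tags, decode HTML entities, and collapse runs of whitespace."""
    text = TAG_RE.sub(" ", fragment)
    return " ".join(unescape(text).split())


def extract_article(html):
    """Return {'title', 'author', 'body'} for one page, or None if a field is missing."""
    fields = {}
    for name, pattern in (("title", TITLE_RE), ("author", AUTHOR_RE), ("body", BODY_RE)):
        match = pattern.search(html)
        if match is None:
            return None  # page doesn't look like an article; skip it
        fields[name] = _clean(match.group(1))
    return fields
```

The patterns are deliberately simple: the non-greedy body pattern stops at the first `</div>`, so an article body containing nested `<div>` elements would need a real HTML parser instead.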
Scraping an article site
It looks like good software that could do the job much faster.
I'm also interested.
Scraping an article site
Hi,
Thanks for the post. I've looked over the articlecity.com site, and it should be fairly straightforward to scrape: the articles are neatly categorized and use a consistent format, which keeps both the crawling and the data extraction simple.
If you want to tackle this project yourself, I'd recommend going through at least our first three tutorials: [url]http://www.screen-scraper.com/support/tutorials/tutorials.php[/url]. Since you want to insert the extracted data into a database, you may also find our fifth tutorial helpful: [url]http://www.screen-scraper.com/support/tutorials/tutorial5/tutorial_overview.php[/url], as well as this FAQ: [url]http://www.screen-scraper.com/support/faq/faq.php#Database[/url].
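A minimal sketch of the database step, assuming a hypothetical `articles` schema with a unique `url` column so that re-running the bot doesn't insert duplicates. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server. Against MySQL the DDL would need MySQL types (for example `VARCHAR(255)` for `url` and `AUTO_INCREMENT` for `id`), and a driver such as mysql-connector-python uses `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical schema for the "articles" table; SQLite syntax, not MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id     INTEGER PRIMARY KEY,
    url    TEXT UNIQUE NOT NULL,
    title  TEXT NOT NULL,
    author TEXT NOT NULL,
    body   TEXT NOT NULL
)
"""


def save_article(conn, url, article):
    """Insert one article; return False if that URL is already stored."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO articles (url, title, author, body) VALUES (?, ?, ?, ?)",
                (url, article["title"], article["author"], article["body"]),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on url rejected a duplicate.
        return False
```

Using parameter placeholders rather than string formatting also keeps quotes or other odd characters in scraped article text from breaking the SQL.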
I'm guessing you'll want to limit the spidering to one or two specific categories, so the basic approach will be to request the page for a category, extract each of the articles on that page, go to the next page of results, extract those articles, and so on until you reach the last page. It likely won't differ much from the approach we illustrate in our third tutorial.
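The category-paging loop described above might look like this as a plain Python sketch. It assumes hypothetical link markup (`article-link` and `next` class names) and a caller-supplied `fetch` function; neither comes from screen-scraper or from the real site. Passing `fetch` in keeps the loop testable offline, and the `seen_pages` set plus the `max_pages` cap stop the crawl if pagination links ever cycle.

```python
import re
from urllib.parse import urljoin

# Hypothetical patterns for a category results page; adapt to the real markup.
ARTICLE_LINK_RE = re.compile(r'<a class="article-link" href="([^"]+)"')
NEXT_PAGE_RE = re.compile(r'<a class="next" href="([^"]+)"')


def crawl_category(fetch, start_url, max_pages=100):
    """Walk a category's result pages and return article URLs in the order found.

    fetch: callable that takes a URL and returns that page's HTML.
    """
    article_urls, seen_pages = [], set()
    page_url = start_url
    while page_url and page_url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(page_url)
        html = fetch(page_url)
        # Collect every article link on this results page (resolving relative hrefs).
        for href in ARTICLE_LINK_RE.findall(html):
            url = urljoin(page_url, href)
            if url not in article_urls:
                article_urls.append(url)
        # Follow the "next page" link, or stop when there isn't one.
        next_match = NEXT_PAGE_RE.search(html)
        page_url = urljoin(page_url, next_match.group(1)) if next_match else None
    return article_urls


if __name__ == "__main__":
    # Two stub pages standing in for real category results.
    pages = {
        "http://example.com/cat/1": '<a class="article-link" href="/a/1">x</a>'
                                    '<a class="article-link" href="/a/2">y</a>'
                                    '<a class="next" href="/cat/2">Next</a>',
        "http://example.com/cat/2": '<a class="article-link" href="/a/3">z</a>',
    }
    print(crawl_category(pages.__getitem__, "http://example.com/cat/1"))
    # ['http://example.com/a/1', 'http://example.com/a/2', 'http://example.com/a/3']
```

Each returned URL would then be fetched and its author, title, and body extracted and stored, as the original request describes.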
If this is a project you'd consider outsourcing, feel free to drop us an email ([url]http://www.screen-scraper.com/services/services.php[/url]) or give us a call (800-672-0113), and we'd be happy to discuss that with you. We could also provide you a free quote for the work.
Kind regards,
Todd Wilson