[Solved] Scraping a dynamic web page
Dear Community, dear Jason,
I would like to scrape data from a specific URL (https://www.kickstarter.com/projects/597507018/pebble-e-paper-watch-for-iphone-and-android/backers). Thanks to the fantastic tutorials I’m now able to scrape exactly what I need (surname and number of funded projects). I need this data to create a statistic on the distribution of male/female donors on this page and the average number of funded projects for each gender.
However, there is one problem. I believe the site is a dynamic web page: it initially shows only 50 rows of data (i.e. people) and only loads the next 50 when you scroll down, and so on until the end is reached. Logically, my scraper is only able to scrape the first 50 rows. What do I have to tell the scraper so that it continues scraping until the end is reached? Or where can I find this information?
I've also tried following the e-commerce tutorial and creating a scraper for the individual pages (https://www.kickstarter.com/projects/597507018/pebble-e-paper-watch-for-iphone-and-android/backers?page=5), but even if I load these pages in my browser, all they show are the first 50 results.
Unfortunately, I could not find a tutorial on this issue, and searches in the forum did not lead to what I was looking for. I believe the markup that prompts the site to load more is the following (if this helps):
</ul>
<div class="load_more">
  <div class="loading">
    <img alt="Loading small" src="https://d297h9he240fqh.cloudfront.net/assets/icons/loading-small-d7c93c38ad18f83b4eeb73b8ed9edff7.gif" />
    <div class="copy">Please wait</div>
    <div class="clear"></div>
  </div>
</div>
I also read in another thread called "pages with infinite scroll" that one has to find the JavaScript at the end of the page that triggers further loading, but I could not find it.
Please excuse my ignorance, but I simply could not find anything on the internet that could have helped me to solve this issue.
I’m very grateful for any kind of help. Thank you!
I wondered if it was possible
I wondered if it was possible to save the fully loaded HTML to my desktop and then scrape it by entering "c://users/desktop/" as the URL in the screen-scraper software, but that didn't seem to work.
I "solved" it by scrolling down until I reached an acceptable amount of loaded rows of data, saved the page to my desktop and then uploaded it to another server. Then I was able to scrape the page on the new server, but only to the point where I had previously scrolled (before saving it to my desktop). This only worked for for 500 rows of data. Anything above that will lead to errors ("timed out").
You shouldn't need to save
You shouldn't need to save anything to your desktop. On that page there is some JavaScript at the bottom that makes an HTTP request like https://www.kickstarter.com/projects/597507018/pebble-e-paper-watch-for-iphone-and-android/backers?cursor=327760304. If you proxy that request you can see all the results on the next page, including the next cursor value, and you can just make the request again with that new value to get through all of the pages.
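Something like this, in plain Python rather than in screen-scraper's own scripting, shows the idea; the "cursor" parameter comes from the proxied request above, while the regex that picks the next cursor value out of the returned HTML is only a guess at the markup and may need adjusting:

# Rough sketch of the cursor-following loop described above.
import re
import requests

BASE = ("https://www.kickstarter.com/projects/597507018/"
        "pebble-e-paper-watch-for-iphone-and-android/backers")

def fetch_all_pages():
    pages = []
    cursor = None
    while True:
        params = {"cursor": cursor} if cursor else {}
        html = requests.get(BASE, params=params, timeout=30).text
        pages.append(html)  # hand each chunk of HTML to your existing extractor patterns
        # look for the cursor of the next page; stop if there isn't a new one
        next_cursor = re.search(r"cursor=(\d+)", html)
        if not next_cursor or next_cursor.group(1) == cursor:
            break
        cursor = next_cursor.group(1)
    return pages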
Even easier than that: at the bottom of the page there is some HTML that is there for older browsers or something, and it sits in a div with class="pagination". That's an easy way to request the next page as HTML.
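A quick sketch of that second route, again in plain Python; the selector for the links inside the pagination div is an assumption, so check it against what you actually see in the page source:

# Walk the pages by following the link(s) inside <div class="pagination">.
# Assumes the last link in that div points at the next page.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

START = ("https://www.kickstarter.com/projects/597507018/"
         "pebble-e-paper-watch-for-iphone-and-android/backers")

def walk_pagination(url=START):
    seen = set()
    while url and url not in seen:
        seen.add(url)
        html = requests.get(url, timeout=30).text
        yield html  # feed this HTML to the existing scraping rules
        soup = BeautifulSoup(html, "html.parser")
        links = soup.select('div.pagination a[href]')
        # follow the last link, resolving it against the current URL
        url = urljoin(url, links[-1]["href"]) if links else None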
Thank you!
Thank you very much Jason!