Ajax 'Load More' button - Stumped

I am trying to scrape this site: https://www.littlermachinery.co.uk/stock/ and I understand that I have to use SetRequestEntity, but I have no idea how this works (in layman's terms).

I assume that I take the JSON at the bottom of the first page and use it to set the entity (although I am not sure how). What happens after it is set?

Would I also need to work out the pagination from the {"page":1,"per_page":10,"total_rows":20,"total_pages":2} part?
Many thanks for your help here
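
A minimal sketch of the pagination arithmetic asked about above, in plain Python rather than a screen-scraper script. It only uses the pager JSON quoted in the question; the variable names are illustrative, not part of any scraping session.

```python
import json

# Pagination metadata exactly as quoted in the question.
pager_json = '{"page":1,"per_page":10,"total_rows":20,"total_pages":2}'
pager = json.loads(pager_json)

# Pages still to request after the one that has already been loaded.
remaining_pages = list(range(pager["page"] + 1, pager["total_pages"] + 1))

# Cross-check: total_pages should equal ceil(total_rows / per_page).
expected_pages = -(-pager["total_rows"] // pager["per_page"])

print(remaining_pages)                          # [2]
print(expected_pages == pager["total_pages"])   # True
```

With these numbers only page 2 remains, so a loop over `remaining_pages` would make a single extra request.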


Jason, this one is easy if you know a bunch of little tricks. It seems easier to make a prototype of the scrape and attach it for you.

Things to note:

  • In the init script, I made a loop to request pages, in case there were more than two. Since there aren't, I don't know whether other parameters of the "Results next" request would need to change for later pages.
  • In "Results" it was difficult to see where a listing ended (each one just closes with </div> tags). I therefore added a token named END, and if you look at the RegEx on it, you will see I used a lookahead.
  • The response for "Results next" is JSON, and you can see the script that sets the headers.
  • The JSON of "Results next" contains a value that is escaped HTML to display on the page. I therefore parsed out that value and used the session.scrapeString() method to send the HTML back to the "Results" scrapeable file and scrape it.
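
The steps in the list above can be sketched outside screen-scraper in plain Python. This is only an illustration under assumptions, not the attached scrape: the POST body is copied from the request log in this thread, but the response key `template` and the `<div class="listing"` start marker are assumptions made for the example, and the real request would also need a JSON content-type header set, as the header script mentioned above does.

```python
import json
import re


def build_payload(page):
    """Build the POST body for one page, copied from the log in this thread."""
    return json.dumps({
        "action": "facetwp_refresh",
        "data": {
            "facets": {"vehicles_rad": [], "year_3": [],
                       "text_search": [], "load_more": [page]},
            "frozen_facets": {},
            "http_params": {"get": [], "uri": "stock", "url_vars": []},
            "template": "vehicle_layout",
            "extras": {"sort": "default"},
            "soft_refresh": 1,
            "is_bfcache": 1,
            "first_load": 0,
            "paged": page,
        },
    })


def extract_html(response_text, key="template"):
    """Return the HTML stored under `key` (an assumed key name).

    json.loads undoes the JSON string escaping, so the result is plain HTML.
    """
    return json.loads(response_text)[key]


def split_listings(html, start='<div class="listing"'):
    """Cut the HTML into listings, ending each one with a lookahead.

    Each match runs lazily until the next start marker or the end of the
    string; the lookahead checks for that boundary without consuming it.
    `start` is a hypothetical marker, not taken from the real page.
    """
    marker = re.escape(start)
    pattern = marker + r".*?(?=" + marker + r"|\Z)"
    return re.findall(pattern, html, flags=re.DOTALL)


# Usage with a made-up response, standing in for the real JSON reply.
sample_html = ('<div class="listing"><div>A</div></div>'
               '<div class="listing"><div>B</div></div>')
sample_response = json.dumps({"template": sample_html})
listings = split_listings(extract_html(sample_response))
print(len(listings))  # 2
```

The lookahead is what lets a listing end at the start of the next one even though the closing `</div>` tags themselves are indistinguishable.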

Many thanks for what looks like a tonne of work!

I am afraid 80% of it is over my head, but it did not seem to work out of the box.

The first page is scraped, but the "JSON" extractor pattern on "Results next" did not find any matches (see below). I tried to adjust it, but could not get it to work at all.

Have I got an incorrect setting in the SS software somewhere?


>>>> Requesting page 2 of 2 <<<<
Scraping file: "Results next"
Results next: Processing scripts before a file is scraped.
Processing script: "LittlerMachinery example - set JSON headers"
Results next: Requesting URL: https://www.littlermachinery.co.uk/wp-json/facetwp/v1/refresh
Results next: POST data: { "action": "facetwp_refresh", "data": { "facets": { "vehicles_rad": [], "year_3": [], "text_search": [], "load_more": [ 2 ] }, "frozen_facets": {}, "http_params": { "get": [], "uri": "stock", "url_vars": [] }, "template": "vehicle_layout", "extras": { "sort": "default" }, "soft_refresh": 1, "is_bfcache": 1, "first_load": 0, "paged": 2 } }
Results next: Processing scripts before all pattern applications.
Results next: Extracting data for pattern "JSON"
Results next: The pattern did not find any matches.
Results next: JSON: Processing scripts once if no matches.
Results next: JSON: Processing scripts after all pattern applications.
Results next: Warning! No matches were made by any of the extractor patterns associated with this scrapeable file.
Results next: Processing scripts after a file is scraped.
>>>Completed on page 2<<<
Processing scripts after scraping session has ended.
Processing scripts always to be run at the end.
Scraping session "LittlerMachinery example" finished.

Apologies - I was right - I had JTidy on.

I removed JTidy and it worked perfectly, thank you!