Struggling with next page with no next link and no current page data
Hi
Here is the tidied code for the pagination of a site that I am trying to scrape. It shows 5 pages at the end (but I assume this could shrink or grow). There is nothing in the pagination code to tell you which page you are on or which is the next one (very helpful for users). I have looked through and attempted to use the examples that are available to no avail.
I have a scrapable file with a main extractor pattern that saves the PRODUCTIDs on the page as a session variable, then calls the product details scrapable file after each pattern match.
The same scrapable file contains an extractor for the current page, which I can get from a separate area at the top of the code as follows:
<form name="aspnetForm" method="post" action="stock-list.aspx?code=~@junk@~&page=~@PAGE@~" id="aspnetForm">
Here is the pagination code:
<ul class="paging">
<li><a href="/stock-list.aspx?code=1002&page=1">1</a></li>
<li><a href="/stock-list.aspx?code=1002&page=2">2</a></li>
<li><a href="/stock-list.aspx?code=1002&page=3">3</a></li>
<li><a href="/stock-list.aspx?code=1002&page=4">4</a></li>
<li><a href="/stock-list.aspx?code=1002&page=5">5</a></li>
</ul>
</div>
</div>
<br clear="both" />
I feel like I have given up by posting this - because I have. I've spent hours trying to sort this on my own but my limited skills have let me down.
Grateful for your help in advance
Jason (the one who knows nothing about Screen Scraper)
Here's what I would do. When
Here's what I would do. When I start each set, I'd set a session variable PAGE to 1, then I would have an extractor like this:
After each pattern match, I would have a script like:
if (page > session.getv("PAGE"))
{
session.setv("PAGE", page);
session.scrapeFile("Search results");
}
else
session.log("Already have page " + page);
Thanks for your help I have
Thanks for your help
I have set a script called HAS_NEXT_PAGE to run on the PAGE extractor pattern, after each pattern match, but I get this error:
SearchResultsEVS: PRODUCTID: Processing scripts after a pattern application.
Processing script: "ScrapeDetailsPageEVS"
Scraping file: "DetailsPageEVS"
DetailsPageEVS: Resolved URL: http://www.evsonlineauctions.com/stock-detail.aspx?rnum=3276MZZM&code=1002
Setting referer to: http://www.evsonlineauctions.com/stock-detail.aspx?rnum=5111Z5WZ&code=1002
DetailsPageEVS: Sending request.
SearchResultsEVS: PRODUCTID: Processing scripts once if pattern matches.
SearchResultsEVS: PRODUCTID: Processing scripts after all pattern applications.
SearchResultsEVS: Processing scripts before all pattern applications.
SearchResultsEVS: Extracting data for pattern "PAGE"
SearchResultsEVS: The following data elements were found:
PAGE--DataRecord 0:
junk=1002
PAGE=1
Storing this value in a session variable.
SearchResultsEVS: PAGE: Processing scripts after a pattern application.
Processing script: "HAS_NEXT_PAGE"
ERROR--EVS: An error occurred while processing the script: HAS_NEXT_PAGE
EVS: The error message was: class bsh.EvalError (line 2): ) { -- Operator: '">"' inappropriate for objects
Processing scripts always to be run at the end.
In the code you kindly provided, I have changed the 'search results' scrapable file name to SearchResultsEVS to match the page I need scraping in the script:
if (page > session.getv("PAGE"))
{
session.setv("PAGE", page);
session.scrapeFile("SearchResultsEVS");
}
else
session.log("Already have page " + page);
I would be grateful for your help in finding where I have gone wrong.
Jason (the one who still knows nothing about very much)
The error "Operator: '">"'
The error "Operator: '">"' inappropriate for objects" tells me that your session variable "PAGE" is a String, so it has to be converted to a number before it can be compared with ">".
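A minimal sketch of that conversion in plain Java (screen-scraper scripts are written in the Java-like BeanShell, so the same expressions should carry over). It assumes the stored value may arrive as either a String or an Integer. The names `PageCompare`, `toInt` and `isNewerPage` are hypothetical helpers, and the `Object` arguments stand in for whatever `session.getv("PAGE")` and the extracted PAGE token return.

```java
public class PageCompare {
    // Convert a value that may be a Number or a numeric String into an int.
    // Throws NumberFormatException if the value is not numeric.
    public static int toInt(Object value) {
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        return Integer.parseInt(String.valueOf(value).trim());
    }

    // True when the extracted page number is greater than the stored one.
    public static boolean isNewerPage(Object extracted, Object stored) {
        return toInt(extracted) > toInt(stored);
    }

    public static void main(String[] args) {
        System.out.println(isNewerPage("2", "1"));                // prints true
        System.out.println(isNewerPage("1", Integer.valueOf(1))); // prints false
    }
}
```

Inside a script, the same idea would be to convert both sides before the `if`, rather than comparing the raw objects.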
Sorry this has occurred with a different site
Apologies for dragging up an old thread, but I didn't have to complete the other one in the end.
I have the same problem and I have got to the same point in the scrape and get the same message.
I set a session variable in the initialisation file:
session.setVariable( "PAGE", 1 );
I then add a script to run after each pattern match.
Here is my pattern:
<li><a class='pagination' href="?page=~@junk@~1">~@PAGE@~</a></li>
Here is the script that runs:
page = Integer.parseInt(dataRecord.get("PAGE"));
if (page > Integer.parseInt(session.getv("PAGE"))
{
session.setv("PAGE", page);
session.scrapeFile("HirecoSearchresults");
}
else
session.log("Already have page " + page);
Additional Info
There can be any number of pages, but there is no 'next' link and nothing in the code changes to indicate which page you are on or can go to.
(I do have the total number of results and I know that there are 12 to a page.)
As usual, I am grateful in advance for the benefit of your wisdom. Each time I genuinely learn something new. In about 20 years or so I may become good at this!
Jason
Is there any way you can see
Is there any way you can see the total number of pages, or the total number of results?
I can see the total number of results, yes
It is at the top of the page, and I can put it in a token easily.
regards
Jason
Could you use a script like
Could you use a script like this:
{
total = Integer,parseInt(session.getv("TOTAL_REVIEWS"));
perPage = 10;
pages = total/perPage;
if (total%perPage>0)
pages++;
log.logInfo(">>>Found " + pages + " pages");
for (i=2; i<=pages && session.getv("CONTINUE") && !session.shouldStopScraping(); i++)
{
log.logInfo(">>>Scraping page " + i + " of " + pages);
session.setv("PAGE", i);
// session.breakpoint();
session.scrapeFile("Reviews sorted");
}
}
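The page arithmetic in that script (divide, then add one if there is a remainder) can be checked on its own. Here is a small sketch in plain Java; `PageCount` and `pageCount` are hypothetical names, and it uses the equivalent ceiling-division form. Earlier in the thread the site is said to show 12 results per page, so `perPage` would presumably be 12 there rather than the sample's 10.

```java
public class PageCount {
    // Number of pages needed to show `total` results at `perPage` per page.
    // Equivalent to total / perPage, plus one if total % perPage > 0.
    public static int pageCount(int total, int perPage) {
        if (perPage <= 0) {
            throw new IllegalArgumentException("perPage must be positive");
        }
        return (total + perPage - 1) / perPage;
    }

    public static void main(String[] args) {
        System.out.println(pageCount(15, 12)); // prints 2
        System.out.println(pageCount(24, 12)); // prints 2
        System.out.println(pageCount(25, 12)); // prints 3
    }
}
```

Because page 1 has already been scraped by the time the script runs, the loop only needs to request pages 2 up to that count.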
Almost there!
Hi
I am running the script HirecoNextPage, pasted from above (I think there was a comma instead of a full stop after the integer on line 3), after the TOTAL_REVIEWS pattern is applied, and I am getting the following error:
HirecoDetailsPage: DATARECORD: Processing scripts once if pattern matches.
HirecoDetailsPage: DATARECORD: Processing scripts after all pattern applications.
HirecoSearchResults: PRODUCTID: Processing scripts once if pattern matches.
HirecoSearchResults: PRODUCTID: Processing scripts after all pattern applications.
HirecoSearchResults: Processing scripts before all pattern applications.
HirecoSearchResults: Extracting data for pattern "NEXT"
HirecoSearchResults: The following data elements were found:
NEXT--DataRecord 0:
TOTAL_REVIEWS=15
Storing this value in a session variable.
HirecoSearchResults: NEXT: Processing scripts after a pattern application.
The token "TOTAL_REVIEWS" has no regular expression.
HirecoSearchResults: NEXT: Processing scripts once if pattern matches.
HirecoSearchResults: NEXT: Processing scripts after all pattern applications.
Processing script: "HirecoNextPage"
>>>Found 2 pages
ERROR--Hireco: An error occurred while processing the script: HirecoNextPage
Hireco: The error message was: class bsh.EvalError (line 10): && ! session .shouldStopScraping ( ) ; -- illegal use of null value or 'null' literal
I am on the professional version.
Many thanks for your help here.
Jason
You're right about the
You're right about the comma/dot.
In the sample I alluded to a session variable "CONTINUE" that I bet you don't have or need.
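The "illegal use of null value" error is consistent with `session.getv("CONTINUE")` returning null for a variable that was never set, since null cannot be used as a boolean in the `for` condition. The simplest change is probably to drop that test, leaving `i<=pages && !session.shouldStopScraping()`. If a stop flag is wanted later, a null-safe check along these lines is one option. This is a sketch in plain Java: `ContinueFlag` and `shouldContinue` are hypothetical names, and the `Map` is only a stand-in for the session variables.

```java
import java.util.HashMap;
import java.util.Map;

public class ContinueFlag {
    // Treat a missing (null) flag as "keep going"; anything other than
    // null or Boolean.TRUE stops the loop.
    public static boolean shouldContinue(Map<String, Object> vars) {
        Object flag = vars.get("CONTINUE");
        return flag == null || Boolean.TRUE.equals(flag);
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        System.out.println(shouldContinue(vars)); // prints true (flag never set)
        vars.put("CONTINUE", Boolean.FALSE);
        System.out.println(shouldContinue(vars)); // prints false
    }
}
```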