Nested DataSet
The page I'm trying to scrape is here
As you can see there are two major categories, Repayment Mortgages and Interest Only Mortgages. I'm only interested in scraping repayment mortgages.
Within the repayment mortgages section there are groups of mortgage categories. Each category can contain 1-2 individual mortgages.
Because of this layout, I need to grab the repayment mortgages section into a variable. I need to extract the mortgage categories from it using one pattern. Then I need to go over each category and extract the details of the individual mortgages in the category.
That means I need nested loops. Here is my code which I've commented as well as I can:
// get a holder dataset from the session (mortgage records from other pages on the site
// will go in here also so I pass it from script to script in the session)
frms = session.getVariable("frms");

// this will divide up all the repay morts into categories
DataSet repayGroups = scrapeableFile.extractData(dataRecord.get("repayMorts"), "Loan Group");

// split them into an array of records
allRecords = repayGroups.getAllDataRecords();

// iterate over the records
for(i=0; i < allRecords.size(); i++){
    // pull out one record
    aDataRecord = allRecords.get( i );

    // apply a pattern to extract the individual mortgages in this category
    DataSet loanGroupSet = scrapeableFile.extractData(aDataRecord.get("loansGroup"), "Loan Details");

    // split the category dataSet into an array of records
    loanRecordsInSet = loanGroupSet.getAllDataRecords();

    // iterate over them
    for(i=0; i < loanRecordsInSet.size(); i++){
        // get an individual mortgage
        thisLoanRecord = loanRecordsInSet.get( i );

        // this is a container which will hold a combination of the individual mortgage's
        // details and the details of the category it's in
        container = new DataRecord();
        container.put("startByDate", aDataRecord.get("startByDate"));
        container.put("effectiveFromDate", aDataRecord.get("effectiveFromDate"));
        container.put("minMaxDetails", aDataRecord.get("minMaxDetails"));
        container.put("initialRate", thisLoanRecord.get("initialRate"));
        container.put("fixedToDate", thisLoanRecord.get("fixedToDate"));
        container.put("fixedLength", 2);
        container.put("svr", thisLoanRecord.get("svr"));
        container.put("apr", thisLoanRecord.get("apr"));
        container.put("fee", thisLoanRecord.get("fee"));

        // add container datarecord to the sitewide dataset
        frms.addDataRecord(container);
    }
}

// put the holder back in the session
session.setVariable("frms", frms);
This script runs once, after the large section of repayment mortgages has been singled out.
Now the problem is that this somehow gets stuck in an infinite loop while iterating over a particular mortgage category.
When I stop execution manually I get the following error:
NullPointerException (line 10): for ( i = 0 ; -- Null Pointer in Method Invocation
Line 10 would be the declaration of the first for loop by the way.
As far as I can figure out, this means that the script tried to call a method on null. But that doesn't make sense, since the line where the error occurs has already executed fine at least once. The only method I call on that line is .size(), and there shouldn't be a problem there.
Anyone any idea what's going wrong?
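Two problems: (1) you're adding a layer of complexity you don't need (the getAllDataRecords() calls), and (2) you have the same iterator (i) for both loops. The inner loop resets i to 0 every time it runs, so the outer loop keeps losing its place and may never reach the end of the categories. I think this should fix it: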
// get a holder dataset from the session (mortgage records from other pages on the site
// will go in here also so I pass it from script to script in the session)
frms = session.getVariable("frms");

// this will divide up all the repay morts into categories
DataSet repayGroups = scrapeableFile.extractData(dataRecord.get("repayMorts"), "Loan Group");

// iterate over the records
for(i=0; i<repayGroups.getNumDataRecords(); i++)
{
    // pull out one record
    aDataRecord = repayGroups.getDataRecord(i);

    // apply a pattern to extract the individual mortgages in this category
    DataSet loanGroupSet = scrapeableFile.extractData(aDataRecord.get("loansGroup"), "Loan Details");

    // iterate over them
    for(j=0; j<loanGroupSet.getNumDataRecords(); j++)
    {
        session.log("We're on repay group " + i + " of " + repayGroups.getNumDataRecords());
        session.log("And loan group " + j + " of " + loanGroupSet.getNumDataRecords());

        // get an individual mortgage
        thisLoanRecord = loanGroupSet.getDataRecord(j);

        // this is a container which will hold a combination of the individual mortgage's
        // details and the details of the category it's in
        container = new DataRecord();
        container.put("startByDate", aDataRecord.get("startByDate"));
        container.put("effectiveFromDate", aDataRecord.get("effectiveFromDate"));
        container.put("minMaxDetails", aDataRecord.get("minMaxDetails"));
        container.put("initialRate", thisLoanRecord.get("initialRate"));
        container.put("fixedToDate", thisLoanRecord.get("fixedToDate"));
        container.put("fixedLength", 2);
        container.put("svr", thisLoanRecord.get("svr"));
        container.put("apr", thisLoanRecord.get("apr"));
        container.put("fee", thisLoanRecord.get("fee"));

        // add container datarecord to the sitewide dataset
        frms.addDataRecord(container);
    }
}

// put the holder back in the session
session.setVariable("frms", frms);
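If it helps to see why sharing i hangs, here is a rough standalone sketch in plain Java. It doesn't use screen-scraper at all: the class name, method names, and the int array standing in for "number of loans in each category" are all made up for illustration, and the maxSteps cap is only there so the demo stops instead of spinning forever.

```java
// Hypothetical demo, not screen-scraper code: an int[] stands in for the
// categories, and each entry is how many loans that category holds.
class SharedCounterDemo {

    // Mimics the original script: the inner loop reuses the outer counter i.
    // Returns the number of inner visits, or -1 if the safety cap is reached.
    static int visitsWithSharedCounter(int[] groupSizes, int maxSteps) {
        int visits = 0;
        int i;
        for (i = 0; i < groupSizes.length; i++) {
            int innerSize = groupSizes[i];
            // Bug: this resets i, so the outer loop loses its position.
            for (i = 0; i < innerSize; i++) {
                visits++;
                if (visits >= maxSteps) {
                    return -1; // gave up: the loops are not terminating
                }
            }
        }
        return visits;
    }

    // Mimics the fixed script: the inner loop has its own counter j.
    static int visitsWithSeparateCounters(int[] groupSizes) {
        int visits = 0;
        for (int i = 0; i < groupSizes.length; i++) {
            for (int j = 0; j < groupSizes[i]; j++) {
                visits++;
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        // Five made-up categories with 1-2 loans each, as on the page described.
        int[] groupSizes = {1, 2, 1, 2, 2};
        System.out.println("shared counter:    "
                + visitsWithSharedCounter(groupSizes, 1000));
        System.out.println("separate counters: "
                + visitsWithSeparateCounters(groupSizes));
    }
}
```

With those sizes the shared-counter version hits the cap and returns -1, while the separate-counter version visits all 8 loans. After every pass of the inner loop, i is left at 1 or 2, the outer i++ bumps it to 2 or 3, and so the outer loop keeps landing on the same category again and again.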
Wow can't believe I missed that iterator naming problem!
Thanks for your help.