Replacement get error: java.lang.NullPointerException BSF info: null at line: 0 column: columnNo

Fellow scrapers,

To get one common format in my DB, I try to replace the different expressions that sites use with a single one, using:

trueListingtype = session.getVariable("LISTING_TYPE");
trueListingtype = trueListingtype.replaceAll("WrongExpression1", "RightExpression1");
session.setVariable("LISTING_TYPE", trueListingtype);

trueListingtype2 = session.getVariable("LISTING_TYPE");
trueListingtype2 = trueListingtype2.replaceAll("WrongExpression2", "RightExpression2");
session.setVariable("LISTING_TYPE", trueListingtype2);

This works most of the time, but occasionally I get "The application script threw an exception: Replacement get error: java.lang.NullPointerException BSF info: null at line: 0 column: columnNo" and my submission is corrupted.

What could be wrong?

Best,
Johan

I'd suggest explicitly clearing the session variable each time before you run the scrapeable file, e.g.:

session.setVariable("LISTING_TYPE","");

This will solve two problems. First, you won't have a null value, so if you fail to extract a new value you'll just be running replaceAll on a blank String (which is different from a null value) and you'll avoid the error.
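
A minimal Java sketch of the reset idea. It assumes a hypothetical `ResetSession` class (a plain HashMap) standing in for screen-scraper's `session` object; only the `getVariable`/`setVariable` names come from this thread. The hypothetical `scrape` method fakes the extractor by taking the matched value, or null for "no match", as a parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scraping session; not screen-scraper's real class.
class ResetSession {
    private final Map<String, Object> vars = new HashMap<>();
    Object getVariable(String name) { return vars.get(name); }
    void setVariable(String name, Object value) { vars.put(name, value); }
}

class ResetDemo {
    // Simulates one scrape: reset first, then (maybe) extract, then normalize.
    static String scrape(ResetSession session, String extracted) {
        session.setVariable("LISTING_TYPE", "");            // clear before the scrapeable file runs
        if (extracted != null) {
            session.setVariable("LISTING_TYPE", extracted); // the extractor pattern matched
        }
        String listingType = (String) session.getVariable("LISTING_TYPE");
        // Safe even when nothing was extracted: "".replaceAll(...) just returns "".
        listingType = listingType.replaceAll("WrongExpression1", "RightExpression1");
        session.setVariable("LISTING_TYPE", listingType);
        return listingType;
    }

    public static void main(String[] args) {
        ResetSession session = new ResetSession();
        System.out.println(scrape(session, "WrongExpression1")); // RightExpression1
        System.out.println("[" + scrape(session, null) + "]");   // [] instead of an NPE
    }
}
```

Because the variable starts each run as "" rather than null, the "no match" case produces an empty String instead of throwing.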

Second, if your last extracted value for LISTING_TYPE was, say, 'rent', and you then run the scrape a second time and don't find anything, LISTING_TYPE won't be overwritten, so it will still hold the value from the previous page you scraped. This can lead to all sorts of mess in your data.
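
The stale-value problem can be reproduced with a small Java sketch. Here a plain HashMap stands in for the session variables, and the hypothetical `pages` array lists what the extractor found on each page, with null meaning the pattern did not match.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class StaleValueDemo {
    // Returns the LISTING_TYPE value that would be saved for each page.
    static String[] scrapeAll(String[] pages, boolean resetFirst) {
        Map<String, String> session = new HashMap<>(); // stand-in for the session variables
        String[] saved = new String[pages.length];
        for (int i = 0; i < pages.length; i++) {
            if (resetFirst) {
                session.put("LISTING_TYPE", "");       // explicit reset before each page
            }
            if (pages[i] != null) {
                session.put("LISTING_TYPE", pages[i]); // only written when the pattern matches
            }
            saved[i] = session.get("LISTING_TYPE");
        }
        return saved;
    }

    public static void main(String[] args) {
        String[] pages = {"rent", null};
        // Without the reset, page 2 silently inherits "rent" from page 1.
        System.out.println(Arrays.toString(scrapeAll(pages, false))); // [rent, rent]
        System.out.println(Arrays.toString(scrapeAll(pages, true)));  // [rent, ]
    }
}
```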

Otherwise, a different approach, which only avoids the NullPointerException, is to check for null first, e.g.:

trueListingtype2 = session.getVariable("LISTING_TYPE");
if (trueListingtype2 != null) {
    trueListingtype2 = trueListingtype2.replaceAll("WrongExpression2", "RightExpression2");
}
session.setVariable("LISTING_TYPE", trueListingtype2);
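
If the null check is needed for several replacements, it can be wrapped in a small helper. This is a minimal Java sketch; `replaceIfPresent` is a hypothetical name, not part of screen-scraper.

```java
class NullGuard {
    // Applies replaceAll only when there is a value to work on.
    static String replaceIfPresent(String value, String regex, String replacement) {
        if (value == null) {
            return null; // nothing extracted: leave it null instead of throwing an NPE
        }
        return value.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceIfPresent("WrongExpression2", "WrongExpression2", "RightExpression2")); // RightExpression2
        System.out.println(replaceIfPresent(null, "WrongExpression2", "RightExpression2"));               // null
    }
}
```

Note that this only suppresses the exception: a null (or a stale value from an earlier page) still ends up in the session variable, which is why clearing the variable first also fixes the data problem.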

Thanks for the reply Shadders!

Maybe I did not express myself clearly: in fact, LISTING_TYPE will always have a value, so the null problem shouldn't appear, should it? When Tim wrote "...ensuring that your variables do in fact have the values you expect.", I thought that the session variable containing "WRONGEXPRESSION2" rather than "WRONGEXPRESSION1" might cause the first replaceAll to throw the error?

Is it okay to search for multiple expressions in the same session variable the way I do?

I will clear the session variable as you suggested and give it a go.

Greetings,
Johan

Well, if it can't find a "wrongexpression" in your string, it simply won't replace anything at all; no error should be raised.

For example, the following line simply leaves result as "abcd":

String result = "abcd".replaceAll("q", "!!!!!");

You can search for as many expressions as you need in a variable. Just be sure to pass the result of one call into the next, so that earlier replacements are kept between 'replaceAll' calls.
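
A minimal Java sketch of chaining several replacements on one value, using the placeholder expressions from the original question: each call works on the previous result, and a pattern that doesn't match leaves the string unchanged without raising an error.

```java
class ChainedReplace {
    static String normalize(String listingType) {
        // Each replaceAll operates on the output of the previous one,
        // so a replacement made by the first call is kept by the second.
        listingType = listingType.replaceAll("WrongExpression1", "RightExpression1");
        listingType = listingType.replaceAll("WrongExpression2", "RightExpression2");
        return listingType;
    }

    public static void main(String[] args) {
        System.out.println(normalize("WrongExpression2"));    // RightExpression2
        System.out.println(normalize("WrongExpression1"));    // RightExpression1
        System.out.println("abcd".replaceAll("q", "!!!!!")); // abcd
    }
}
```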

Yeah, the NPE (NullPointerException) is the most horrible thing... no line number to go off of :)

An NPE always means that you're trying to use a variable that doesn't have a value. In your case, it's happening when you try to replace text. I would check that your LISTING_TYPE session variable is what you expect it to be. It is most likely null, and you can't call replaceAll on null, since it isn't a String at all.
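
A minimal Java sketch of the difference between a null and a blank String. It assumes, as this thread suggests, that reading a session variable that was never saved gives back null; `throwsNpe` is a hypothetical helper for the demonstration.

```java
class NpeDemo {
    // Returns true if calling replaceAll on the given value throws a NullPointerException.
    static boolean throwsNpe(String value) {
        try {
            value.replaceAll("WrongExpression1", "RightExpression1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String extracted = null;                  // e.g. the variable was never saved
        System.out.println(throwsNpe(extracted)); // true
        System.out.println(throwsNpe(""));        // false: a blank String is fine
    }
}
```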

Sometimes this happens when you're not checking the "Save to session variable?" box on your extractor pattern token. I'd start there and work your way through, ensuring that your variables do in fact have the values you expect.

Tim

Aha

That explains the error, but I still have no idea how to reach the solution. The variable does not always have the value that I want to replace in that string. It can take on two and sometimes three values (such as buy, rent or vacation rentals), and I would like to be able to replace them all with a common naming structure. Should I use a different method?
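
One possible alternative to a chain of replaceAll calls is an exact-match lookup table. This is a Java sketch of that different technique, not something proposed in the thread: the site values buy, rent and vacation rentals come from the question, while the common names SALE, RENT and VACATION are hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ListingTypeNormalizer {
    // Hypothetical mapping from each site's wording to one common name for the DB.
    private static final Map<String, String> COMMON_NAMES = new HashMap<>();
    static {
        COMMON_NAMES.put("buy", "SALE");
        COMMON_NAMES.put("rent", "RENT");
        COMMON_NAMES.put("vacation rentals", "VACATION");
    }

    static String normalize(String listingType) {
        if (listingType == null) {
            return ""; // nothing was extracted: store a blank instead of throwing an NPE
        }
        String key = listingType.trim().toLowerCase(Locale.ROOT);
        // Unknown values are passed through unchanged so they can be spotted in the DB.
        return COMMON_NAMES.getOrDefault(key, listingType);
    }

    public static void main(String[] args) {
        System.out.println(normalize(" Rent "));           // RENT
        System.out.println(normalize("Vacation Rentals")); // VACATION
        System.out.println(normalize("auction"));          // auction
    }
}
```

Matching the whole value, rather than replacing substrings, avoids accidental partial replacements such as "rent" inside "vacation rentals".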

Hi Johan,

There's nowhere else in the code you've listed where you could get a NullPointerException except the replaceAll lines, so I suspect that you are in fact getting instances where LISTING_TYPE is empty (null). After you get the error, go back to the scrapeable file screen and click the 'apply pattern to last scraped data' button; this shows you the results and tells you for sure whether the pattern is matching something. If it is, check that you are actually saving the value to a session variable. Have you changed the name of the LISTING_TYPE token? Even if you just deleted a character and then typed it straight back in, the settings for that token will be lost. Double-click on it again and check whether it's set to save to a session variable.

Failing that, have a look at how you are calling this script. Are you sure you're calling it after the extractor pattern is run? You may have accidentally set it to run before the pattern is applied. Start at the very first scrapeable file, check its scripts tab and then its extractor patterns tab, and follow the sequence the scrape will take step by step, again making sure that you reach the extractor pattern before the script is called. It's pretty easy to call a script in the wrong place. It's even more confusing when you've accidentally called the script in two places: you'll be tearing your hair out because it's in the right spot, not realising that the same script is being called a second time earlier on...

If none of those ideas help, insert some log lines in your code, i.e. put session.log("1") on every second line, then 2, 3, 4, etc. Then you'll be able to see exactly which line is triggering the error. Maybe also add a line that logs the values of all the variables.
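
A minimal Java sketch of the numbered-checkpoint technique applied to the original script. The hypothetical `LoggingSession` class stands in for the real session: its `log` method just records messages in a list, so the last entry shows how far the script got before the exception and what the variable held.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the session; log() just records messages here.
class LoggingSession {
    final Map<String, Object> vars = new HashMap<>();
    final List<String> logLines = new ArrayList<>();
    Object getVariable(String name) { return vars.get(name); }
    void setVariable(String name, Object value) { vars.put(name, value); }
    void log(String message) { logLines.add(message); }
}

class CheckpointDemo {
    // The original script with numbered checkpoints and the variable's value logged.
    static void runScript(LoggingSession session) {
        session.log("1");
        String trueListingtype = (String) session.getVariable("LISTING_TYPE");
        session.log("2: LISTING_TYPE = " + trueListingtype);
        trueListingtype = trueListingtype.replaceAll("WrongExpression1", "RightExpression1");
        session.log("3");
        session.setVariable("LISTING_TYPE", trueListingtype);
        session.log("4");
    }

    public static void main(String[] args) {
        LoggingSession session = new LoggingSession(); // LISTING_TYPE never set
        try {
            runScript(session);
        } catch (NullPointerException e) {
            // The last checkpoint reached shows which line failed and with what value.
            System.out.println(session.logLines); // [1, 2: LISTING_TYPE = null]
        }
    }
}
```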

I've torn a lot of my own hair out over these types of problems, and those are pretty much all the ways I can think of that you could get this sort of error, so one of them should get you out of trouble...

Very useful hints Shadders. Thanks! I'll dig into it.