I keep getting a NullPointerException. The error message was: NullPointerException (line 60): proxyServerPool .filter
Screen Scraper version 6.0.25a
The error keeps recurring: NullPointerException (line 60): proxyServerPool .filter
Is this a new bug in the 6.0.25a release?
Script output:
Validated proxy 53 of 55. This proxy server will be removed: 93.126.43.244:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 54 of 55. This proxy server will be removed: 31.3.230.18:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 55 of 55. This proxy server will be removed: 123.30.12.188:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Finished filtering proxy servers. We now have the following number of good proxy servers: 0
Writing the proxy pool to the file: /opt/ssenterprise/proxies/ANONYMOUS_proxies_O.txt
Writing the proxy pool to the file: /opt/ssenterprise/proxies/ANONYMOUS_good-proxies_O.txt
There are currently no proxy servers in the pool.
Processing script: "COMPILE_GOOD_ANONYMOUS_PROXIES"
Processing script: "COMPILE_GOOD_ANONYMOUS_PROXIES"
Filtering the proxy server pool.
Number of proxies to test: 1
Connection timeout is: 9
Validating proxy: 1 of 1
Validated proxy 1 of 1. This proxy server will be removed: null:-1 (null)
ERROR--O: An error occurred while processing the script: COMPILE_GOOD_ANONYMOUS_PROXIES
O: The error message was: NullPointerException (line 60): proxyServerPool .filter ( 9 ) --Sourced file: inline evaluation of: ``import com.screenscraper.util.*; // Create a new ProxyServerPool object. This o . . . '' : Method Invocation proxyServerPool.filter
Contents of COMPILE_GOOD_ANONYMOUS_PROXIES
import com.screenscraper.util.*;
// Create a new ProxyServerPool object. This object will
// control how screen-scraper interacts with proxy servers.
proxyServerPool = new ProxyServerPool();
// We give the current scraping session a reference to
// the proxy pool. This step should ideally be done right
// after the object is created (as in the previous step).
session.setProxyServerPool( proxyServerPool );
// This tells the pool to populate itself from a file
// containing a list of proxy servers. The format is very
// simple--you should have a proxy server on each line of
// the file, with the host separated from the port by a colon.
// For example:
// one.proxy.com:8888
// two.proxy.com:3128
// 29.283.928.10:8080
// But obviously without the slashes at the beginning.
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_A.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_AA.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_B.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_C.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_D.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_E.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_F.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_G.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_H.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_I.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_J.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_K.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_L.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_M.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_N.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_O.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_P.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_Q.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_R.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_S.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_T.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_U.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_V.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_W.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_X.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_Y.txt" );
proxyServerPool.populateFromFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_Z.txt" );
// screen-scraper can iterate through all of the proxies to
// ensure they're responsive. This can be a time-consuming
// process unless it's done in a multi-threaded fashion.
// This method call tells screen-scraper to validate up to
// 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );
// This method call tells screen-scraper to filter the list of
// proxy servers using 9 seconds as a timeout value. That is,
// if a server doesn't respond within 9 seconds, it's deemed
// to be invalid.
proxyServerPool.filter( 9 );
// Once filtering is done, it's often helpful to write the good
// set of proxies out to a file. That way you may not have to
// filter again the next time.
proxyServerPool.writeProxyPoolToFile( "/opt/ssenterprise/proxies/ANONYMOUS_good-proxies_ALL.txt" );
// You might also want to write out the list of proxy servers
// to screen-scraper's log.
proxyServerPool.outputProxyServersToLog();
// This is the switch that tells the scraping session to make
// use of the proxy servers. Note that this can be turned on
// and off during the course of the scrape. You may want to
// anonymize some pages, but not others.
session.setUseProxyFromPool( false );
// As a scraping session runs, screen-scraper will filter out
// proxies that become non-responsive. If the number of proxies
// gets down to a specified level, screen-scraper can repopulate
// itself. That's what this method call controls.
proxyServerPool.setRepopulateThreshold( 4 );
I was able to reproduce this problem if I put a line in my proxy file that was just a whitespace character. I made a change to screen-scraper that should get pushed out in the next alpha to fix this.
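As a possible stopgap on the scripting side, the proxy list could be cleaned of blank or malformed lines before it is handed to populateFromFile. The following is only a sketch under stated assumptions: the class name ProxyListCleaner, the method clean, and the host:port pattern are hypothetical and are not part of the screen-scraper API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of screen-scraper: keeps only lines
// that look like "host:port" and drops blank or whitespace-only lines.
final class ProxyListCleaner {
    // Assumed format: a host without whitespace or colons, a colon,
    // then a port of one to five digits.
    private static final Pattern HOST_PORT = Pattern.compile("[^\\s:]+:\\d{1,5}");

    /** Returns the trimmed lines that match the assumed host:port format. */
    static List<String> clean(List<String> rawLines) {
        List<String> kept = new ArrayList<>();
        for (String line : rawLines) {
            if (line == null) {
                continue; // nothing to parse
            }
            String trimmed = line.trim();
            if (HOST_PORT.matcher(trimmed).matches()) {
                kept.add(trimmed);
            }
        }
        return kept;
    }
}
```

One way to use it would be to read each proxy file with java.nio.file.Files.readAllLines, pass the lines through clean, write the result back out, and only then call populateFromFile on the cleaned file.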
Can you post the stack trace for this error so I can verify that I found the same bug you are experiencing? You should be able to get a stack trace for it by wrapping the entire script in a try/catch block and then logging the exception like so:
try
{
// Place your script code here
}
catch(Exception e)
{
log.logException(e);
session.stopScraping();
}
It will then output the entire stack trace to the log.
Added Try Catch - now screen-scraper just hangs
After adding the Try Catch blocks I reran the session script.
System ran to the point of calling COMPILE_GOOD_ANONYMOUS_PROXIES then just stopped.
Validating proxy: 55 of 55
Validated proxy 31 of 55. This proxy server will be removed: 58.65.136.166:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 32 of 55. This proxy server will be removed: 200.62.147.146:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 33 of 55. This proxy server will be removed: 89.122.211.5:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 34 of 55. This proxy server will be removed: 93.116.194.136:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 35 of 55. This proxy server will be removed: 89.42.86.186:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 36 of 55. This proxy server will be removed: 86.127.182.53:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 37 of 55. This proxy server will be removed: 85.234.38.157:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 38 of 55. This proxy server will be removed: 213.79.100.50:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 39 of 55. This proxy server will be removed: 88.147.149.226:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 40 of 55. This proxy server will be removed: 195.93.189.194:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 41 of 55. This proxy server will be removed: 94.190.18.216:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 42 of 55. This proxy server will be removed: 80.255.23.50:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 43 of 55. This proxy server will be removed: 91.204.138.58:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 44 of 55. This proxy server will be removed: 80.252.17.220:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 45 of 55. This proxy server will be removed: 83.220.59.167:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 46 of 55. This proxy server will be removed: 78.27.144.149:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 47 of 55. This proxy server will be removed: 77.121.139.21:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 48 of 55. This proxy server will be removed: 77.90.196.198:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 49 of 55. This proxy server will be removed: 89.105.244.199:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 50 of 55. This proxy server will be removed: 213.159.240.120:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 51 of 55. This proxy server will be removed: 195.138.88.105:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 52 of 55. This proxy server will be removed: 78.110.167.180:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 53 of 55. This proxy server will be removed: 93.126.43.244:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 54 of 55. This proxy server will be removed: 31.3.230.18:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Validated proxy 55 of 55. This proxy server will be removed: 123.30.12.188:3128 (the following I/O error was encountered: The host did not accept the connection within timeout of 9000 ms)
Finished filtering proxy servers. We now have the following number of good proxy servers: 0
Writing the proxy pool to the file: /opt/ssenterprise/proxies/ANONYMOUS_proxies_O.txt
Writing the proxy pool to the file: /opt/ssenterprise/proxies/ANONYMOUS_good-proxies_O.txt
There are currently no proxy servers in the pool.
Processing script: "COMPILE_GOOD_ANONYMOUS_PROXIES"
Filtering the proxy server pool.
Number of proxies to test: 1
Connection timeout is: 9
Validating proxy: 1 of 1
-------- This is where it hung. After 20 minutes it had not returned or produced any more output.
I'm trying this on another system as well to see if it errors out or hangs at the same spot.
Second system produced a stack dump:
Processing script: "COMPILE_GOOD_ANONYMOUS_PROXIES"
Filtering the proxy server pool.
Number of proxies to test: 1
Connection timeout is: 9
Validating proxy: 1 of 1
Validated proxy 1 of 1. This proxy server will be removed: null:-1 (null)
Exception
--- Type : java.lang.NullPointerException
--- Message : null
--- com.screenscraper.util.ProxyServer.equals(ProxyServer.java:427)
--- java.util.Vector.indexOf(Vector.java:361)
--- java.util.Vector.indexOf(Vector.java:335)
--- java.util.Vector.removeElement(Vector.java:594)
--- java.util.Vector.remove(Vector.java:745)
--- com.screenscraper.util.ProxyServerPool.removeProxyServer(ProxyServerPool.java:730)
--- com.screenscraper.util.ProxyServerPool.filter(ProxyServerPool.java:1713)
--- com.screenscraper.util.ProxyServerPool.filter(ProxyServerPool.java:1657)
--- sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
--- sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
--- sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
--- java.lang.reflect.Method.invoke(Method.java:597)
--- bsh.Reflect.invokeMethod(Unknown Source)
--- bsh.Reflect.invokeObjectMethod(Unknown Source)
--- bsh.Name.invokeMethod(Unknown Source)
--- bsh.BSHMethodInvocation.eval(Unknown Source)
--- bsh.BSHPrimaryExpression.eval(Unknown Source)
--- bsh.BSHPrimaryExpression.eval(Unknown Source)
--- bsh.BSHBlock.evalBlock(Unknown Source)
--- bsh.BSHBlock.eval(Unknown Source)
--- bsh.BSHBlock.eval(Unknown Source)
--- bsh.BSHTryStatement.eval(Unknown Source)
--- bsh.Interpreter.eval(Unknown Source)
--- bsh.Interpreter.eval(Unknown Source)
--- bsh.Interpreter.eval(Unknown Source)
--- com.screenscraper.scraper.ScriptContext$ScriptRunner.run(ScriptContext.java:353)
--- java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
--- java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
--- java.util.concurrent.FutureTask.run(FutureTask.java:138)
--- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
--- java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
--- java.lang.Thread.run(Thread.java:680)
Processing script: "TEST&WRITE_TO_"US_GOOD_PROXIES_O""
Filtering the proxy server pool.
Number of proxies to test: 23
Connection timeout is: 9
Validating proxy: 1 of 23
Validating proxy: 2 of 23
Validating proxy: 3 of 23
Thanks for pointing out that bug. It looks like it's the same error I found, and should be fixed in the next alpha.
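For readers following the stack trace: the exception is thrown inside ProxyServer.equals while Vector.remove searches the pool, and the log shows the entry being removed as null:-1, which suggests a null host field was dereferenced during the comparison. The sketch below is not the screen-scraper source; ProxyEntry and its fields are hypothetical stand-ins that only illustrate a null-safe equals of the kind that avoids this class of error.

```java
import java.util.Objects;

// Hypothetical stand-in for a proxy entry; not the real ProxyServer class.
final class ProxyEntry {
    private final String host; // may be null if a line failed to parse
    private final int port;

    ProxyEntry(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ProxyEntry)) {
            return false;
        }
        ProxyEntry that = (ProxyEntry) other;
        // Objects.equals tolerates null on either side, so a null host
        // compares cleanly instead of throwing NullPointerException.
        return port == that.port && Objects.equals(host, that.host);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port);
    }
}
```

With an equals like this, removing an entry whose host is null from a java.util.Vector completes normally, because the Vector calls equals on each candidate and no field is dereferenced directly.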
I don't see anything that should cause screen-scraper to hang when validating proxies, but it is possible that the proxy was responding with data slowly. The 9-second timeout value you use is set on the socket. Screen-scraper requires a connection to be established through the proxy within that amount of time, and then requires that data be received back within the timeout value. In theory, if the proxy responded at a rate of 1 byte every 9 seconds, the connection wouldn't time out and the proxy would show as valid, even if it took 20 minutes to get the content from the server.
I haven't seen a time when screen-scraper hung on a single proxy for more than a minute or so before timing out or completing the request, but that is the only reason I can currently think of that would explain why it hung for you this time. If you continue to see this problem, please let us know so we can look into it further.
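Because a socket timeout of the kind described above limits each wait rather than the whole request, a slow trickle of data can keep a check alive indefinitely. One general way to bound the total time is to run the check under an overall deadline. This is a minimal sketch in plain Java, not something screen-scraper provides: DeadlineRunner and completesWithin are hypothetical names, and the check passed in is whatever validation code you supply.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a check and gives up after a total deadline,
// regardless of how many individual reads succeed along the way.
final class DeadlineRunner {
    /** Returns true only if the check returns true before the deadline. */
    static boolean completesWithin(Callable<Boolean> check, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Boolean> future = executor.submit(check);
        try {
            return Boolean.TRUE.equals(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // request interruption of the worker
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the check itself threw
        } finally {
            executor.shutdownNow();
        }
    }
}
```

One caveat: interrupting a thread does not unblock a plain blocking socket read, so a real check would also need to close its socket when the deadline passes. The sketch only stops the caller from waiting.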