How do I resolve connection issues when trying to scrape a site that uses SSL?
SSL issues can manifest as a number of errors, including but not limited to:
SSLHandshakeException
ssl_error_rx_record_too_long
An input/output error occurred while connecting to https:// … The message was peer not authenticated.
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
If you make a request and get one of these errors, the best steps to rectify it are:
- Make sure you are using the newest version of screen-scraper
- Screen-scraper version 7 was released mostly to help deal with SSL issues. You need to be at version 7, or you can use an alpha version
- Install the Java Cryptography Extension
- Some countries have laws limiting the strength of cryptography allowed, so the JRE is distributed honoring that limit. You can install the unlimited level from here. Once you have it, stop screen-scraper, unzip the file, and place its files in the jre/lib/security directory, overwriting the two existing files.
- Set the EnableSNIExtension property
- This is a global change for all scrapes run in this instance of screen-scraper, so it's possible that it will correct one scrape and hamper another, but I've not seen that happen. You need to stop screen-scraper, edit the resource/conf/screen-scraper.properties file, and find the line EnableSNIExtension. If it's there, set it to true; if not, add:
EnableSNIExtension=true
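
To confirm that the last two steps took effect, a small stand-alone check along these lines may help. This is only a sketch, not part of screen-scraper: the class name SslSetupCheck and its method names are made up for illustration, and the default path assumes you run it from the screen-scraper install directory. It relies on the standard Cipher.getMaxAllowedKeyLength call, which reports 128 for AES when the limited policy files are in place, and on java.util.Properties to read the configuration file. Run it with the same JRE that screen-scraper uses, otherwise the crypto result says nothing about your installation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.NoSuchAlgorithmException;
import java.util.Properties;
import javax.crypto.Cipher;

// Illustrative helper (hypothetical name), not shipped with screen-scraper.
public class SslSetupCheck {

    // True when this JRE allows AES keys longer than 128 bits, which is
    // the case once the unlimited-strength policy files are installed.
    static boolean hasUnlimitedCrypto() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES") > 128;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("AES is not available in this JRE", e);
        }
    }

    // True when the given properties text sets EnableSNIExtension to true.
    static boolean sniEnabled(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse properties text", e);
        }
        String value = props.getProperty("EnableSNIExtension", "");
        return "true".equalsIgnoreCase(value.trim());
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location, relative to the install directory.
        Path conf = Paths.get(args.length > 0
                ? args[0]
                : "resource/conf/screen-scraper.properties");

        System.out.println("Unlimited-strength crypto: " + hasUnlimitedCrypto());

        if (Files.exists(conf)) {
            String text = new String(Files.readAllBytes(conf),
                    StandardCharsets.ISO_8859_1);
            System.out.println("EnableSNIExtension=true: " + sniEnabled(text));
        } else {
            System.out.println("Not found: " + conf);
        }
    }
}
```

If the first line prints false, the policy files probably did not land in the jre/lib/security directory of the JRE you ran the check with. If the second prints false, the EnableSNIExtension line is missing or not set to true.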