Script throwing error: java.net.ConnectException: https://xxx:443

We are trying to scrape an HTTPS site, and it throws the following error:
Encountered a connection error for domain "webapp.halton.gov.uk". Message was "https://webapp.halton.gov.uk:443". Trying different protocols...
Landing Page: An input/output error occurred while connecting to 'https://webapp.halton.gov.uk/PlanningApps/index.asp'. The message was java.net.ConnectException: https://webapp.halton.gov.uk:443

We are using version 6.0.65a of screen-scraper.
We have set the HTTP client to the Async HTTP client, but it still throws the error.

Please let us know what is causing this.

Regards
Barnali

barnali on 09/29/2015 at 3:04 am

screen-scraper support for licensed users

Can you add this line to a script, test the scrape, and send me the log?

log.logScreenScraperInformation();
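If you also want to check the Java runtime outside screen-scraper, here is a minimal standalone Java sketch. It prints the JVM and OS values that appear under "Static Values" in the log, read from standard system properties. It cannot reproduce the screen-scraper-specific values (timeouts, edition, run mode). The `jsse.enableSNIExtension` line is an extra added here because of the SNI discussion in this thread; screen-scraper's own log does not print it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JvmInfo {
    // Collects the JVM/OS values that screen-scraper lists under "Static Values".
    // The SS-specific values (timeouts, edition, run mode) are not available here.
    static Map<String, String> staticValues() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("Java Vendor", System.getProperty("java.vendor"));
        info.put("Java Version", System.getProperty("java.version"));
        info.put("OS Architecture", System.getProperty("os.arch"));
        info.put("OS Name", System.getProperty("os.name"));
        info.put("OS Version", System.getProperty("os.version"));
        // Standard JSSE switch for the SNI extension; "(unset)" means JSSE's default (enabled).
        info.put("jsse.enableSNIExtension",
                 System.getProperty("jsse.enableSNIExtension", "(unset)"));
        return info;
    }

    public static void main(String[] args) {
        // Print in the same "Name: value" layout as the screen-scraper log.
        staticValues().forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```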

jason on 09/29/2015 at 9:34 am

Log for your reference

Starting scraper.
Running scraping session: Halton_Scrapping_session
Processing scripts before scraping session begins.
Scraping file: "Landing Page"
Landing Page: Processing scripts before a file is scraped.
Processing script: "Halton - Init Script"
=================== Log Variables with Message ===============
screen-scraper Instance Information
=================== Variables being monitored ================
=================== Static Values ================
Java Vendor: Oracle Corporation
Java Version: 1.8.0_40
OS Architecture: x86
OS Name: Windows 7
OS Version: 6.1
SS Connection Timeout: 1000 seconds
SS Edition: Enterprise
SS Extractor Timeout: 30000 milliseconds
SS Max Concurrent Scraping Sessions: 5
SS Maximum Memory: 256 MB
SS Memory Use: 23%
SS Run Mode: Workbench
SS Version: 6.0.65a
======== Message logged at: 09/30/2015 14:39:07.231 IST ========
Landing Page: Requesting URL: https://webapp.halton.gov.uk/PlanningApps/index.asp
Encountered a connection error for domain "webapp.halton.gov.uk". Message was "https://webapp.halton.gov.uk:443". Trying different protocols...
Landing Page: An input/output error occurred while connecting to 'https://webapp.halton.gov.uk/PlanningApps/index.asp'. The message was java.net.ConnectException: https://webapp.halton.gov.uk:443.
Landing Page: Processing scripts after a file is scraped.
Scraping file: "Search Results Page"
Search Results Page: Processing scripts before a file is scraped.
Search Results Page: Requesting URL: https://webapp.halton.gov.uk/PlanningApps/index.asp
Encountered a connection error for domain "webapp.halton.gov.uk". Message was "https://webapp.halton.gov.uk:443". Trying different protocols...
Search Results Page: An input/output error occurred while connecting to 'https://webapp.halton.gov.uk/PlanningApps/index.asp'. The message was java.net.ConnectException: https://webapp.halton.gov.uk:443.
Search Results Page: Processing scripts before all pattern applications.
Search Results Page: Extracting data for pattern "Untitled Extractor Pattern"
Search Results Page: The pattern did not find any matches.
The token "BLOCK" has no regular expression.
Search Results Page: Untitled Extractor Pattern: Processing scripts once if no matches.
Search Results Page: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Search Results Page: Extracting data for pattern "Untitled Extractor Pattern"
Search Results Page: The pattern did not find any matches.
The token "TOTAL_PAGES" has no regular expression.
The token "NUMBER_MATCHES" has no regular expression.
The token "RECORDS_PER_PAGE" has no regular expression.
Search Results Page: Extracting data for pattern "Untitled Extractor Pattern"
Search Results Page: The pattern did not find any matches.
The token "CURRENT_RECORD" has no regular expression.
Search Results Page: Processing scripts before all pattern applications.
Search Results Page: Extracting data for pattern "Untitled Extractor Pattern"
Search Results Page: The pattern did not find any matches.
Search Results Page: Untitled Extractor Pattern: Processing scripts once if no matches.
Search Results Page: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Search Results Page: Warning! No matches were made by any of the extractor patterns associated with this scrapeable file.
Search Results Page: Processing scripts after a file is scraped.
Processing script: "Halton - Loop For Data Script"
**************EXECUTION_STATUS ******* SUCCESS
**************SESSION_STATUS_CODE *******RECORDS_MATCHED
Processing scripts after scraping session has ended.
Processing scripts always to be run at the end.
Scraping session "Halton_Scrapping_session" finished.

barnali on 09/30/2015 at 3:12 am

That error is caused by a configuration problem on the site. Most applications tolerate the faulty handshake; Java does not.

Exit screen-scraper, edit the screen-scraper.properties file, and add this line:

EnableSNIExtension=true

This is a global change and may adversely affect other HTTPS sites. If you find that some sites need it set to true and others need it set to false, let us know.

You'll also want to go to the scraping session's Advanced tab and set the HTTP client to "Ning Async Http Client".
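In plain Java, SNI is switched on or off with the standard JSSE system property `jsse.enableSNIExtension`, which is enabled by default. We are assuming, not confirming, that screen-scraper's `EnableSNIExtension` setting maps onto this property. The minimal sketch below shows why such a setting is global: there is one value per JVM, and on Java 8 (the runtime in these logs) JSSE reads it once when its handshake classes load, so it must be set before the first HTTPS connection and then applies to every site that JVM contacts.

```java
public class GlobalSniSwitch {
    // Standard JSSE system property; assumed (not confirmed) to be what
    // screen-scraper's EnableSNIExtension setting controls.
    static final String SNI_PROPERTY = "jsse.enableSNIExtension";

    // Call before the first HTTPS request in this JVM. On Java 8 the value is
    // read once, so changing it later does not affect new connections.
    static void setSni(boolean enabled) {
        System.setProperty(SNI_PROPERTY, Boolean.toString(enabled));
    }

    // JSSE treats an unset property as enabled.
    static boolean sniEnabled() {
        return Boolean.parseBoolean(System.getProperty(SNI_PROPERTY, "true"));
    }

    public static void main(String[] args) {
        // Pass "true" or "false"; defaults to false. One value covers every
        // site this process contacts.
        boolean enabled = args.length > 0 && Boolean.parseBoolean(args[0]);
        setSni(enabled);
        System.out.println(SNI_PROPERTY + "=" + sniEnabled());
    }
}
```

The same property can also be passed on the java command line, for example `-Djsse.enableSNIExtension=false`.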

jason on 09/30/2015 at 12:24 pm

The change worked for this site but broke another HTTPS site

After changing the settings, the Halton site works, but another site now fails. The log is below for your reference:

Starting scraper.
Running scraping session: Gedling_Scraping_Session
Processing scripts before scraping session begins.
Scraping file: "Gedling - Landing Page"
Gedling - Landing Page: Processing scripts before a file is scraped.
Processing script: "Gedling - Init Script"
Debugging mode is enabled.
=================== Log Variables with Message ===============
screen-scraper Instance Information
=================== Variables being monitored ================
=================== Static Values ================
Java Vendor: Oracle Corporation
Java Version: 1.8.0_40
OS Architecture: x86
OS Name: Windows 7
OS Version: 6.1
SS Connection Timeout: 1000 seconds
SS Edition: Enterprise
SS Extractor Timeout: 30000 milliseconds
SS Max Concurrent Scraping Sessions: 5
SS Maximum Memory: 256 MB
SS Memory Use: 21%
SS Run Mode: Workbench
SS Version: 6.0.65a
======== Message logged at: 10/01/2015 09:02:13.301 IST ========
Gedling - Landing Page: Requesting URL: https://pawam.gedling.gov.uk/online-applications/search.do?action=advanced&searchType=Application
Encountered a connection error for domain "pawam.gedling.gov.uk". Message was "handshake alert: unrecognized_name". Trying different protocols...
Gedling - Landing Page: An input/output error occurred while connecting to 'https://pawam.gedling.gov.uk/online-applications/search.do'. The message was java.net.ConnectException: handshake alert: unrecognized_name.
Gedling - Landing Page: Processing scripts after a file is scraped.
Scraping file: "Gedling - Result Page"
Gedling - Result Page: Processing scripts before a file is scraped.
Gedling - Result Page: Requesting URL: https://pawam.gedling.gov.uk/online-applications/advancedSearchResults.do;jsessionid=AA9DD5F6525257A46AB4150053BDD2F2?action=firstPage
Encountered a connection error for domain "pawam.gedling.gov.uk". Message was "handshake alert: unrecognized_name". Trying different protocols...
Gedling - Result Page: An input/output error occurred while connecting to 'https://pawam.gedling.gov.uk/online-applications/advancedSearchResults.do;jsessionid=AA9DD5F6525257A46AB4150053BDD2F2'. The message was java.net.ConnectException: handshake alert: unrecognized_name.
Gedling - Result Page: Processing scripts before all pattern applications.
Gedling - Result Page: Extracting data for pattern "Untitled Extractor Pattern"
Gedling - Result Page: The pattern did not find any matches.
The token "APPLICATION_NUMBER" has no regular expression.
The token "REGISTRATION_OR_VALIDATED_DATE" has no regular expression.
The token "PROPOSAL" has no regular expression.
The token "ADDRESS" has no regular expression.
The token "DATE_RECEIVED" has no regular expression.
The token "APP_STATUS" has no regular expression.
The token "DAY" has no regular expression.
Gedling - Result Page: Untitled Extractor Pattern: Processing scripts once if no matches.
Gedling - Result Page: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Gedling - Result Page: Extracting data for pattern "Untitled Extractor Pattern"
Gedling - Result Page: The pattern did not find any matches.
The token "ENDINDEX" has no regular expression.
The token "STARTINDEX" has no regular expression.
The token "NUMBER_MATCHES" has no regular expression.
Gedling - Result Page: Processing scripts before all pattern applications.
Gedling - Result Page: Extracting data for pattern "Untitled Extractor Pattern"
Gedling - Result Page: The pattern did not find any matches.
The token "data1" has no regular expression.
The token "data" has no regular expression.
Gedling - Result Page: Untitled Extractor Pattern: Processing scripts once if no matches.
Gedling - Result Page: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Gedling - Result Page: Warning! No matches were made by any of the extractor patterns associated with this scrapeable file.
Gedling - Result Page: Processing scripts after a file is scraped.
Processing script: "Gedling - Set Pagination Data For Loop Pages"
Processing script: "Gedling - Loop For Pages Script"
Pages>>>>>0
Inside Pages>>>>>
**************recordsCollectedCounter *******0
**************totalNumberOfRecords ******* 0
**************EXECUTION_STATUS ******* SUCCESS
**************SESSION_STATUS_CODE *******RECORDS_MATCHED
Processing scripts after scraping session has ended.
Processing scripts always to be run at the end.
Scraping session "Gedling_Scraping_Session" finished.

barnali on 09/30/2015 at 9:35 pm

To get that site working, I had to use the Async client and set EnableSNIExtension to false, so these two sites won't work in the same installation of screen-scraper. We're working on a fix, but in the meantime you may need to run two installations.

http://community.screen-scraper.com/MultipleInstances
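For comparison, the JSSE API (Java 8 and later) can control SNI per connection, which is the kind of per-host setting that would let both sites share one JVM: passing an empty list to `SSLParameters.setServerNames` before the handshake means no SNI extension is sent on that socket only. The minimal sketch below assumes a hypothetical, hard-coded set of hosts that reject SNI (here only the Gedling host, which returned "unrecognized_name"). It is not screen-scraper's fix and does not go through the Ning client; run with no arguments it only prints the policy, and with a host name argument it attempts a handshake to port 443.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class PerHostSni {
    // Hypothetical policy: hosts that answer SNI with "unrecognized_name".
    static final Set<String> NO_SNI_HOSTS =
            new HashSet<>(Arrays.asList("pawam.gedling.gov.uk"));

    // Empty list = send no SNI; otherwise send the host name as usual.
    static List<SNIServerName> serverNamesFor(String host) {
        if (NO_SNI_HOSTS.contains(host)) {
            return Collections.emptyList();
        }
        return Collections.<SNIServerName>singletonList(new SNIHostName(host));
    }

    // Opens a TLS socket and applies the per-host SNI policy before the handshake.
    // If jsse.enableSNIExtension=false, SNI is off JVM-wide and the "on" case is ignored.
    static SSLSocket open(String host, int port) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket socket = (SSLSocket) factory.createSocket(host, port);
        try {
            SSLParameters params = socket.getSSLParameters();
            params.setServerNames(serverNamesFor(host));
            socket.setSSLParameters(params);
            socket.startHandshake();
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket if the handshake fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No network access: just show the policy for the two hosts in this thread.
            for (String host : Arrays.asList("webapp.halton.gov.uk", "pawam.gedling.gov.uk")) {
                System.out.println(host + " -> SNI " + (serverNamesFor(host).isEmpty() ? "off" : "on"));
            }
            return;
        }
        try (SSLSocket socket = open(args[0], 443)) {
            System.out.println("Handshake OK: " + socket.getSession().getProtocol());
        }
    }
}
```

Keeping the decision in one small function makes it easy to move the host list into configuration later, rather than flipping a JVM-wide switch.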

jason on 10/01/2015 at 3:13 pm

https://webapp.halton.gov.uk/PlanningApps/index.asp

Regards
Barnali

barnali on 01/11/2016 at 12:55 am

I do not yet have a change that allows these to both run on the same instance.

jason on 01/11/2016 at 7:34 am
