Script throws error: java.net.ConnectException: https://xxx:443
We are trying to scrape an HTTPS site, and it throws the following error:
Encountered a connection error for domain "webapp.halton.gov.uk". Message was "https://webapp.halton.gov.uk:443". Trying different protocols...
Landing Page: An input/output error occurred while connecting to 'https://webapp.halton.gov.uk/PlanningApps/index.asp'. The message was java.net.ConnectException: https://webapp.halton.gov.uk:443
We are using version 6.0.65a of the screen-scraper tool.
We have set the HTTP client to the Async HTTP client, but the error still occurs.
Please let us know what is causing this.
Regards
Barnali
Can you add this line to a script, test the scrape, and send me the log?
Log for your reference
Starting scraper.
Running scraping session: Halton_Scrapping_session
Processing scripts before scraping session begins.
Scraping file: "Landing Page"
Landing Page: Processing scripts before a file is scraped.
Processing script: "Halton - Init Script"
=================== Log Variables with Message ===============
screen-scraper Instance Information
=================== Variables being monitored ================
=================== Static Values ================
Java Vendor: Oracle Corporation
Java Version: 1.8.0_40
OS Architecture: x86
OS Name: Windows 7
OS Version: 6.1
SS Connection Timeout: 1000 seconds
SS Edition: Enterprise
SS Extractor Timeout: 30000 milliseconds
SS Max Concurrent Scraping Sessions: 5
SS Maximum Memory: 256 MB
SS Memory Use: 23%
SS Run Mode: Workbench
SS Version: 6.0.65a
======== Message logged at: 09/30/2015 14:39:07.231 IST ========
Landing Page: Requesting URL: https://webapp.halton.gov.uk/PlanningApps/index.asp
Encountered a connection error for domain "webapp.halton.gov.uk". Message was "https://webapp.halton.gov.uk:443". Trying different protocols...
Landing Page: An input/output error occurred while connecting to 'https://webapp.halton.gov.uk/PlanningApps/index.asp'. The message was java.net.ConnectException: https://webapp.halton.gov.uk:443.
Landing Page: Processing scripts after a file is scraped.
Scraping file: "Search Results Page"
Search Results Page: Processing scripts before a file is scraped.
Search Results Page: Requesting URL: https://webapp.halton.gov.uk/PlanningApps/index.asp
Encountered a connection error for domain "webapp.halton.gov.uk". Message was "https://webapp.halton.gov.uk:443". Trying different protocols...
Search Results Page: An input/output error occurred while connecting to 'https://webapp.halton.gov.uk/PlanningApps/index.asp'. The message was java.net.ConnectException: https://webapp.halton.gov.uk:443.
Search Results Page: Processing scripts before all pattern applications.
Search Results Page: Extracting data for pattern "Untitled Extractor Pattern"
Search Results Page: The pattern did not find any matches.
The token "BLOCK" has no regular expression.
Search Results Page: Untitled Extractor Pattern: Processing scripts once if no matches.
Search Results Page: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Search Results Page: Extracting data for pattern "Untitled Extractor Pattern"
Search Results Page: The pattern did not find any matches.
The token "TOTAL_PAGES" has no regular expression.
The token "NUMBER_MATCHES" has no regular expression.
The token "RECORDS_PER_PAGE" has no regular expression.
Search Results Page: Extracting data for pattern "Untitled Extractor Pattern"
Search Results Page: The pattern did not find any matches.
The token "CURRENT_RECORD" has no regular expression.
Search Results Page: Processing scripts before all pattern applications.
Search Results Page: Extracting data for pattern "Untitled Extractor Pattern"
Search Results Page: The pattern did not find any matches.
Search Results Page: Untitled Extractor Pattern: Processing scripts once if no matches.
Search Results Page: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Search Results Page: Warning! No matches were made by any of the extractor patterns associated with this scrapeable file.
Search Results Page: Processing scripts after a file is scraped.
Processing script: "Halton - Loop For Data Script"
**************EXECUTION_STATUS ******* SUCCESS
**************SESSION_STATUS_CODE *******RECORDS_MATCHED
Processing scripts after scraping session has ended.
Processing scripts always to be run at the end.
Scraping session "Halton_Scrapping_session" finished.
That error is caused by a configuration error on the site. Most applications ignore the resulting problem during the TLS handshake; Java does not.
If you exit screen-scraper and edit screen-scraper.properties, you can add a line
This is a global change and may adversely affect other HTTPS sites. If you find that some sites need it set to true and others need it set to false, let us know.
You'll also want to go to the scraping session's Advanced tab and set the HTTP client to "Ning Async Http Client".
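For reference, the SNI switch this thread refers to as EnableSNIExtension corresponds, at the JVM level, to the standard JSSE system property jsse.enableSNIExtension. The sketch below is plain Java, not screen-scraper code. It assumes, without confirmation of the exact key, that the screen-scraper.properties line maps to that property. It shows why the change is global: the property applies to the whole JVM, and JSSE reads it when its TLS classes are first initialized. It therefore has to be set before the first HTTPS request, or passed as -Djsse.enableSNIExtension=false when the JVM starts. The class name SniSwitch is hypothetical.

```java
// Hypothetical illustration in plain Java (not screen-scraper code): the
// JVM-wide switch that EnableSNIExtension is assumed to map to.
class SniSwitch {

    static final String SNI_PROPERTY = "jsse.enableSNIExtension";

    // Call before the first HTTPS request in this JVM; JSSE reads the property
    // when its TLS classes initialize, so later changes have no effect.
    // Command-line equivalent: java -Djsse.enableSNIExtension=false ...
    static void disableSniGlobally() {
        System.setProperty(SNI_PROPERTY, "false");
    }

    // True if the property currently asks for SNI to be switched off JVM-wide.
    static boolean sniDisabledGlobally() {
        return "false".equalsIgnoreCase(System.getProperty(SNI_PROPERTY));
    }

    public static void main(String[] args) {
        disableSniGlobally();
        // From here on, every site this JVM connects to is affected,
        // including sites that require SNI.
        System.out.println("SNI disabled for this JVM: " + sniDisabledGlobally());
    }
}
```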
The change worked for this site but failed for another HTTPS site
After changing the settings, the Halton site works, but another site now fails. The log is below for your reference:
Starting scraper.
Running scraping session: Gedling_Scraping_Session
Processing scripts before scraping session begins.
Scraping file: "Gedling - Landing Page"
Gedling - Landing Page: Processing scripts before a file is scraped.
Processing script: "Gedling - Init Script"
Debugging mode is enabled.
=================== Log Variables with Message ===============
screen-scraper Instance Information
=================== Variables being monitored ================
=================== Static Values ================
Java Vendor: Oracle Corporation
Java Version: 1.8.0_40
OS Architecture: x86
OS Name: Windows 7
OS Version: 6.1
SS Connection Timeout: 1000 seconds
SS Edition: Enterprise
SS Extractor Timeout: 30000 milliseconds
SS Max Concurrent Scraping Sessions: 5
SS Maximum Memory: 256 MB
SS Memory Use: 21%
SS Run Mode: Workbench
SS Version: 6.0.65a
======== Message logged at: 10/01/2015 09:02:13.301 IST ========
Gedling - Landing Page: Requesting URL: https://pawam.gedling.gov.uk/online-applications/search.do?action=advanced&searchType=Application
Encountered a connection error for domain "pawam.gedling.gov.uk". Message was "handshake alert: unrecognized_name". Trying different protocols...
Gedling - Landing Page: An input/output error occurred while connecting to 'https://pawam.gedling.gov.uk/online-applications/search.do'. The message was java.net.ConnectException: handshake alert: unrecognized_name.
Gedling - Landing Page: Processing scripts after a file is scraped.
Scraping file: "Gedling - Result Page"
Gedling - Result Page: Processing scripts before a file is scraped.
Gedling - Result Page: Requesting URL: https://pawam.gedling.gov.uk/online-applications/advancedSearchResults.do;jsessionid=AA9DD5F6525257A46AB4150053BDD2F2?action=firstPage
Encountered a connection error for domain "pawam.gedling.gov.uk". Message was "handshake alert: unrecognized_name". Trying different protocols...
Gedling - Result Page: An input/output error occurred while connecting to 'https://pawam.gedling.gov.uk/online-applications/advancedSearchResults.do;jsessionid=AA9DD5F6525257A46AB4150053BDD2F2'. The message was java.net.ConnectException: handshake alert: unrecognized_name.
Gedling - Result Page: Processing scripts before all pattern applications.
Gedling - Result Page: Extracting data for pattern "Untitled Extractor Pattern"
Gedling - Result Page: The pattern did not find any matches.
The token "APPLICATION_NUMBER" has no regular expression.
The token "REGISTRATION_OR_VALIDATED_DATE" has no regular expression.
The token "PROPOSAL" has no regular expression.
The token "ADDRESS" has no regular expression.
The token "DATE_RECEIVED" has no regular expression.
The token "APP_STATUS" has no regular expression.
The token "DAY" has no regular expression.
Gedling - Result Page: Untitled Extractor Pattern: Processing scripts once if no matches.
Gedling - Result Page: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Gedling - Result Page: Extracting data for pattern "Untitled Extractor Pattern"
Gedling - Result Page: The pattern did not find any matches.
The token "ENDINDEX" has no regular expression.
The token "STARTINDEX" has no regular expression.
The token "NUMBER_MATCHES" has no regular expression.
Gedling - Result Page: Processing scripts before all pattern applications.
Gedling - Result Page: Extracting data for pattern "Untitled Extractor Pattern"
Gedling - Result Page: The pattern did not find any matches.
The token "data1" has no regular expression.
The token "data" has no regular expression.
Gedling - Result Page: Untitled Extractor Pattern: Processing scripts once if no matches.
Gedling - Result Page: Untitled Extractor Pattern: Processing scripts after all pattern applications.
Gedling - Result Page: Warning! No matches were made by any of the extractor patterns associated with this scrapeable file.
Gedling - Result Page: Processing scripts after a file is scraped.
Processing script: "Gedling - Set Pagination Data For Loop Pages"
Processing script: "Gedling - Loop For Pages Script"
Pages>>>>>0
Inside Pages>>>>>
**************recordsCollectedCounter *******0
**************totalNumberOfRecords ******* 0
**************EXECUTION_STATUS ******* SUCCESS
**************SESSION_STATUS_CODE *******RECORDS_MATCHED
Processing scripts after scraping session has ended.
Processing scripts always to be run at the end.
Scraping session "Gedling_Scraping_Session" finished.
That site requires the Async client with EnableSNIExtension set to false, so these two sites won't work in the same installation of screen-scraper. We're working on a fix, but in the meantime you may need to run two installations.
http://community.screen-scraper.com/MultipleInstances
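Because jsse.enableSNIExtension applies to the whole JVM, one installation cannot use different values for different sites. Plain Java 8 and later can, however, decide SNI per connection through SSLParameters.setServerNames, where an empty list asks JSSE to omit the SNI extension for that connection only. The sketch below is a hypothetical helper, not a screen-scraper setting or API. It assumes, as the Gedling error ("handshake alert: unrecognized_name") and the reply above suggest, that Gedling needs SNI off while Halton needs it on. It also assumes the global property is left at its default of true, since setting it to false suppresses SNI for every connection regardless of these parameters.

```java
import java.util.Collections;
import java.util.Set;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;

// Hypothetical helper (not part of screen-scraper): chooses SNI per connection
// instead of relying on the JVM-wide jsse.enableSNIExtension property.
class PerHostSni {

    // Updates and returns `params` for a client connection to `host`.
    // Hosts in `hostsWithoutSni` get no SNI; all others send their host name.
    static SSLParameters applySni(SSLParameters params, String host, Set<String> hostsWithoutSni) {
        if (hostsWithoutSni.contains(host)) {
            // Empty list: JSSE omits the server_name extension for this connection.
            params.setServerNames(Collections.<SNIServerName>emptyList());
        } else {
            // Send the host name explicitly as the SNI server name.
            params.setServerNames(Collections.<SNIServerName>singletonList(new SNIHostName(host)));
        }
        return params;
    }

    public static void main(String[] args) {
        // Assumption based on this thread: Gedling fails with SNI, Halton needs it.
        Set<String> hostsWithoutSni = Collections.singleton("pawam.gedling.gov.uk");
        for (String host : new String[] {"pawam.gedling.gov.uk", "webapp.halton.gov.uk"}) {
            SSLParameters p = applySni(new SSLParameters(), host, hostsWithoutSni);
            System.out.println(host + " -> SNI names: " + p.getServerNames());
        }
    }
}
```

To apply it to a real connection, the parameters must be set before the handshake, for example socket.setSSLParameters(PerHostSni.applySni(socket.getSSLParameters(), host, hostsWithoutSni)) followed by socket.startHandshake(). Whether screen-scraper's HTTP clients expose such a hook is not established here.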
Any update?
Hi,
Is there any update on a fix for this issue?
Regards,
Barnali
Waiting for the fix
Thank you, Jason.
Kindly let me know when the fix will be available.
Regards
Barnali
Any update?
Running the sites in different instances requires manual intervention for database changes.
Can you please confirm whether we can use a single instance that runs both of these sites?
https://pawam.gedling.gov.uk/online-applications/search.do?action=advanced
https://webapp.halton.gov.uk/PlanningApps/index.asp
Regards
Barnali
I do not yet have a change that allows both sites to run on the same instance.