Proxy Server Having Problems With SSL
I've been having a lot of problems trying to scrape SSL webpages when using the proxy server. Sometimes I can circumvent SSL by removing the "s" from "https" in the URL. Other times I can find a way around it with Google Chrome's Inspect Element feature, which lets me see the transactions; I then hand-build the GET request in screen-scraper, adding headers with the "addHTTPHeader" method and having that run before the file is scraped. I've also used Fiddler, Charles Proxy, and Tamper Data (within Firefox) to build the scrape when the proxy server fails to work correctly. However, I recently ran into problems trying to scrape the location page for Domino's Pizza, and none of my workarounds have succeeded.
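The hand-built request described above can be sketched outside screen-scraper with the JDK's own `HttpURLConnection`. This is a minimal sketch, not screen-scraper's `addHTTPHeader` API. It assumes the headers were copied from a captured browser transaction (the header values below are hypothetical placeholders) and that the local proxy listens on 127.0.0.1:8777, as in this thread. It only prepares the request; nothing is sent until the response is read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

final class HandBuiltGet {
    // Local proxy from the thread: screen-scraper listening on port 8777.
    static final Proxy LOCAL_PROXY =
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("127.0.0.1", 8777));

    // Prepares (but does not send) a GET request carrying headers copied from a
    // captured browser transaction. Pass Proxy.NO_PROXY to connect directly.
    static HttpURLConnection prepare(String url, Map<String, String> headers, Proxy proxy) {
        try {
            URL target = URI.create(url).toURL();
            // openConnection() only creates the connection object; no bytes are sent yet.
            HttpURLConnection conn = (HttpURLConnection) target.openConnection(proxy);
            conn.setRequestMethod("GET");
            for (Map.Entry<String, String> h : headers.entrySet()) {
                conn.setRequestProperty(h.getKey(), h.getValue());
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Hypothetical values: replace with the headers captured in Chrome or Fiddler.
        headers.put("Accept", "application/json");
        headers.put("Referer", "https://order.dominos.com/en/pages/order/");
        HttpURLConnection conn = prepare(
                "https://order.dominos.com/power/store-locator?type=Locations&c=dunedin%2C+FL+33668&s=",
                headers, LOCAL_PROXY);
        System.out.println(conn.getRequestMethod() + " " + conn.getURL());
        System.out.println("Referer: " + conn.getRequestProperty("Referer"));
        // Calling conn.getInputStream() here would actually send the request.
    }
}
```

Passing the `Proxy` per connection, rather than setting JVM-wide proxy properties, makes it easy to compare the same request with and without the proxy in the path.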
Firefox cache is clear.
SSL state was cleared in Internet Options.
Proxy server is running in screen-scraper on port 8777.
I've also made sure that screen-scraper.com's certificate is in the "Intermediate Certification Authorities" store under Internet Options in Windows.
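Why the browser complains at all can be illustrated with a deliberately simplified model of certificate chains. This toy ignores keys and signatures (real X.509 validation does not), and the CA names are hypothetical. The intercepting proxy issues its own certificate for the site, so a client accepts it only if the proxy's CA is one of that client's trusted roots. Each client consults its own trusted set (Internet Explorer uses the Windows stores, Firefox its own certificate store, Java its own trust store), and Windows treats the Trusted Root Certification Authorities store, not the Intermediate one, as its trust anchors.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ChainOfTrust {
    // Toy certificate: only subject and issuer names, no keys or signatures.
    static final class Cert {
        final String subject;
        final String issuer;

        Cert(String subject, String issuer) {
            this.subject = subject;
            this.issuer = issuer;
        }
    }

    // Accepts a leaf-first chain only if every certificate is issued by the next one
    // and the top certificate's issuer is one of the client's trusted roots.
    static boolean trusted(List<Cert> chain, Set<String> trustedRoots) {
        if (chain.isEmpty()) return false;
        for (int i = 0; i + 1 < chain.size(); i++) {
            if (!chain.get(i).issuer.equals(chain.get(i + 1).subject)) return false;
        }
        return trustedRoots.contains(chain.get(chain.size() - 1).issuer);
    }

    public static void main(String[] args) {
        // The intercepting proxy mints a certificate for the site, signed by its own CA.
        List<Cert> viaProxy = Collections.singletonList(new Cert("order.dominos.com", "Proxy CA"));
        Set<String> roots = new HashSet<>(Arrays.asList("Public Root CA"));
        System.out.println(trusted(viaProxy, roots)); // false: the browser warns
        roots.add("Proxy CA");
        System.out.println(trusted(viaProxy, roots)); // true once the proxy CA is a trusted root
    }
}
```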
The main page loads and its transactions show up in the proxy session, but when I try to move to the locations page, my browser warns that the security certificate isn't valid.
I add the certificate as an exception and click Continue, and then my browser shows this error: "An error occurred during a connection to order.dominos.com. SSL received a record that exceeded the maximum permissible length. (Error code: ssl_error_rx_record_too_long)." This is not the first time I have seen this error, but it is the first time I haven't been able to get around it.
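One common way to get `ssl_error_rx_record_too_long` is for the browser to expect a TLS record and receive plain HTTP instead, for example from a server or proxy answering an HTTPS connection in clear text. Whether that is what happens with this proxy is an assumption, but the arithmetic is easy to check. A TLS record starts with a 5-byte header (content type, 2-byte version, 2-byte length). If the reply is `HTTP/1.1 ...`, bytes 3 and 4 are `P` (0x50) and `/` (0x2F), giving a "length" of 0x502F = 20527 bytes, above the 2^14 + 2048 = 18432-byte limit that TLS 1.2 (RFC 5246) sets for a record. The sketch below decodes that header.

```java
import java.nio.charset.StandardCharsets;

final class TlsRecordCheck {
    // TLS 1.2 (RFC 5246, section 6.2.3): a TLSCiphertext fragment may be at most 2^14 + 2048 bytes.
    static final int MAX_RECORD_LENGTH = (1 << 14) + 2048; // 18432

    // Reads the length field of a 5-byte TLS record header:
    // byte 0 = content type, bytes 1-2 = version, bytes 3-4 = length (big-endian).
    static int recordLength(byte[] header) {
        if (header.length < 5) throw new IllegalArgumentException("need 5 header bytes");
        return ((header[3] & 0xFF) << 8) | (header[4] & 0xFF);
    }

    static boolean exceedsLimit(byte[] header) {
        return recordLength(header) > MAX_RECORD_LENGTH;
    }

    public static void main(String[] args) {
        // A plain-HTTP reply read as if it were a TLS record header.
        byte[] plainHttp = "HTTP/1.1 200 OK".getBytes(StandardCharsets.US_ASCII);
        System.out.println(recordLength(plainHttp)); // 20527 (0x502F)
        System.out.println(exceedsLimit(plainHttp)); // true
        // A genuine handshake record header: type 0x16, version 3.3, length 0x0200.
        byte[] handshake = {0x16, 0x03, 0x03, 0x02, 0x00};
        System.out.println(exceedsLimit(handshake)); // false
    }
}
```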
The proxy server transaction log in screen-scraper lists the HTTPS transaction for the locations page, but running the scrape with that page results in an error that reads "An input/output error occurred while connecting to 'http://order.dominos.com/en/pages/order/'. The message was peer not authenticated." The Last Request and Last Response tabs are empty.
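"peer not authenticated" is typically the message of Java's `javax.net.ssl.SSLPeerUnverifiedException`, raised when a Java HTTP client asks for the server's certificates but the TLS handshake never produced a verified peer, often because the client did not trust the certificate it was shown. Assuming that is the cause here, one way a plain Java client can be told what to trust is an `SSLContext` built from an explicit trust store. This is a minimal JDK-only sketch, not a screen-scraper setting; the store is left empty and the alias in the comment is hypothetical.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

final class ProxyTrust {
    // An empty in-memory trust store. A real setup would add the proxy's CA
    // certificate, e.g. store.setCertificateEntry("proxy-ca", caCert) (hypothetical alias).
    static KeyStore emptyTrustStore() {
        try {
            KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
            store.load(null, null); // initialise without reading a file
            return store;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // An SSLContext whose trust decisions come only from the given store.
    static SSLContext contextTrusting(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, tmf.getTrustManagers(), null);
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SSLContext ctx = contextTrusting(emptyTrustStore());
        // Connections would then use ctx.getSocketFactory(), for example via
        // HttpsURLConnection.setSSLSocketFactory(...).
        System.out.println(ctx.getProtocol()); // TLS
    }
}
```

With an empty store every handshake would fail validation, which is the point: the client trusts exactly what is put into the store and nothing else.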
I've tried clearing cookies before the scrape, refusing cookies, accepting all cookies, and initializing the scrape with a main page that isn't HTTPS before moving to the HTTPS page. I've also tried recreating the scrape session by running each relevant page one by one to simulate the mouse clicks as closely as I can, and I've searched every transaction in Tamper Data for the authenticating transaction, but I can't find it. Lastly, I tried loading the webpage with the proxy server turned off, turning the proxy server on after the page had loaded, and refreshing the page from the browser cache; that didn't work either.
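For reference, the cookie strategies tried above map onto the JDK's `java.net.CookieManager` policies, used here only as a stand-in since screen-scraper's own cookie handling is not shown in this thread. The cookie name and value are hypothetical; the sketch feeds one `Set-Cookie` header to each manager and then clears the store.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class CookieModes {
    // ACCEPT_NONE ~ "refuse cookies", ACCEPT_ALL ~ "accept all cookies".
    static CookieManager withPolicy(CookiePolicy policy) {
        return new CookieManager(null, policy); // null -> default in-memory cookie store
    }

    // Feeds a Set-Cookie response header to the manager, as an HTTP client would.
    static void receive(CookieManager manager, URI uri, String setCookie) {
        Map<String, List<String>> headers =
                Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookie));
        try {
            manager.put(uri, headers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Clear cookies before the scrape": drop everything the store holds.
    static void clear(CookieManager manager) {
        manager.getCookieStore().removeAll();
    }

    static int count(CookieManager manager) {
        return manager.getCookieStore().getCookies().size();
    }

    public static void main(String[] args) {
        URI site = URI.create("https://order.dominos.com/");
        CookieManager refuse = withPolicy(CookiePolicy.ACCEPT_NONE);
        CookieManager accept = withPolicy(CookiePolicy.ACCEPT_ALL);
        receive(refuse, site, "session=abc"); // hypothetical cookie
        receive(accept, site, "session=abc");
        System.out.println(count(refuse) + " " + count(accept)); // 0 1
        clear(accept);
        System.out.println(count(accept)); // 0
    }
}
```

When no policy is given, `CookieManager` falls back to `ACCEPT_ORIGINAL_SERVER`, which accepts cookies only from the host that sent them.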
Would you be willing to give me a list of steps I can follow to make sure SSL transactions are handled correctly within screen-scraper's proxy session on my computer?
I have a co-worker who was able to paste a URL into the scrapeable file and get it to work within screen-scraper. He's using the same version of screen-scraper as I am, and he found the URL using Fiddler and Internet Explorer. If I paste the URL (https://order.dominos.com/power/store-locator?type=Locations&c=dunedin%2C+FL+33668&s=) into my browser with the proxy off, it works, but with the proxy server on I still get the "peer not authenticated" error, and so does another co-worker of mine. It appears that the first co-worker's computer has some special authenticated status that the other two scraping computers lack.
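The co-worker's URL is an ordinary form-encoded query string, so it can be rebuilt for other locations. A minimal sketch: the parameter names `type`, `c`, and `s` come from the URL above, while reading `c` as city/state/ZIP and `s` as street is only an inference from the example values. `URLEncoder` turns the comma into `%2C` and spaces into `+`, reproducing the URL exactly.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class StoreLocatorUrl {
    // Rebuilds the store-locator URL from the thread. The meaning of "c" and "s"
    // (city/state/ZIP and street) is an assumption inferred from the example values.
    static String build(String cityStateZip, String street) {
        return "https://order.dominos.com/power/store-locator?type=Locations"
                + "&c=" + encode(cityStateZip)
                + "&s=" + encode(street);
    }

    // application/x-www-form-urlencoded: space -> '+', ',' -> "%2C".
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("dunedin, FL 33668", ""));
        // https://order.dominos.com/power/store-locator?type=Locations&c=dunedin%2C+FL+33668&s=
    }
}
```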
Thanks in advance for your help!
P.S. Another site where an SSL certificate gives me problems is the locations lookup on www.saladworks.com.
My system specs are: Windows 7, screen-scraper version 6.0.54a, Firefox 25.0.1, and Java 7 Update 71.
The browser thinks that the
The browser thinks that the screen-scraper proxy is a man-in-the-middle, and it's technically correct. I've had the best luck with either
Still having issues....
Thanks, Jason. I tried both of those options, but neither worked. We used to be able to clear the cache, turn the proxy server on, start a fresh session, and validate the certificates as we went, but that doesn't seem to work for these particular websites. One odd thing: turning off the proxy in the browser immediately and automatically refreshes the locator, and then the locator works.
I have cases like that
I have cases like that sometimes, and the only solution is to use another proxy like Fiddler or Charles proxy.
A possible solution
Hi Jason. My co-worker seems to have stumbled on a fix, and I thought I would pass the information on. We aren't sure which of the things we did actually fixed the problem, but you can probably figure it out from my play-by-play, since you know more about how the proxy server and browser certificates work. Here is what I did. Once I had completed these steps, I used a freshly installed copy of Mozilla Firefox with the proxy server. I did receive a certificate warning in my browser, but after accepting the certificate I was able to run the scrape session my co-worker built without getting the "peer not authenticated" error in the screen-scraper log.
Configure Java: under the Security tab, changed the security level to Medium.
Under the Advanced tab:
"Perform certificate revocation checks on:" selected "Do not check (not recommended)."
"Secure Execution Environment:" selected "Don't prompt for client certificate selection when no certificates or only one exists"
Unchecked - "Warn if site certificate does not match hostname"
In Internet Options in Windows 7, under the Advanced tab:
unchecked "check for publisher's certificate revocation"
unchecked "check for server certificate revocation"
checked "use SSL 2.0"
unchecked "warn about certificate address mismatch"
unchecked "warn if changing between secure and not secure mode"
unchecked "warn if POST submittal is redirected to a zone that does not permit posts"
checked "Empty Temporary Internet Files folder when browser is closed"
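The thread does not establish which of these settings mattered. The Java Control Panel options mainly govern browser-launched Java (applets and Java Web Start), so it is an open question whether they affected screen-scraper at all. For comparison, this sketch shows where the equivalent revocation switch lives in Java's own certificate-path API (`PKIXParameters.setRevocationEnabled`); the CA name and freshly generated key are hypothetical stand-ins for a real trust anchor.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.cert.PKIXParameters;
import java.security.cert.TrustAnchor;
import java.util.Collections;
import javax.security.auth.x500.X500Principal;

final class RevocationSwitch {
    // PKIX validation parameters around a single trust anchor, with revocation
    // checking on or off (the programmatic analogue of "Do not check").
    static PKIXParameters params(X500Principal caName, PublicKey caKey, boolean checkRevocation) {
        try {
            TrustAnchor anchor = new TrustAnchor(caName, caKey, null); // no name constraints
            PKIXParameters p = new PKIXParameters(Collections.singleton(anchor));
            p.setRevocationEnabled(checkRevocation); // defaults to true
            return p;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical stand-in for a proxy CA key: freshly generated, not a real CA.
    static PublicKey demoKey() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair().getPublic();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        X500Principal name = new X500Principal("CN=Example Proxy CA"); // hypothetical name
        PKIXParameters p = params(name, demoKey(), false);
        System.out.println(p.isRevocationEnabled()); // false
    }
}
```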
If you have any insight as to what particular thing probably fixed the problem, I'd really like to know what it was for future reference. THANK YOU!