How to scrape a two-part response
I'm trying to scrape annuity fund values from the following site:
https://www.jackson.com/annuities/variable/PerspectiveII.jsp?subNav=auv&framework-guid=a600d3302b25cabc31420115004c003b
After opening this site, click on "Daily Unit Value"
The screen-scraper proxy records the requests and responses, including the tables of values. But when I generate scrapeable files from those transactions and run a scraping session, I only get the first half of the response (a "Please Wait" message), not the tables of values.
Where do I go from here?
Looks like it cannot be done using Screen Scraper Basic edition
Scott,
Obviously you and I are using different versions of Screen Scraper. I am running Screen Scraper Basic Edition v5.5. I don't have the "Filter out less useful transactions" box or the "Compare with proxy transaction" button, and when I add the script file you prescribed, I get the following error:
An error occurred while processing the script: Add Header
The error message was: Exception (line 1): DUV: The "addHTTPHeader" method is not available in this edition of screen-scraper.-- Method Invocation scrapeableFile.addHTTPHeader
It appears this data cannot be scraped with the Basic Edition, and since I am working on this as a hobby, I am not willing to shell out $549 for the Professional version.
Tim
Tim, I'm sorry for not
Tim,
I'm sorry for not pointing that out. It would require at least our Professional Edition in order to call the addHTTPHeader method.
You may consider finding an alternate source for the data. Something like AnnuityAdvantage.com. No guarantee this site won't require features not available in the Basic Edition but it's at least an alternative.
-Scott
Tim, A quick way to find
Tim,
A quick way to find which proxy transaction contains the content you're after is to click the "Find..." button and perform a search for data that you know to be unique for that particular transaction.
In your case you may want to search a specific value under the "Daily % Change" that you know should not exist in the content for the "Performance" data.
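The same search idea can be sketched outside screen-scraper. The Python below is only an illustration: it assumes the proxy responses have been saved as (label, body) pairs, and the sample bodies and the `find_transactions` helper are hypothetical stand-ins, not part of screen-scraper.

```python
def find_transactions(transactions, marker):
    """Return the labels of transactions whose response body contains marker."""
    return [label for label, body in transactions if marker in body]


# Hypothetical sample data standing in for saved proxy responses.
transactions = [
    ("Init Rqst", "<html><body>Please wait...</body></html>"),
    ("DUV", '<td class="prFundNumber">-0.0102%</td>'),
]

# Search for a value that should only appear in the "Daily % Change" data.
matches = find_transactions(transactions, "-0.0102%")
print(matches)
```

A value that appears in only one response identifies the transaction to generate the scrapeable file from.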
Also, under the Progress tab of your proxy session try checking the box that says "Filter out less useful transactions". After doing so you should see many fewer transactions.
Among the transactions that remain, you should see at least two that look like this:
https://www.jackson.com/annuities/variable/PerspectiveII.jsp?productID=105&StateVariation=nonNewYork
One of these transactions appeared when you clicked the "Daily Unit Value" link and this is where you will find the data for that link.
Hope this helps,
Scott
I find the correct request, but the response comes in 2 stages.
I can identify the proper request in the proxy session, but when I generate and run the corresponding scrapeable file, the recorded response includes the phrase "Please wait...", but not the table of data. The data arrives in a later response. How do I capture this later response for scraping?
Tim, Are you not able to see
Tim,
Are you not able to see the data in any of the responses? Choose a random figure from the website and do a search in your proxy transactions for it. You should be picking up the data.
Here's a sample of what the data looks like for me.
<td class="prFundName">JNL/Mellon Capital Mgt. Global Alpha<sup>20, 37, 39, 40, 42, 60</sup></td><td class="prFundName">Absolute Return</td><td class="prFundNumber">-0.001052</td><td class="prFundNumber">-0.0102%</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">10.243818</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">10.244870</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">10.225688</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">10.265106</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">10.226388</td>
</tr>
<tr style="background-Color:#f8fbfc;height:8ex;">
<td class="prFundName">JNL/Capital Guardian Global Diversified Research<sup>20, 27, 31</sup></td><td class="prFundName">Aggr. Growth</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">0.597890</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">2.3913%</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">25.600337</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">25.002447</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">24.622190</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">24.992610</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">24.843320</td>
</tr>
<tr style="background-Color:#f8fbfc;height:8ex;">
<td class="prFundName">JNL/Capital Guardian U.S. Growth Equity<sup>2, 20, 27, 31</sup></td><td class="prFundName">Aggr. Growth</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">0.729847</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">2.9468%</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">25.497142</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">24.767295</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">24.122557</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">24.582758</td><td class="prFundNumber" style="color: #006600;font-weight:bold;">24.260785</td>
</tr>
-Scott
I see the proper responses in the proxy session...
I see the proper response in the proxy session, but when I click on "Last Response" after running the generated scrapeable file in a scraping session, then clicking "Display Response in Browser", I see the "please wait..." screen, not the data tables. I have not set up any extractor patterns as of yet.
Tim, It makes sense that you
Tim,
It makes sense that you would see the "Please wait..." when viewing the last response HTML in your browser. There is a bit of Javascript that is called when the page loads that generates the "Please wait..." message. However, underneath all of that is the data that you're after.
So, I recommend that you work within the HTML of the last response in order to create your extractor pattern tokens. Simply highlight the block of HTML that surrounds the data you want, right-click and choose "Generate Extractor Pattern from highlighted text."
If you need a refresher on the proper procedure I would suggest going back over our tutorials.
-Scott
Here's my step-by-step:
No Joy ... Here's what I'm doing step by step:
Create new proxy setting named JNL Proxy
Using IE, set IE Internet options to use localhost proxy & start SS Proxy server
In IE, use URL:
https://www.jackson.com/annuities/variable/PerspectiveII.jsp?subNav=auv&framework-guid=a600d3302b25cabc31420115004c003b
Response is "Problem with website's security Certificate" click "continue"
Label first transaction in Proxy session "Init Rqst"
In IE, click "Daily Unit Value"
Label This transaction "DUV"
--(Notice desired data table is included in this response)
Stop Proxy session & change Internet options Proxy back to none.
Create new Scraping Session "JNL Scrape"
In Proxy session window, highlight "Init Rqst", and click "generate scrapeable file in:" -- "JNL Scrape"
In Proxy session window, highlight "DUV", and click "generate scrapeable file in:" -- "JNL Scrape"
Run "JNL Scrape" Scraping Session.
Click on DUV Scrapeable file, click "Last Response", then "Display Response in Browser"
In Browser, "View Source"
Search for "prFundName" - not found.
I tried the same thing again, but this time included all the proxy transactions that refer to Jackson.com. (Some of the transactions refer to notify4.dropbox.com:80, which I assume is unrelated to this app.) There were two transactions between Init Rqst and DUV, which I labeled A and B, and included in a new scraping session. This new scraping session runs all the transactions in order, but ends with the same results.
What am I missing?
Tim, Try to resist the urge
Tim,
Try to resist the urge to "Display Response in Browser". Instead look under the Last Response tab for the data you're after. When you find it there, you can highlight the block of text that contains the data you're after, right-click it and choose "Generate extractor pattern from selected text".
For example:
<td class="prFundName">JNL/Mellon Capital Mgt. Global Alpha<sup>20, 37, 39, 40, 42, 60</sup></td>
<td class="prFundName">Absolute Return</td>
<td class="prFundNumber">-0.001052</td>
<td class="prFundNumber">-0.0102%</td>
<td class="prFundNumber" style="color: #006600;font-weight:bold;">10.243818</td>
<td class="prFundNumber" style="color: #006600;font-weight:bold;">10.244870</td>
<td class="prFundNumber" style="color: #006600;font-weight:bold;">10.225688</td>
<td class="prFundNumber" style="color: #006600;font-weight:bold;">10.265106</td>
<td class="prFundNumber" style="color: #006600;font-weight:bold;">10.226388</td>
</tr>
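To show what such a pattern ends up pulling out of this row, here is a rough Python sketch. It is not screen-scraper's extractor pattern syntax; it uses a plain regular expression as a stand-in, and the `parse_row` helper is hypothetical. It assumes every cell of interest carries the class `prFundName` or `prFundNumber`, as in the sample above.

```python
import re

# A shortened copy of the sample row from the last response.
ROW = '''<td class="prFundName">JNL/Mellon Capital Mgt. Global Alpha<sup>20, 37, 39, 40, 42, 60</sup></td>
<td class="prFundName">Absolute Return</td>
<td class="prFundNumber">-0.001052</td>
<td class="prFundNumber">-0.0102%</td>
<td class="prFundNumber" style="color: #006600;font-weight:bold;">10.243818</td>
</tr>'''

# Match each cell, capturing its class and its inner text.
CELL = re.compile(r'<td class="(prFundName|prFundNumber)"[^>]*>(.*?)</td>', re.S)
# Footnote markers inside <sup> tags are not part of the fund name.
SUP = re.compile(r'<sup>.*?</sup>', re.S)


def parse_row(html):
    """Split one table row into its name cells and its number cells."""
    names, numbers = [], []
    for css_class, text in CELL.findall(html):
        text = SUP.sub("", text).strip()
        (names if css_class == "prFundName" else numbers).append(text)
    return names, numbers


names, numbers = parse_row(ROW)
print(names)
print(numbers)
```

The numbers are kept as strings here because one of them carries a percent sign; converting them is left to whatever consumes the data.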
-Scott
It doesn't matter which screen I use, it isn't there.
Scott, It's not in the "Last Response" of the last transaction of the scraping session. I used the "Display Response in Browser" to be sure. In either case, I can search for "prFundName". It is there in the proxy session Response, but not in the scraping session response. Take a look at my step-by-step to see if anything has been forgotten....
I'm using Screen-Scraper basic edition v5.5
Thanks for working with me on this,
Tim
Tim, I apologize. In order
Tim,
I apologize. In order to see the content you will need to create a new script and put the following in it.
Then, call that script from the scrapeableFile "Before file is scraped".
I discovered this by using the "Compare with proxy transaction" button available under the Last Request tab.
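For anyone without that button, the comparison itself is easy to approximate by hand. The Python below is only a sketch of the idea: the `missing_headers` helper and both header sets are hypothetical, since this thread does not show which header was actually missing from the scraping session's request.

```python
def missing_headers(proxy_headers, scraper_headers):
    """Return headers the proxy request sent that the scraper request lacks or changes.

    Header names are compared case-insensitively, values exactly.
    """
    sent = {name.lower(): value for name, value in scraper_headers.items()}
    return {
        name: value
        for name, value in proxy_headers.items()
        if sent.get(name.lower()) != value
    }


# Hypothetical headers; the real ones would be copied from the two requests.
proxy_request = {"User-Agent": "ExampleBrowser/1.0", "X-Example-Header": "example"}
scraper_request = {"user-agent": "ExampleBrowser/1.0"}

diff = missing_headers(proxy_request, scraper_request)
print(diff)
```

Any header that shows up in the difference is a candidate for being added to the scrapeable file's request.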
Give that a try and sorry for leading you astray for a minute there.
-Scott