Results in SS different than Browser
Until recently my SS session worked, but now the scrapeable file won't match the extractor pattern. When I view the results in the SS window, the dataset isn't empty, but it IS missing the important information I need. When I open the response in a web browser, I notice that it redirects to the data I need, and the data is found in that page's source code.
Any idea why SS cannot see the same results as the web browser?
-----
The url I'm requesting is this:
https://jeffersonpva.ky.gov/property-search/property-listings/?psfldAddress=123+Main+St&propertySearchFormButton=Search&searchType=StreetSearch#results
Only in the browser does it redirect to this URL, which has all the info in it:
https://jeffersonpva.ky.gov/property-search/property-details/?StrtNum=123&Single=1&lrsn=8002423
The data I'm trying to extract is the number 8002423 below (only seen with a web browser):
When you go to
https://jeffersonpva.ky.gov/property-search/property-listings/?psfldAddress=123+Main+St&propertySearchFormButton=Search&searchType=StreetSearch#results
it returns HTML. In there is a tag <meta refresh ...>. Screen-scraper doesn't run that, so you need to scrape the target URL, save it as a session variable, and pass it as the URL for a new scrapeable file.
Sooo complicated!
"so you need to scrape the target URL, save it as a session variable, and pass it as the URL for a new scrapeable file." This is what I think you mean but it isn't working so thank you for your patience.
A: Scrape the target URL:
1) Scrapeable file (called "Search") searches using the URL of: https://jeffersonpva.ky.gov/property-search/property-listings/?psfldAddress=123+Main+St&propertySearchFormButton=Search&searchType=StreetSearch#results
2) We add your script "getCurrentURL" to run "After file is scraped". We do this in the properties tab, not the "extractor patterns" tab, because there are no pattern matches in the body of the page to extract, right? I also add my "Details" script, which calls my "Details" scrapeable file, to run in sequence 2.
B: Save it as a session variable
3) The script runs and somehow saves the session variable I need (the URL + variable): ~#PROPERTY_ID#~
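import org.apache.commons.lang3.StringUtils;
// URL of the page that was actually retrieved
String url = scrapeableFile.getCurrentURL();
// Everything after "?" is the query string; split it into name=value pairs
String paramStr = StringUtils.substringAfter(url, "?");
String[] params = paramStr.split("&");
for(int i=0; i<params.length; i++)
{
    String param = params[i];
    log.log("---" + param);
    // The property ID is carried in the "lrsn" parameter
    if(param.startsWith("lrsn"))
    {
        String val = StringUtils.substringAfter(param, "=");
        log.log("-----Isolated value: " + val);
    }
}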
C: pass it as the URL for a new scrapeable file
4) My second script, named "Details", runs in sequence 2 after my "Search" scrapeable file is scraped. This script runs my "Details" scrapeable file. That file uses the session variable "PROPERTY_ID" from Step 3 to request the URL: https://jeffersonpva.ky.gov/property-search/property-details/?StrtNum=123&Single=1&lrsn=~#PROPERTY_ID#~
5) I can scrape the results of that screen, right?
I know these steps aren't right because it doesn't work. But what changes do I make to fix it? It's frustrating that the company I'm scraping from made that tiny change and messed it all up! :)
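For anyone following steps A to C, here is a rough standalone sketch in plain Java of the "isolate the lrsn value" part, so it can be tried outside screen-scraper. The class name QueryParam and the method getParam are made up for illustration and are not part of screen-scraper. It assumes the URL you are parsing is the redirected property-details URL; if the script only ever sees the original search URL, there is no lrsn parameter to find.

```java
import java.net.URI;

final class QueryParam {
    // Hypothetical helper: returns the value of the named query parameter,
    // or null if the URL has no query string or no such parameter.
    static String getParam(String url, String name) {
        String query = URI.create(url).getRawQuery(); // text between "?" and "#"
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            if (key.equals(name)) {
                return eq >= 0 ? pair.substring(eq + 1) : "";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "https://jeffersonpva.ky.gov/property-search/property-details/"
                   + "?StrtNum=123&Single=1&lrsn=8002423";
        System.out.println(getParam(url, "lrsn"));
    }
}
```

Inside a screen-scraper script, the isolated value would presumably then be stored with something like session.setVariable("PROPERTY_ID", val) so that ~#PROPERTY_ID#~ resolves in the next scrapeable file's URL. Note that the script posted above only logs the value.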
OOPS, I think I wasn't clear above. That 'meta refresh...' tag doesn't show up anymore within screen-scraper. It WAS there until recently, and it IS there when I scrape and display the response in a BROWSER. However, when I display the response in screen-scraper, the meta refresh is missing from the results. It's not because it's truncated.
View in Browser:
<div class="site-content print-block" role="main">
<input type="hidden" id="lrsn" value="8002423" />
<input type="hidden" id="neighborhood-id" value="11" />
Result Viewed in Screen Scraper:
<div id="content" class="site-content" role="main">
<div class="visible-desktop">
<div id="searchTabsListing" class="search-tabs">
Since I couldn't figure out why it was happening, I was asking how to grab it from the redirect URL instead, because what I need is in there too. I'd much rather know why it's not showing up in screen-scraper.
So my question is either:
1) Why can't I see the meta refresh tag within screen scraper?
2) How do I save part of a redirected URL and save it as a variable?
I can pay someone to fix this for me, I just need it done :)
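On question 2, in case the meta refresh tag is present in the response you are working with, here is a rough plain-Java sketch of pulling the redirect target out of the HTML and turning it into an absolute URL. The exact tag this site sends isn't shown anywhere in this thread, so the pattern assumes the common form <meta http-equiv="refresh" content="0; url=..."> with http-equiv before content; the class name MetaRefresh and its methods are made up for illustration and are not part of screen-scraper.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetaRefresh {
    // Assumed tag shape: <meta http-equiv="refresh" content="N; url=TARGET">
    // (case-insensitive, http-equiv attribute appearing before content).
    private static final Pattern REFRESH = Pattern.compile(
        "<meta[^>]*http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*"
        + "content\\s*=\\s*[\"'][^\"'>]*?url\\s*=\\s*([^\"'>]+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the raw redirect target from the HTML, or null if no tag matches.
    static String targetUrl(String html) {
        Matcher m = REFRESH.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Resolves a possibly relative target against the URL of the page it came from.
    static String absolute(String pageUrl, String target) {
        return URI.create(pageUrl).resolve(target).toString();
    }

    public static void main(String[] args) {
        // Made-up sample markup, not the site's actual response.
        String html = "<head><meta http-equiv=\"refresh\" content=\"0; "
            + "url=/property-search/property-details/?StrtNum=123&Single=1&lrsn=8002423\"></head>";
        String page = "https://jeffersonpva.ky.gov/property-search/property-listings/";
        System.out.println(absolute(page, targetUrl(html)));
    }
}
```

This only helps when the tag actually appears in the response body that screen-scraper receives. If it doesn't, as described above, the hidden lrsn input or the redirected URL itself would be the thing to parse instead.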
I attached Jefferson County Redir scrape for you to examine.
I was able to get it working using your Attachment! Thank you!!
PS. How do you stop SS from truncating results? That may have helped...
Stop screen-scraper. Find the
MaximumDisplayedLastResponseLength=
line and delete it.
Thanks! I'll give it a shot