Results in SS different than Browser

Until recently my SS session worked, but now the scrapeable file no longer matches the extractor pattern. When I view the results in the SS window, the dataset isn't empty, but it IS missing the important information I need. When I open the response in a web browser, I notice that it redirects to the data I need, and the data is present in that page's source code.

Any idea why SS cannot see the same results as the web browser?

-----
The url I'm requesting is this:
https://jeffersonpva.ky.gov/property-search/property-listings/?psfldAddress=123+Main+St&propertySearchFormButton=Search&searchType=StreetSearch#results

Only in the browser does it redirect to this URL, which has all the info in it:
https://jeffersonpva.ky.gov/property-search/property-details/?StrtNum=123&Single=1&lrsn=8002423

The data I'm trying to extract is the 8002423 Number below (only seen with web browser):

jeffcole on 10/06/2020 at 2:14 pm

screen-scraper support for licensed users

When you go to

https://jeffersonpva.ky.gov/property-search/property-listings/?psfldAddress=123+Main+St&propertySearchFormButton=Search&searchType=StreetSearch#results

It returns HTML that contains a <meta refresh ...> tag. screen-scraper doesn't follow that tag, so you need to scrape the target URL out of it, save it as a session variable, and pass it as the URL for a new scrapeable file.
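To make "scrape the target URL" concrete, here is a minimal standalone sketch in plain Java of pulling the target out of a meta refresh tag. Inside screen-scraper this would normally be an extractor pattern rather than a script. The class name, method name, and sample HTML are hypothetical, since the thread never shows the actual tag; the regex assumes http-equiv appears before content and that attribute values are quoted.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaRefreshSketch {
    // Captures the URL from a tag shaped like:
    //   <meta http-equiv="refresh" content="0; url=/some/path?x=1">
    // Assumption: http-equiv comes before content, and values are quoted.
    private static final Pattern META_REFRESH = Pattern.compile(
        "<meta[^>]*http-equiv\\s*=\\s*[\"']refresh[\"'][^>]*"
        + "content\\s*=\\s*[\"'][^\"'>]*?url\\s*=\\s*([^\"'>]+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the refresh target if the HTML contains one, otherwise empty.
    static Optional<String> findRefreshTarget(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical response body; the real page's markup may differ.
        String html = "<html><head><meta http-equiv=\"refresh\" "
            + "content=\"0; url=/property-search/property-details/"
            + "?StrtNum=123&Single=1&lrsn=8002423\"></head></html>";
        System.out.println(findRefreshTarget(html).orElse("no meta refresh found"));
    }
}
```

The captured value may be a relative path, in which case it has to be joined with the site's scheme and host before it is used as the URL of the next scrapeable file.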

jason on 10/07/2020 at 10:10 am

Sooo complicated!

"so you need to scrape the target URL, save it as a session variable, and pass it as the URL for a new scrapeable file." This is what I think you mean but it isn't working so thank you for your patience.

A: Scrape the target URL:

1) Scrapeable file (called "Search") searches using the URL of: https://jeffersonpva.ky.gov/property-search/property-listings/?psfldAddress=123+Main+St&propertySearchFormButton=Search&searchType=StreetSearch#results
2) We add your script "getCurrentURL" to run "After file is scraped". We do this in the Properties tab, not the "Extractor Patterns" tab, because there are no pattern matches in the body of the page to extract, right? I also add my "Details" script, which calls my "Details" scrapeable file, to run as sequence 2.

B: Save it as a session variable

3) The script runs and saves the session variable I need, somehow? (The URL + variable: ~#PROPERTY_ID#~)

import org.apache.commons.lang3.StringUtils;

// Take the query string of the current URL and split it into name=value pairs.
String url = scrapeableFile.getCurrentURL();
String paramStr = StringUtils.substringAfter(url, "?");
String[] params = paramStr.split("&");
for (int i = 0; i < params.length; i++)
{
    String param = params[i];
    log.log("---" + param);
    // Log the value of the lrsn parameter when it is present.
    if (param.startsWith("lrsn"))
    {
        String val = StringUtils.substringAfter(param, "=");
        log.log("-----Isolated value: " + val);
    }
}

C: pass it as the URL for a new scrapeable file

4) My second script, named "Details", runs after it is called by my "Search" scrapeable file in sequence 2. This script runs my "Details" scrapeable file. That file uses the session variable "property_ID" from step 3 to request the URL: https://jeffersonpva.ky.gov/property-search/property-details/?StrtNum=123&Single=1&lrsn=~#Property_ID#~
5) I can scrape the results of that screen, right?

I know these steps aren't right because it doesn't work. But what changes do I make to fix it? It's frustrating that the company I'm scraping from made that tiny change and messed it all up! :)

jeffcole on 10/08/2020 at 11:20 am

The scrapeable file goes to https://jeffersonpva.ky.gov/property-search/property-listings/?psfldAddress=123+Main+St&propertySearchFormButton=Search&searchType=StreetSearch#results
Scrape the new address from the meta refresh tag and save it as a session variable.
Create a new scrapeable file and, for its URL, use the session variable you just saved. If the session variable is "URL_REDIR", the URL line on the new file shows ~#URL_REDIR#~
Then scrape the content of the new page.
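To illustrate the ~#URL_REDIR#~ idea, here is a rough sketch in plain Java of token substitution against a map of saved variables. It is not screen-scraper's own implementation: the class and method names are hypothetical, and it assumes variable names are matched case-sensitively, which would mean a token written as ~#Property_ID#~ does not pick up a variable saved as PROPERTY_ID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenExpansionSketch {
    // A token looks like ~#NAME#~ where NAME is letters, digits, or underscores.
    private static final Pattern TOKEN = Pattern.compile("~#(\\w+)#~");

    // Replaces each token with the matching variable's value.
    // Tokens with no matching variable are left in place unchanged.
    static String expand(String template, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1)); // assumed case-sensitive lookup
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<String, String>();
        vars.put("URL_REDIR",
            "https://jeffersonpva.ky.gov/property-search/property-details/"
            + "?StrtNum=123&Single=1&lrsn=8002423");
        // The whole URL field of the second scrapeable file is just the token.
        System.out.println(expand("~#URL_REDIR#~", vars));
    }
}
```

If the second file requests a URL that still contains a literal ~#...#~, a name mismatch between the saved variable and the token is one thing worth checking.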

jason on 10/12/2020 at 5:18 pm

OOPS, I think I wasn't clear above. That 'meta refresh...' tag doesn't show up anymore within screen-scraper. It WAS there until recently, and it IS there when I scrape and display the response in a BROWSER. However, when I display the response in screen-scraper, the meta refresh is missing from the results. It's not because the response is truncated.

View in Browser:

Result Viewed in Screen Scraper:

Since I couldn't figure out why it was happening, I was asking how to grab it from the redirect URL instead, because what I need is in there too. I'd much rather know why it's not showing up in screen-scraper.

So my question is either:
1) Why can't I see the meta refresh tag within screen scraper?
2) How do I grab part of a redirected URL and save it as a variable?

I can pay someone to fix this for me, I just need it done :)
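For question 2, a minimal sketch in plain Java of pulling one query parameter (here lrsn) out of a URL string. The class and method names are hypothetical. It compares the parameter name exactly instead of using startsWith, and it does no URL decoding. In a screen-scraper script, the result would then presumably be stored with session.setVariable, which is left as a comment here so the example runs on its own.

```java
import java.util.Optional;

class QueryParamSketch {
    // Returns the value of the named query parameter, or empty if it is absent.
    static Optional<String> queryParam(String url, String name) {
        int q = url.indexOf('?');
        if (q < 0) {
            return Optional.empty();
        }
        String query = url.substring(q + 1);
        // Drop a trailing fragment such as #results.
        int hash = query.indexOf('#');
        if (hash >= 0) {
            query = query.substring(0, hash);
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = (eq >= 0) ? pair.substring(0, eq) : pair;
            if (key.equals(name)) {
                return Optional.of(eq >= 0 ? pair.substring(eq + 1) : "");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String url = "https://jeffersonpva.ky.gov/property-search/property-details/"
            + "?StrtNum=123&Single=1&lrsn=8002423";
        Optional<String> lrsn = queryParam(url, "lrsn");
        // Inside screen-scraper, assumed usage: session.setVariable("PROPERTY_ID", lrsn.get());
        System.out.println(lrsn.orElse("lrsn not found"));
    }
}
```

This only helps if the URL string actually contains lrsn. Because screen-scraper does not follow the meta refresh, the URL of the search request itself will not have it; the string to parse is the target scraped from the refresh tag.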

jeffcole on 10/15/2020 at 10:02 am

I attached Jefferson County Redir scrape for you to examine.

jason on 10/16/2020 at 2:08 pm

I was able to get it working using your attachment! Thank you!!

PS. How do you stop SS from truncating results? That may have helped...

jeffcole on 10/16/2020 at 3:58 pm

Stop screen-scraper.
Find the file screen-scraper.properties in the resource/conf folder of the screen-scraper install directory and open it for editing.
Find the line MaximumDisplayedLastResponseLength= and delete it.
Save and close.

jason on 10/19/2020 at 9:05 am

Thanks! I'll give it a shot

jeffcole on 10/16/2020 at 3:08 pm
