screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users, please contact support with your registered email address for access. This forum is monitored closely by screen-scraper staff. Posts generally receive a response within one business day.

New cURL HTTP client

Screen-scraper has several HTTP clients you can select for cases where a site doesn't work well with one of them. Recently we've seen a few websites that don't cooperate with any of those HTTP clients. No matter which HTTP client is selected, you get errors such as "Unable to connect" or "connection closed" (the exact error varies by client). We've therefore added the ability to use cURL.

Easy to scrape site - getting images looks like a real challenge!

Hi, hope you can help....

I am trying to get the details and images from this site, but finding the codes required for the URL, and the cookies/headers the API needs to return the images, has me totally stumped:

https://www.cva-auctions.co.uk/auction/25

The images API looks like this: https://br-api.aos.tv/vehicles/MT21XRK?cb=1748345775577 - I can get the MT21 part of the URL, but I have no idea where the last part of the query string comes from. Until I get past this, I can't test the header/cookie records etc.

Any help gratefully appreciated.
Thanks
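
One possible reading, offered as a guess rather than anything confirmed by the site: `cb` is a common name for a cache-busting parameter, and 1748345775577 is the right size for a Unix timestamp in milliseconds. If that holds, the value is generated by the page's JavaScript at request time rather than read from the page, so any current timestamp may do. A minimal Python sketch under that assumption (`build_image_url` is a hypothetical helper name, and the same idea would need translating into a screen-scraper script):

```python
import time

API_BASE = "https://br-api.aos.tv/vehicles/"  # taken from the URL in the post


def build_image_url(registration, now_ms=None):
    """Return the images API URL for a registration, with a cb value.

    Assumption: cb is simply the current time in milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return f"{API_BASE}{registration}?cb={now_ms}"


print(build_image_url("MT21XRK"))
```

If the server rejects a freshly generated value, the assumption is wrong and `cb` would have to be traced in the site's JavaScript instead.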

Can't get past JS?

Hi, I am struggling to know where to start with this one. I have been playing around with the cookies, but have had no luck at all so far. Should I persevere? Is there a trick to these ones? Thanks, Jason
https://www.williamstruckcentre.com/inventory/?/listings/for-sale/equipment/all?AccountCRMID=9495595&dlr=1&settingscrmid=9495595

org.jboss.netty.handler.codec.frame.TooLongFrameException

We are getting an error related to the size of the headers, which comes through as
org.jboss.netty.handler.codec.frame.TooLongFrameException
The header length is 8192 bytes and possibly more.
Is there a setting in the properties for this that we need to adjust?

Or is there a solution in a script that we could use? Any ideas are welcome.
Regards,
Sean

Bad target website programming...

Hi,

I have this site https://hopdeals.com/Vacuum-Tankers.html where it looks like the URL links have been added randomly and sometimes with spaces.
https://hopdeals.com/52-2017-Isuzu-Euro-6 7.5-Ton-Jet-Vac-for-sale.html

When screen-scraper tries to encode the space, it changes it to a + sign rather than the %20 that the browser expects. This results in a continual redirect and makes SS hang. I have tried changing the JTidy setting and the token, but it still does not work.
Any thoughts?
Thanks
Jason
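
For reference, the two encodings differ as follows: `+` for a space belongs to form encoding (`application/x-www-form-urlencoded`) and is only meaningful in a query string, while a space in the path part of a URL is written as `%20`. The Python sketch below only illustrates that difference using the URL from the post; it does not show how to change screen-scraper's own behavior, and pre-encoding the link before requesting it is a suggestion, not a confirmed fix.

```python
from urllib.parse import quote, quote_plus

# File name exactly as it appears in the site's link, including the space.
path = "52-2017-Isuzu-Euro-6 7.5-Ton-Jet-Vac-for-sale.html"

form_style = quote_plus(path)  # form encoding: the space becomes "+"
path_style = quote(path)       # path encoding: the space becomes "%20"

url = "https://hopdeals.com/" + path_style

print(form_style)
print(url)
```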

Another next page problem

Hi
I am trying to get to all the pages of this site. (https://mvcommercial.com/main-search) I have viewed in developer tools and seen that the key JSON data is here:
https://opus.cdbl.site/website-api/get-vehicles?filtersList={"Condition":["New+&+Used"],"pageNumber":1}

I can get the pages of this API to change in Chrome but not in SS. I have tried to set the 2 cookies (probably incorrectly).
I have also tried to set headers (again, probably incorrectly), but all I can ever seem to get is page 1.

Hope you can point me in the right direction. Thanks
Jason
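
One thing that may be worth checking, as a guess only: the `filtersList` value contains a raw `&` inside "New+&+Used". If that is sent unencoded, a server can treat everything after it as a separate parameter, in which case `pageNumber` would never reach the filter. The Python sketch below builds the same URL with the whole JSON value percent-encoded; the endpoint and field names are taken from the post, and `page_url` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "https://opus.cdbl.site/website-api/get-vehicles"


def page_url(page_number):
    """Build the API URL for one page with filtersList fully encoded."""
    # "New+&+Used" in the post is read here as the text "New & Used".
    filters = {"Condition": ["New & Used"], "pageNumber": page_number}
    compact = json.dumps(filters, separators=(",", ":"))
    # urlencode escapes the braces, quotes and "&" inside the JSON value.
    return ENDPOINT + "?" + urlencode({"filtersList": compact})


for n in (1, 2, 3):
    print(page_url(n))
```

If the encoded URLs page correctly in a browser or curl, the same encoded string can be tried as the scrapeable file's URL before investigating cookies and headers further.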

Multipart Including text/html

Hi

I am using screen-scraper to create pages in various MS OneNote notebooks. It works fine where the content is solely text/html but I want to embed files into the page and to do that I need to post multipart content. I am using scrapeablefile.forceMultipart( true ) which does exactly what you’d expect. The problem I have is that OneNote requires a part with the key “Presentation” and the Content-Type explicitly set to text/html.
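
To make the requirement concrete, here is a minimal Python sketch of the request body described above: a multipart body whose first part is named "Presentation" and carries `Content-Type: text/html`, followed by one file part. It only illustrates the wire format; the helper name `build_onenote_body` and the part name `fileBlock1` are hypothetical, and it does not show how to make screen-scraper emit a per-part Content-Type.

```python
import uuid


def build_onenote_body(html, file_name, file_bytes, file_type):
    """Return (content_type_header, body_bytes) for a two-part request."""
    boundary = "part-" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="Presentation"',
        b"Content-Type: text/html",  # the explicit type the post says is required
        b"",
        html.encode("utf-8"),
        delimiter,
        f'Content-Disposition: form-data; name="{file_name}"'.encode(),
        f"Content-Type: {file_type}".encode(),
        b"",
        file_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, b"\r\n".join(lines)


header, body = build_onenote_body(
    "<html><body><p>Hello</p></body></html>",
    "fileBlock1",
    b"%PDF-1.4 example bytes",
    "application/pdf",
)
print(header)
print(len(body), "bytes")
```

Comparing a body like this with what screen-scraper actually sends (for example through a logging proxy) should show whether the "Presentation" part is missing its Content-Type line.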

Easy way to get around Google’s captcha?

We’d like to scrape a site that requires completing Google’s reCAPTCHA once per search. We are OK with having a user sit there and answer the photo challenge when it appears, then letting screen-scraper do the rest.

Can this be done? If so, do you have a tutorial? I could only find some older posts about CAPTCHAs, and none of them cover what I’m asking or are up to date.

Your help is greatly appreciated.

HELP! Can't Call From Command Line After Upgrading to 7.0.14a

From the Windows command line, I used to call a scraping session like this:
"C:\Program Files\screen-scraper Professional Edition\jre\bin\java" -jar screen-scraper.jar -s "SessionNameHere"
I have just upgraded and none of my scraping sessions will run (indeed, that path no longer even seems to exist).
Has the upgrade failed, or do I need a new command line?
Thanks as always
Jason

After rebooting, this seems to have worked itself out. However, the new version seems much slower than the old one, and a few of my scrapes now return a 502 error code, which didn't happen before the upgrade.

Creating header records

I am scraping this URL. Although I can see the data in Chrome's developer tools and can collect it using curl, I can't seem to get the headers to work in SS.

https://webapp.artisio.co/website/lots/?page=1&limit=100&sort=end_date&sort-by=asc&lot_status=published&auction_uuid=ad5917f5-c056-4037-92f0-7e57d8ecaa2d

These are the header records needed:
-H "authority: webapp.artisio.co" ^
-H "accept: application/json, text/plain, */*" ^
-H "accept-language: en-GB,en;q=0.9" ^
-H "artisio-client-id: 45781238" ^
-H "artisio-language: en" ^
-H "cache-control: no-cache" ^