Issue with encoded URL parameters even when adding them to the URL in the properties tab

One more issue with regard to GET parameters in a URL.

Referring to this posting titled

url variables automatically hex encoding - how can i get it to stop?

We have to send a parameter to a website in the following format

somewebsite.com/someaction.do?RIPSESSION={[*!1D1A020B050

I add this parameter to the URL on the properties tab,
but in the logs I am observing that it gets converted to

somewebsite.com/someaction.do?RIPSESSION=%7B%5B*!1D1A020B050,

Is there any way to avoid the encoding of the {[ characters? It is causing no response
to be returned from the website.

Appreciate your response and support- Dipti

Anytime you use the parameter

Anytime you use the parameter tab you will get things encoded like that. You will need to set the parameters using a script.

http://community.screen-scraper.com/API/addHTTPParameter
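
As a rough sketch, assuming the HTTPParameter class and sequence argument described on that API page, a script run "Before file is scraped" might look like the following (the class name, constructor arguments, and sequence value are taken from that documentation page and should be checked against it):

```java
import com.screenscraper.common.HTTPParameter;

// Add the GET parameter programmatically instead of via the parameters tab.
// The third argument is the parameter's sequence (its ordering in the query string).
scrapeableFile.addHTTPParameter( new HTTPParameter( "RIPSESSION", "{[*!1D1A020B050", 1 ) );
```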

Issue persists even after using a script to send the GET parameter

I used a script to pass the GET parameter.

The GET variable RIPSESSION value {[*!1D1A020 gets converted to %7B%5B*%211D1A020

(As observed in the logs)

Any pointers- Appreciate your support- Dipti

Hi, I did some investigating

Hi,

I did some investigating on this one, and it turns out that there are certain characters that are illegal in a URL. When you use these characters in your web browser, the browser simply handles encoding them behind the scenes. Within screen-scraper we use an HTTP library called HttpClient, which tends to conform to the specification relatively strictly. We've found ourselves having to do things from time to time to make it a bit more forgiving. This is one such case. In screen-scraper, if you embed characters like square brackets (i.e., []) in a URL, internally we actually replace those with their encoded equivalents. If we don't, HttpClient generates an error and disallows the request. You can check sections 2.2 through 2.4.3 of the URI spec (RFC 2396) for details on which characters are disallowed: http://www.faqs.org/rfcs/rfc2396.html.
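
That substitution can be reproduced with the JDK's own form encoder, which percent-encodes the same characters the logs show. This is just an illustration of the behavior, not screen-scraper's actual internal code:

```java
import java.net.URLEncoder;

public class EncodeDemo {
    public static void main(String[] args) throws Exception {
        // The raw GET value from the original post (truncated as posted).
        String raw = "{[*!1D1A020";

        // Form-style percent-encoding: '{' -> %7B, '[' -> %5B, '!' -> %21,
        // while '*' and alphanumerics pass through untouched.
        String encoded = URLEncoder.encode(raw, "UTF-8");

        System.out.println(encoded); // %7B%5B*%211D1A020 -- matches the log output above
    }
}
```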

Using the escaped characters *should* be equivalent to using the actual characters. Have you verified that the difficulty you've found in scraping that page isn't due to something else, such as a missing cookie?

Kind regards,

Todd

Keeping the space (or %20) in the URL

I am scraping a site which happens to be an online document repository (a secure type of Dropbox). I am recursively iterating through the folders, and as I scrape I collect the URL for a child folder and store it in a session variable called HREF. This is already encoded, so I use URLDecoder.decode(session.getv("HREF"), "UTF-8") when I am parsing the full URL to be scraped, saving that URL to another session variable which is then used by the scrapeable file.
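
For reference, the JDK decoder treats both %20 and + as an encoded space, so after this decode step the session variable holds a literal space character. A small illustration (not the poster's actual script):

```java
import java.net.URLDecoder;

public class DecodeDemo {
    public static void main(String[] args) throws Exception {
        // Both the percent form and the plus form decode to a literal space.
        System.out.println(URLDecoder.decode("Served%20Evidence", "UTF-8")); // Served Evidence
        System.out.println(URLDecoder.decode("Served+Evidence", "UTF-8"));   // Served Evidence
    }
}
```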

My session log shows me the decoded URL looks like:

URL decoded: https://XXXXX/Served Evidence/

But when SS requests the URL the log shows:

Requesting URL: https://XXXXX/Served+Evidence/

It has replaced the space with a +, so the site throws a 404 error. This is replicated if I manually paste the URL in a browser window – with the space in place it works, with the + it doesn’t.

Is there a way I can stop SS from replacing the space with the +?

I have tried everything but I think the HTTPClient is doing this and I can't find an override.

On the scraping session >

On the scraping session > advanced tab, there are 4 HTTP clients. Async v2 and Ning use the "+". The old Async uses %20, but it's somewhat unreliable on HTTPS sites.

The cURL client uses %20, and you can see how to set it up here: https://support.screen-scraper.com/node/2554
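
The difference between the clients comes down to form-style encoding (space becomes +) versus path-style percent-encoding (space becomes %20). The JDK shows both behaviors; this is a generic illustration with a placeholder host, not the code any of the four clients actually runs:

```java
import java.net.URI;
import java.net.URLEncoder;

public class SpaceEncodingDemo {
    public static void main(String[] args) throws Exception {
        // Form encoding (the behavior described for Async v2 and Ning): space becomes '+'.
        System.out.println(URLEncoder.encode("Served Evidence", "UTF-8")); // Served+Evidence

        // Path-style encoding (the behavior described for the cURL client): space becomes %20.
        // The multi-argument URI constructor percent-encodes illegal path characters.
        URI uri = new URI("https", "example.com", "/Served Evidence/", null);
        System.out.println(uri.toASCIIString()); // https://example.com/Served%20Evidence/
    }
}
```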