Issue with encoded URL parameters, even when adding them to the URL in the Properties tab
One more issue regarding GET parameters in a URL.
This relates to the posting titled "url variables automatically hex encoding - how can i get it to stop?".
We need to send a parameter to a website in the following format:
somewebsite.com/someaction.do?RIPSESSION={[*!1D1A020B050
I add this parameter to the URL on the Properties tab, but in the logs I can see that it gets converted to:
somewebsite.com/someaction.do?RIPSESSION=%7B%5B*!1D1A020B050
Is there any way to avoid encoding the { and [ characters? The encoding is causing the website to return no response.
I appreciate your response and support. - Dipti
Any time you use the Parameters tab, values will be encoded like that. You will need to set the parameters using a script.
http://community.screen-scraper.com/API/addHTTPParameter
Issue persists even after using a script to send the GET parameter
I used a script to pass the GET parameter, but the RIPSESSION value {[*!1D1A020 still gets converted to %7B%5B*%211D1A020 (as observed in the logs).
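The logged value is exactly what application/x-www-form-urlencoded encoding produces: `{` becomes `%7B`, `[` becomes `%5B`, and `!` becomes `%21`, while `*` and alphanumerics are left alone. Note that the Properties-tab log in the first post kept `!` unencoded, so the two paths appear to use different encoders. The minimal Java sketch below reproduces the logged string with `java.net.URLEncoder` (Java 10 or later); it is only an assumption that screen-scraper's script path applies something equivalent, since the thread doesn't say which encoder it uses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormEncodingDemo {
    // Applies application/x-www-form-urlencoded encoding, which is what java.net.URLEncoder implements.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "{[*!1D1A020B050";
        // '{' -> %7B, '[' -> %5B, '!' -> %21; '*' and letters/digits pass through unchanged.
        System.out.println("RIPSESSION=" + formEncode(raw)); // RIPSESSION=%7B%5B*%211D1A020B050
    }
}
```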
Any pointers? I appreciate your support. - Dipti
Hi,
I did some investigating on this one, and it turns out that certain characters are illegal in a URL. When you use these characters in your web browser, the browser must be encoding them behind the scenes. Within screen-scraper we use an HTTP library called HttpClient, which conforms to the HTTP specification relatively strictly, and from time to time we've had to do things to make it a bit more forgiving. This is one such case. If you embed characters like square brackets (i.e., []) in a URL in screen-scraper, we internally replace them with their encoded equivalents; if we didn't, HttpClient would generate an error and refuse the request. For details on which characters are disallowed, see sections 2.2 through 2.4.3 of RFC 2396, the URI generic syntax specification: http://www.faqs.org/rfcs/rfc2396.html.
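To see that strictness outside screen-scraper, here is a minimal sketch using Java's own `java.net.URI` parser as a stand-in for HttpClient (an assumption; HttpClient's exact rules may differ). A raw `{` in the query is rejected, while the percent-encoded form parses fine. Note that `java.net.URI` itself accepts `[` and `]` in a query because it follows RFC 2396 as amended by RFC 2732, so it is the `{` that triggers the error here.

```java
import java.net.URI;
import java.net.URISyntaxException;

class StrictUriDemo {
    // Returns true if java.net.URI accepts the string exactly as written.
    static boolean isAcceptedAsIs(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // A raw '{' is not a legal URI character, so parsing fails.
        System.out.println(isAcceptedAsIs("http://somewebsite.com/someaction.do?RIPSESSION={[*!1D1A020B050"));
        // The percent-encoded form parses without complaint.
        System.out.println(isAcceptedAsIs("http://somewebsite.com/someaction.do?RIPSESSION=%7B%5B*!1D1A020B050"));
    }
}
```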
Using the escaped characters *should* be equivalent to using the actual characters. Have you verified that the problem you're seeing with that page isn't caused by something else, such as a missing cookie?
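As an illustration of that equivalence, the sketch below decodes both logged forms of the RIPSESSION value with `java.net.URLDecoder` (Java 10 or later) and gets the original value back each time. It assumes the target site decodes its query string in this standard way; a server that compares the raw, undecoded query would behave differently.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class EscapeEquivalenceDemo {
    // Standard percent-decoding of a query-string value.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Properties-tab form and script form both decode to the value that was meant to be sent.
        System.out.println(decode("%7B%5B*!1D1A020B050"));   // {[*!1D1A020B050
        System.out.println(decode("%7B%5B*%211D1A020B050")); // {[*!1D1A020B050
    }
}
```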
Kind regards,
Todd
Keeping the space (or %20) in the URL
I am scraping a site that happens to be an online document repository (a secure kind of Dropbox). I am recursively iterating through the folders, and as I scrape I collect the URL of each child folder and store it in a session variable called HREF. This value is already encoded, so I use URLDecoder.decode(session.getv("HREF"),"UTF-8") when building the full URL to be scraped, and save that URL to another session variable, which the scrapeable file then uses.
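One detail worth knowing about that decoding step: `URLDecoder` is a form decoder, so it turns both `%20` and `+` into a space. The sketch below mirrors the `URLDecoder.decode(...)` call from the post, with a plain string standing in for `session.getv("HREF")`; the folder name `A+B` is purely hypothetical and only shows that a literal `+` in a path would be lost.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class HrefDecodeDemo {
    // Same decoding as URLDecoder.decode(session.getv("HREF"), "UTF-8"),
    // using the Java 10+ Charset overload instead of the charset-name overload.
    static String decodeHref(String href) {
        return URLDecoder.decode(href, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // %20 is turned back into a space.
        System.out.println(decodeHref("https://XXXXX/Served%20Evidence/")); // https://XXXXX/Served Evidence/
        // Hypothetical folder "A+B": the literal '+' also becomes a space.
        System.out.println(decodeHref("https://XXXXX/A+B/")); // https://XXXXX/A B/
    }
}
```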
My session log shows that the decoded URL looks like this:
URL decoded: https://XXXXX/Served Evidence/
But when SS requests the URL, the log shows:
Requesting URL: https://XXXXX/Served+Evidence/
It has replaced the space with a +, so the site returns a 404 error. The same thing happens if I paste the URL into a browser window manually: with the space it works, with the + it doesn't.
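The 404 makes sense because `+` stands for a space only in form-encoded query strings; in a URL path it is a literal plus sign, and a space there has to be written as `%20`. The sketch below contrasts the two encodings using only standard Java classes. It is not a reproduction of what any screen-scraper HTTP client does internally, and `XXXXX` is just the redacted host from the post.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SpaceEncodingDemo {
    // Form (query-string) encoding: a space becomes '+'.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Path encoding via the multi-argument URI constructor, which percent-encodes
    // characters that are illegal in a path, so a space becomes %20.
    static String pathUrl(String host, String path) throws URISyntaxException {
        return new URI("https", host, path, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(formEncode("Served Evidence"));         // Served+Evidence
        System.out.println(pathUrl("XXXXX", "/Served Evidence/")); // https://XXXXX/Served%20Evidence/
    }
}
```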
Is there a way I can stop SS from replacing the space with the +?
I have tried everything, but I think HttpClient is doing this and I can't find a way to override it.
On the scraping session's Advanced tab, there are four HTTP clients. Async v2 and Ning use "+". The old Async client uses %20, but it's somewhat unreliable on HTTPS sites.
The cURL client also uses %20, and you can see how to set it up here: https://support.screen-scraper.com/node/2554