Add header to curl request
Hi,
I am trying to connect to a URL using the newly implemented HTTP client, cURL.
From the logs I can see that the following request is generated in screen-scraper:
curl "https://www.Electrolux.nl/"
--compressed
--insecure
--silent
--verbose
--get
-H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36"
-H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3"
-H "Accept-Language: en-us,en;q=0.9"
-H "Expect:"
-H "Accept-Encoding: gzip"
-H "Upgrade-Insecure-Requests: 1"
Unfortunately this does not work for this website. The header "Connection: keep-alive" should also be included in the request, like this:
curl "https://www.Electrolux.nl/"
--compressed
--insecure
--silent
--verbose
--get
-H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36"
-H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3"
-H "Accept-Language: en-us,en;q=0.9"
-H "Expect:"
-H "Accept-Encoding: gzip"
-H "Connection: keep-alive"
-H "Upgrade-Insecure-Requests: 1"
How can I make screen-scraper add the option -H "Connection: keep-alive" to the cURL command?
The cURL client still uses
The cURL client still uses the method documented here: https://support.screen-scraper.com/documentation/api/scrapeablefile/addhttpheader
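For example, to add the Connection header from your first post, a script run before the file is scraped could call (a minimal sketch using only the documented addHTTPHeader method):
// Add the Connection header to the next request for this scrapeable file.
scrapeableFile.addHTTPHeader( "Connection", "keep-alive" );
Based on the logs you posted, the cURL client should then emit it as -H 'Connection: keep-alive' alongside the other headers.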
This does not work as
This does not work as expected. When I add a header like scrapeableFile.addHTTPHeader( "--max-time", "2" );
The log shows the following: Issuing cURL command: curl '--compressed' '--insecure' '--silent' '--verbose' '--get' '-H' 'Expect:' '-H' '--max-time: 2' ...
This is a wrong request. It should be --max-time 2, not '-H' '--max-time: 2'.
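To spell the mismatch out (illustration only, based on the log line above):
// addHTTPHeader always emits its arguments as an HTTP header...
scrapeableFile.addHTTPHeader( "--max-time", "2" );
// ...so the cURL client renders it as:  -H '--max-time: 2'
// What is needed instead is the cURL command-line option:  --max-time 2
// which is not an HTTP header, so it does not look like something addHTTPHeader can produce.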
BTW, for debugging purposes it would be nice if the log showed the cURL request like this:
Issuing cURL command: curl --compressed --insecure --silent --verbose --get --max-time 2 https://www.google.com -H "Accept-Language: en-us,en;q=0.9"
I think that it's just a
I think that it's just a matter of how different libraries display the headers.
I think the header you'd set is just the name and value, without the leading dashes. Headers are key/value pairs, and one library might show them with a colon and others without, but they are functionally the same. cURL shows those dashes, but they aren't really part of the required values.
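For instance (illustration only, using one of the headers from your own request), both of these describe the same key/value pair:
// In a screen-scraper script the header is just a name and a value:
scrapeableFile.addHTTPHeader( "Accept-Language", "en-us,en;q=0.9" );
// On the cURL command line the same pair is displayed as:
//   -H 'Accept-Language: en-us,en;q=0.9'
// The -H and the colon are only cURL's way of writing the pair out.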
I am not quite sure what you
I am not quite sure what you mean. However, adding the max-time timeout parameter does not work, whether I set it as:
scrapeableFile.addHTTPHeader("max-time", "2");
or as
scrapeableFile.addHTTPHeader("--max-time", "2");
In the screen-scraper log I see this:
Issuing cURL command: curl '--compressed' '--insecure' '--silent' '--verbose' '--get' '-H' 'Expect:' '-H' 'max-time: 2' '-H' 'Accept-Language: en-us,en;q=0.9' '-H' 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36' '-H' 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' '-H' 'Accept-Encoding: gzip' '-H' 'Upgrade-Insecure-Requests: 1' 'https://www.aeg.nl/'
In this particular case, if no max-time timeout is applied the cURL request can take forever to finish. For some reason the domain www.aeg.nl does not respond in a timely fashion.
You likely just want to set
You likely just want to set the connection timeout in the screen-scraper settings.
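If you need it per scrape rather than globally, something along these lines might work; setConnectionTimeout is an assumption on my part (I'd have to check the API docs for the exact method name and units), so treat it as a sketch:
// Assumed call: shorten the connection timeout for this scraping session.
// Verify the method name and whether the value is in seconds or minutes.
session.setConnectionTimeout( 2 );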