Problem ? Redirect requested but followRedirects is disabled
Dear Todd,
I get a blank response when I try to scrape the following sites in "HttpClient" mode:
[i]http://www.google.com
http://www.hepsiburada.com[/i]
If I scrape a Google URL such as "http://www.google.com/search?q=xxxx", the problem does not occur.
The problem also does not occur in "Internet Explorer (Windows only)" mode.
When I generated error.log in debug mode, I saw the following line:
"[i]Redirect requested but followRedirects is disabled[/i]"
Could you please help me resolve this problem?
Please find below the debug error.log output for "google".
Best regards,
Dogan Turkuler
Java version: 1.4.2_08
Java vendor: Sun Microsystems Inc.
Java class path: C:\Program Files\screen-scraper professional edition\screen-scraper.jar;C:\Program Files\screen-scraper professional edition\lax.jar;
Operating system name: Windows XP
Operating system architecture: x86
Operating system version: 5.1
SUN 1.42: SUN (DSA key/parameter generation; DSA signing; SHA-1, MD5 digests; SecureRandom; X.509 certificates; JKS keystore; PKIX CertPathValidator; PKIX CertPathBuilder; LDAP, Collection CertStores)
SunJSSE 1.42: Sun JSSE provider(implements RSA Signatures, PKCS12, SunX509 key/trust factories, SSLv3, TLSv1)
SunRsaSign 1.42: SUN's provider for RSA signatures
SunJCE 1.42: SunJCE Provider (implements DES, Triple DES, AES, Blowfish, PBE, Diffie-Hellman, HMAC-MD5, HMAC-SHA1)
SunJGSS 1.0: Sun (Kerberos v5)
Set parameter http.useragent = Jakarta Commons-HttpClient/3.0
Set parameter http.protocol.version = HTTP/1.1
Set parameter http.connection-manager.class = class org.apache.commons.httpclient.SimpleHttpConnectionManager
Set parameter http.protocol.cookie-policy = rfc2109
Set parameter http.protocol.element-charset = US-ASCII
Set parameter http.protocol.content-charset = ISO-8859-1
Set parameter http.method.retry-handler = org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@2f0bd7
Set parameter http.dateparser.patterns = [EEE, dd MMM yyyy HH:mm:ss zzz, EEEE, dd-MMM-yy HH:mm:ss zzz, EEE MMM d HH:mm:ss yyyy, EEE, dd-MMM-yyyy HH:mm:ss z, EEE, dd-MMM-yyyy HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM-yyyy HH:mm:ss z, EEE dd MMM yyyy HH:mm:ss z, EEE dd-MMM-yyyy HH-mm-ss z, EEE dd-MMM-yy HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z, EEE,dd-MMM-yyyy HH:mm:ss z, EEE, dd-MM-yyyy HH:mm:ss z]
Set parameter http.connection.timeout = 15000
SimpleHttpConnectionManager being used incorrectly. Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or method is using this connection manager at a time.
Set parameter http.socket.timeout = 15000
enter GetMethod(String)
Set parameter http.protocol.cookie-policy = rfc2109
HttpMethodBase.addRequestHeader(Header)
HttpMethodBase.addRequestHeader(Header)
HttpMethodBase.addRequestHeader(Header)
HttpMethodBase.addRequestHeader(Header)
HttpMethodBase.addRequestHeader(Header)
Set parameter http.method.retry-handler = org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@493b65
enter HttpClient.executeMethod(HttpMethod)
enter HttpClient.executeMethod(HostConfiguration,HttpMethod,HttpState)
SimpleHttpConnectionManager being used incorrectly. Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or method is using this connection manager at a time.
Attempt number 1 to process request
enter HttpConnection.open()
Open connection to www.google.com:80
enter HttpMethodBase.execute(HttpState, HttpConnection)
enter HttpMethodBase.writeRequest(HttpState, HttpConnection)
enter HttpMethodBase.writeRequestLine(HttpState, HttpConnection)
enter HttpMethodBase.generateRequestLine(HttpConnection, String, String, String, String)
>> "GET / HTTP/1.1[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
enter HttpMethodBase.writeRequestHeaders(HttpState,HttpConnection)
enter HttpMethodBase.addRequestHeaders(HttpState, HttpConnection)
enter HttpMethodBase.addUserAgentRequestHeaders(HttpState, HttpConnection)
enter HttpMethodBase.addHostRequestHeader(HttpState, HttpConnection)
Adding Host request header
enter HttpMethodBase.addCookieRequestHeader(HttpState, HttpConnection)
enter HttpState.getCookies()
enter CookieSpecBase.match(String, int, String, boolean, Cookie[])
enter HttpMethodBase.addProxyConnectionHeader(HttpState, HttpConnection)
>> "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "Keep-Alive: 300[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "Accept-Language: en-us, en;q=0.50[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "Accept-Charset: ISO-8859-1, ISO-10646-1, utf-8;q=0.66, *;q=0.66[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "Host: www.google.com[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
enter HttpConnection.writeLine()
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "[\r][\n]"
enter HttpConnection.flushRequestOutputStream()
enter HttpMethodBase.readResponse(HttpState, HttpConnection)
enter HttpMethodBase.readStatusLine(HttpState, HttpConnection)
enter HttpConnection.readLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
<< "HTTP/1.1 302 Found[\r][\n]"
enter HttpMethodBase.readResponseHeaders(HttpState,HttpConnection)
enter HttpConnection.getResponseInputStream()
enter HeaderParser.parseHeaders(InputStream, String)
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
<< "Location: http://www.google.com.tr/[\r][\n]"
<< "Cache-Control: private[\r][\n]"
<< "Set-Cookie: PREF=ID=07e5d9ee2d7074cf:TM=1162163349:LM=1162163349:S=33SLMaaVYA4SnC06; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com[\r][\n]"
<< "Content-Type: text/html[\r][\n]"
<< "Server: GWS/2.1[\r][\n]"
<< "Content-Length: 222[\r][\n]"
<< "Date: Sun, 29 Oct 2006 23:09:09 GMT[\r][\n]"
enter HttpMethodBase.processResponseHeaders(HttpState, HttpConnection)
enter CookieSpecBase.parse(String, port, path, boolean, String)
enter CookieSpecBase.parse(String, port, path, boolean, Header)
enter Cookie(String, String, String, String, Date, boolean)
enter RFC2109Spec.validate(String, int, String, boolean, Cookie)
enter CookieSpecBase.validate(String, port, path, boolean, Cookie)
enter HttpState.addCookie(Cookie)
enter RFC2109Spec.formatCookie(Cookie)
Cookie accepted: "$Version=0; PREF=ID=07e5d9ee2d7074cf:TM=1162163349:LM=1162163349:S=33SLMaaVYA4SnC06; $Path=/; $Domain=.google.com"
enter HttpMethodBase.readResponseBody(HttpState, HttpConnection)
enter HttpMethodBase.readResponseBody(HttpConnection)
enter HttpConnection.getResponseInputStream()
enter HttpMethodBase.canResponseHaveBody(int)
[color=red]Redirect required
Redirect requested but followRedirects is disabled[/color]
Buffering response body
<< "
<< "
<< "
302 Moved
[\n]"<< "The document has moved[\n]"
<< "here.[\r][\n]"
<< "[\r][\n]"
Resorting to protocol version default close connection policy
Should NOT close connection, using HTTP/1.1
enter HttpConnection.isResponseAvailable()
enter HttpConnection.releaseConnection()
Releasing connection back to connection manager.
enter getContentCharSet( Header contentheader )
enter HeaderElement.parseElements(String)
enter HeaderElement.parseElements(char[])
enter HeaderElement.getParameterByName(String)
Default charset used: ISO-8859-1
The following error occurred: null
java.lang.NullPointerException
at com.screenscraper.scraper.ScrapeableFile.scrapeData(ScrapeableFile.java:3241)
at com.screenscraper.scraper.ScrapeableFile.scrape(ScrapeableFile.java:2071)
at com.screenscraper.scraper.ScrapingSession.scrapeFile(ScrapingSession.java:1975)
at com.screenscraper.scraper.Scraper.scrape(Scraper.java:200)
at com.screenscraper.scraper.Scraper.run(Scraper.java:110)
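For anyone reading the log above: the server answered `GET /` with `302 Found` and a `Location` header, and the HTTP layer reported that it was not allowed to follow it. When automatic redirect handling is off, the caller has to issue the second request itself. The following is only a minimal sketch of that loop, not screen-scraper's actual code. The `ManualRedirectFollower` class, its `Fetcher` interface and the `Response` type are hypothetical names invented for illustration, and the sketch assumes the `Location` value is an absolute URL, as it is in the Google log.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: follows 3xx responses by hand when the HTTP client will not. */
final class ManualRedirectFollower {

    /** Minimal stand-in for an HTTP response: status code plus optional Location header. */
    static final class Response {
        final int status;
        final String location; // null when the header is absent

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Hypothetical transport abstraction so the sketch runs without a network. */
    interface Fetcher {
        Response get(String url);
    }

    /** True for the status codes that normally carry a Location header to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303 || status == 307;
    }

    /** Returns the URL that finally answers without redirecting, following at most maxHops redirects. */
    static String follow(Fetcher fetcher, String startUrl, int maxHops) {
        String url = startUrl;
        for (int hop = 0; hop <= maxHops; hop++) {
            Response r = fetcher.get(url);
            if (!isRedirect(r.status) || r.location == null) {
                return url;
            }
            url = r.location; // assumption: Location is absolute, as in the Google log
        }
        throw new IllegalStateException("Too many redirects starting from " + startUrl);
    }

    public static void main(String[] args) {
        // Canned responses that mirror the status and Location seen in the log.
        Map<String, Response> pages = new HashMap<>();
        pages.put("http://www.google.com/", new Response(302, "http://www.google.com.tr/"));
        pages.put("http://www.google.com.tr/", new Response(200, null));
        System.out.println(follow(pages::get, "http://www.google.com/", 5));
    }
}
```

The hop limit is there because a pair of pages that redirect to each other would otherwise loop forever.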
Problem ? Redirect requested but followRedirects is disabled
Hi Dogan,
Just to ensure we're on the same page, could you drop me an email so that I can send you the scraping session I'm using? It still seems to be working fine for me. My email address is my first name at screen-scraper.com.
Thanks,
Todd
Problem ? Redirect requested but followRedirects is disabled
Hello Todd,
I reinstalled the application and upgraded to version ...19a.
The problem still occurs in HttpClient mode.
Seven days ago I did not have this problem.
I tried the application on 5 different computers (2 of them are virtual PCs) and got the same errors on all of them.
I am in Turkey, and our character set is cp1254 (maybe that is a clue).
I use two browsers simultaneously:
Internet Explorer with the proxy on port 8777,
and Maxthon for direct connections to sites without the proxy.
I don't use an external proxy server for the connection.
I also tried installing and uninstalling the screen-scraper server. Nothing changed.
This is my scraping file:
----------------------------
-
-
-
-
The following is my screen-scraper.properties file:
-----------------------------------------------
#This file is manipulated by screen-scraper. Edit it manually at your own risk!
#Mon Oct 30 22:40:33 EET 2006
DefaultCharacterSet=UTF-8
ProxyForceAllHTTPRequestsToHTTPS=false
ExternalProxyPassword=
ExternalProxyUsername=
MailServerPassword=
Edition=PROFESSIONAL
Messages.DoesUserWantToViewTutorials=false
MailServerUsername=
IPAddressesToAllow=192.168,127.0,localhost
MaxConcurrentScrapingSessions=1
TidyHTML=true
MailServerHost=
ConnectionTimeout=20
InstallDirectory=C\:\\Program Files\\screen-scraper professional edition\\
Workbench.NumTimesRun=2
ExternalProxyHost=
SaveLargeFields=false
OutputLogFiles=false
DataExtractorTimeout=15
MaximumMemoryAllocation=256
ExternalNTProxyHost=
DatabasePort=9001
MainFrame.LastWidth=1032
ProxyPort=8777
ExternalNTProxyPassword=
MainFrame.LastHeight=746
ExternalNTProxyUsername=
DontLogBinaryFiles=true
DefaultFont=Courier New
ExternalNTProxyDomain=
Version=2.7.2.19a
DefaultProxySession=helloWorld
ExternalProxyAuthentication=foo\:bar
ExternalNTProxyAuthentication=NTfoo\:NTbar
SOAPPort=8779
ExternalProxyPort=
LastSelectedDirectory=C\:\\Documents and Settings\\t-dturkule.PMI\\My Documents
ServerPort=8778
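A note on reading the file above: it uses the standard Java properties escaping, so `C\:\\Program Files\\...` is simply `C:\Program Files\...` once loaded. The sketch below loads a few of the posted keys with `java.util.Properties` to show that. The `ScraperPropertiesDemo` class and its method names are made up for illustration, and the idea that `ConnectionTimeout=20` is a value in seconds is an assumption based only on the `http.connection.timeout = 20000` line in the error.log that follows.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical demo: reads a few keys from text in screen-scraper.properties format. */
final class ScraperPropertiesDemo {

    /** Parses properties text; the escapes \: and \\ are decoded by Properties.load. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Assumption: ConnectionTimeout is in seconds, so multiply by 1000 for milliseconds. */
    static int timeoutMillis(Properties props) {
        return Integer.parseInt(props.getProperty("ConnectionTimeout", "0")) * 1000;
    }

    public static void main(String[] args) {
        String sample = "ConnectionTimeout=20\n"
                + "ProxyPort=8777\n"
                + "InstallDirectory=C\\:\\\\Program Files\\\\screen-scraper professional edition\\\\\n";
        Properties props = parse(sample);
        System.out.println("Install directory: " + props.getProperty("InstallDirectory"));
        System.out.println("Proxy port: " + props.getProperty("ProxyPort"));
        System.out.println("Timeout in ms: " + timeoutMillis(props));
    }
}
```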
The following is the error.log file after I start scraping:
-------------------------------------------------------
Java version: 1.4.2_08
Java vendor: Sun Microsystems Inc.
Java class path: C:\Program Files\screen-scraper professional edition\screen-scraper.jar;C:\Program Files\screen-scraper professional edition\lax.jar;
Operating system name: Windows XP
Operating system architecture: x86
Operating system version: 5.1
SUN 1.42: SUN (DSA key/parameter generation; DSA signing; SHA-1, MD5 digests; SecureRandom; X.509 certificates; JKS keystore; PKIX CertPathValidator; PKIX CertPathBuilder; LDAP, Collection CertStores)
SunJSSE 1.42: Sun JSSE provider(implements RSA Signatures, PKCS12, SunX509 key/trust factories, SSLv3, TLSv1)
SunRsaSign 1.42: SUN's provider for RSA signatures
SunJCE 1.42: SunJCE Provider (implements DES, Triple DES, AES, Blowfish, PBE, Diffie-Hellman, HMAC-MD5, HMAC-SHA1)
SunJGSS 1.0: Sun (Kerberos v5)
Set parameter http.useragent = Jakarta Commons-HttpClient/3.0
Set parameter http.protocol.version = HTTP/1.1
Set parameter http.connection-manager.class = class org.apache.commons.httpclient.SimpleHttpConnectionManager
Set parameter http.protocol.cookie-policy = rfc2109
Set parameter http.protocol.element-charset = US-ASCII
Set parameter http.protocol.content-charset = ISO-8859-1
Set parameter http.method.retry-handler = org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@18f9b75
Set parameter http.dateparser.patterns = [EEE, dd MMM yyyy HH:mm:ss zzz, EEEE, dd-MMM-yy HH:mm:ss zzz, EEE MMM d HH:mm:ss yyyy, EEE, dd-MMM-yyyy HH:mm:ss z, EEE, dd-MMM-yyyy HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM-yyyy HH:mm:ss z, EEE dd MMM yyyy HH:mm:ss z, EEE dd-MMM-yyyy HH-mm-ss z, EEE dd-MMM-yy HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z, EEE,dd-MMM-yyyy HH:mm:ss z, EEE, dd-MM-yyyy HH:mm:ss z]
Set parameter http.protocol.unambiguous-statusline = true
Set parameter http.protocol.single-cookie-header = true
Set parameter http.protocol.strict-transfer-encoding = true
Set parameter http.protocol.reject-head-body = true
Set parameter http.protocol.warn-extra-input = true
Set parameter http.protocol.status-line-garbage-limit = 0
Set parameter http.protocol.reject-relative-redirect = true
Set parameter http.protocol.allow-circular-redirects = true
Set parameter http.connection.timeout = 20000
SimpleHttpConnectionManager being used incorrectly. Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or method is using this connection manager at a time.
Set parameter http.socket.timeout = 20000
enter GetMethod(String)
Set parameter http.protocol.cookie-policy = compatibility
HttpMethodBase.addRequestHeader(Header)
HttpMethodBase.addRequestHeader(Header)
HttpMethodBase.addRequestHeader(Header)
HttpMethodBase.addRequestHeader(Header)
HttpMethodBase.addRequestHeader(Header)
Set parameter http.method.retry-handler = org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@1d03a4e
enter HttpClient.executeMethod(HttpMethod)
enter HttpClient.executeMethod(HostConfiguration,HttpMethod,HttpState)
SimpleHttpConnectionManager being used incorrectly. Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or method is using this connection manager at a time.
Attempt number 1 to process request
enter HttpConnection.open()
Open connection to www.hepsiburada.com:80
enter HttpMethodBase.execute(HttpState, HttpConnection)
enter HttpMethodBase.writeRequest(HttpState, HttpConnection)
enter HttpMethodBase.writeRequestLine(HttpState, HttpConnection)
enter HttpMethodBase.generateRequestLine(HttpConnection, String, String, String, String)
>> "GET / HTTP/1.1[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
enter HttpMethodBase.writeRequestHeaders(HttpState,HttpConnection)
enter HttpMethodBase.addRequestHeaders(HttpState, HttpConnection)
enter HttpMethodBase.addUserAgentRequestHeaders(HttpState, HttpConnection)
enter HttpMethodBase.addHostRequestHeader(HttpState, HttpConnection)
Adding Host request header
enter HttpMethodBase.addCookieRequestHeader(HttpState, HttpConnection)
enter HttpState.getCookies()
enter CookieSpecBase.match(String, int, String, boolean, Cookie[])
enter HttpMethodBase.addProxyConnectionHeader(HttpState, HttpConnection)
>> "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "Accept-Encoding: gzip,deflate[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "Accept-Language: en-us,en;q=0.5[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "Host: www.hepsiburada.com[\r][\n]"
enter HttpConnection.print(String)
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
enter HttpConnection.writeLine()
enter HttpConnection.write(byte[])
enter HttpConnection.write(byte[], int, int)
>> "[\r][\n]"
enter HttpConnection.flushRequestOutputStream()
enter HttpMethodBase.readResponse(HttpState, HttpConnection)
enter HttpMethodBase.readStatusLine(HttpState, HttpConnection)
enter HttpConnection.readLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
<< "HTTP/1.1 302 Object moved[\r][\n]"
enter HttpMethodBase.readResponseHeaders(HttpState,HttpConnection)
enter HttpConnection.getResponseInputStream()
enter HeaderParser.parseHeaders(InputStream, String)
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
enter HttpParser.readLine(InputStream, String)
enter HttpParser.readRawLine()
<< "Date: Mon, 30 Oct 2006 20:36:39 GMT[\r][\n]"
<< "Server: Microsoft-IIS/6.0[\r][\n]"
<< "Srv: 1[\r][\n]"
<< "Location: default.aspx[\r][\n]"
<< "Content-Length: 133[\r][\n]"
<< "Content-Type: text/html[\r][\n]"
<< "Cache-control: private[\r][\n]"
enter HttpMethodBase.processResponseHeaders(HttpState, HttpConnection)
enter HttpMethodBase.readResponseBody(HttpState, HttpConnection)
enter HttpMethodBase.readResponseBody(HttpConnection)
enter HttpConnection.getResponseInputStream()
enter HttpMethodBase.canResponseHaveBody(int)
Redirect required
Redirect requested but followRedirects is disabled
Buffering response body
<< "
<< "
Object Moved
This object may be found here.[\n]"
Resorting to protocol version default close connection policy
Should NOT close connection, using HTTP/1.1
enter HttpConnection.isResponseAvailable()
enter HttpConnection.releaseConnection()
Releasing connection back to connection manager.
enter getContentCharSet( Header contentheader )
enter HeaderElement.parseElements(String)
enter HeaderElement.parseElements(char[])
enter HeaderElement.getParameterByName(String)
Default charset used: ISO-8859-1
The following error occurred: null
java.lang.NullPointerException
at com.screenscraper.scraper.ScrapeableFile.scrapeData(ScrapeableFile.java:3317)
at com.screenscraper.scraper.ScrapeableFile.scrape(ScrapeableFile.java:2126)
at com.screenscraper.scraper.ScrapingSession.scrapeFile(ScrapingSession.java:2215)
at com.screenscraper.scraper.Scraper.scrape(Scraper.java:200)
at com.screenscraper.scraper.Scraper.run(Scraper.java:110)
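One detail worth noting in this second log: the server sends a relative `Location: default.aspx`, while the client parameters include `http.protocol.reject-relative-redirect = true`. Whether that is related to the blank response is not established here. Whoever ends up following the redirect has to resolve the relative value against the request URL first. Below is a minimal sketch of that resolution using only `java.net.URI` from the JDK. The `LocationResolver` class is a hypothetical name, and this is not how screen-scraper or Commons HttpClient does it internally.

```java
import java.net.URI;

/** Hypothetical helper: turns a Location header value into an absolute URL. */
final class LocationResolver {

    /** Resolves a possibly relative Location value against the URL that was requested. */
    static String resolveLocation(String requestUrl, String location) {
        URI base = URI.create(requestUrl);
        String path = base.getPath();
        // java.net.URI does not resolve well against an empty path,
        // so treat "http://host" as "http://host/" before resolving.
        if (path == null || path.isEmpty()) {
            base = base.resolve("/");
        }
        return base.resolve(location).toString();
    }

    public static void main(String[] args) {
        // Relative Location, as returned by www.hepsiburada.com in the log.
        System.out.println(resolveLocation("http://www.hepsiburada.com", "default.aspx"));
        // Absolute Location, as returned by www.google.com in the earlier log.
        System.out.println(resolveLocation("http://www.google.com", "http://www.google.com.tr/"));
    }
}
```

An absolute `Location` passes through unchanged, so the same helper covers both sites mentioned in this thread.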
The following is the scraping session log:
--------------------------------
Starting scraper.
Running scraping session: test
Processing scripts before scraping session begins.
Scraping file: "test1"
test1: Preliminary URL: http://www.hepsiburada.com
test1: Using strict mode.
test1: Resolved URL: http://www.hepsiburada.com
test1: Sending request.
The following is the scraping file's last request log:
--------------------------------
GET / HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Host: www.hepsiburada.com
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Encoding: gzip,deflate
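As a point of comparison only, the JDK's own `java.net.HttpURLConnection` has a per-request switch that plays the same role as the `followRedirects` setting named in the error message. The sketch below prepares a GET request with that switch on or off without sending anything. It is not the Commons HttpClient API that appears in the logs, and the `FollowRedirectsFlagDemo` class and the 20-second timeouts (taken from the posted `ConnectionTimeout=20`) are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical demo of the JDK's redirect-following switch. */
final class FollowRedirectsFlagDemo {

    /** Prepares (but does not send) a GET request with redirect following switched on or off. */
    static HttpURLConnection prepare(String url, boolean followRedirects) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            // When false, a 302 is handed back to the caller instead of being followed.
            conn.setInstanceFollowRedirects(followRedirects);
            conn.setConnectTimeout(20000); // milliseconds
            conn.setReadTimeout(20000);    // milliseconds
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://www.hepsiburada.com/", true);
        // No network traffic happens until connect() or getResponseCode() is called.
        System.out.println("followRedirects = " + conn.getInstanceFollowRedirects());
    }
}
```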
Problem ? Redirect requested but followRedirects is disabled
Hi Dogan,
I just tried this with the very latest alpha version of screen-scraper and it seems to work fine. Would you mind upgrading and giving it a try? Here are instructions on doing that:
http://blog.screen-scraper.com/2006/10/24/version-27219a-of-screen-scraper-available/
Kind regards,
Todd Wilson