Multi-part entity?
I've got a form I'm posting a few (OK, 45) parameters to, and everything looks like it should be cool. I'm comparing the POST parameters in HTTPSPY with those in the screen-scraper request, and they're pretty much the same...
... except screen-scraper says it's a Multi-part entity. What might I be doing to cause this?
([yyy] marks URLs/paths removed from this example.)
ss:
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Cookie: $Version=0; ASP.NET_SessionId=rqly4q5555foeo45alc0n245; $Path=/
Content-Length: 1796
Content-Type: application/x-www-form-urlencoded
Host: [yyy]
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Referer: [yyy]
__EVENTTARGET=Search&__EVENTARGUMENT=0&__VIEWSTATE=dDwtMzM2MjE5ODM5O3Q8O2w8aTwwPjs%2BO2w8dDw7bDx[i]truncated4neatness[/i]Mainmenu1%3AMT=&Mainmenu1%3AtimerData=-1&Mainmenu1%3AtimerUrl=0&f%3Ad1[i]truncated4neatness[/i]53C%2FDateChooser%253E&fxd1_input=01%2F01%2F1986&f_d1_DrpPnl_Calendar1=&f%3Ad2_hidden=%253CDateChooser%[i]truncated4neatness[/i]&fxd2_input=&f_d2_DrpPnl_Calendar1=&f%3At3=&f%3At4=&f%3At5=&f%3At1=&f%3At2=&f%3Ar1=R&f%3At21=&f%3At22=&f%3At23=&f%3At31=&f%3Ad31=&f%3At32=13056&f%3At33=PULLMAN&f%3AtxtSubdivision=&f%3AtxtSDBook=&f%3AtxtSDPage=&f%3At41=&f%3At42=&f%3At44=&f%3At43=&f%3AtxtCondo=&f%3AtxtCPlanNo=&f%3At52=&f%3At53=&f%3At56=&f%3At57=&f%3Ad61=&f%3Ad62=&f%3Ad63=&f%3Ad64=&f%3Ad7=&f%3At71=&Search__10=%3A0
Multi-part entity
httpspy:
POST [yyy] HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Referer: [yyy]
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Connection: Keep-Alive
Content-Length: 1953
Host: [yyy]
Pragma: no-cache
Cookie: submitTime=Wed%2C%2020%20Jun%202007%2016%3A12%3A22%20UTC; ASP.NET_SessionId=4vzy3n45ptxkwb55hdat5345
__EVENTTARGET=Search&__EVENTARGUMENT=0&__VIEWSTATE=dDwtMzM2MjE5ODM5O3Q8O2w8aTwwPjs%2BO2w8dDw7bDx[i]truncated4neatness[/i]&Mainmenu1%3AMT=&Mainmenu1%3AtimerData=-1&Mainmenu1%3AtimerUrl=0&f%3Ad1_hidden=%253CDateChooser%2520Value%253D%25221986%25252C1%25252C1%2522%253E%253CExpandEffects%253E%253C%2FExpandEffects%253E%253C%2FDateChooser%253E&fxd1_input=1%2F1%2F1986&f_d1_DrpPnl_Calendar1=&f%3Ad2_hidden=%253CDateChooser%2520Value%253D%2522%252520%2522%253E%253CExpandEffects%253E%253C%2FExpandEffects%253E%253C%2FDateChooser%253E&fxd2_input=+&f_d2_DrpPnl_Calendar1=&f%3At3=&f%3At4=&f%3At5=&f%3At1=&f%3At2=&f%3Ar1=&f%3At21=&f%3At22=&f%3At23=&f%3At31=&f%3Ad31=&f%3At32=13056&f%3At33=PULLMAN&f%3AtxtSubdivision=&f%3AtxtSDBook=&f%3AtxtSDPage=&f%3At41=&f%3At42=&f%3At44=&f%3At43=&f%3AtxtCondo=&f%3AtxtCPlanNo=&f%3At52=&f%3At53=&f%3At56=&f%3At57=&f%3Ad61=&f%3Ad62=&f%3Ad63=&f%3Ad64=&f%3Ad7=&f%3At71=&Search__10=%3A0
Multi-part entity?
fnirt,
My guess is that the issue is not the multi-part content-type specification. Our ongoing experience with ASP.NET websites has been that it is particularly important to pass certain hidden fields correctly from one page to the next. These sites may also require that the appropriate referer be set explicitly, especially when redirects occur on the server (please see [b]setReferer[/b] below).
[i]httpspy[/i] will come in handy for both of these needs. Another good tool is the [url=https://addons.mozilla.org/en-US/firefox/addon/3829]"Live HTTP Headers"[/url] add-on for Firefox.
Based on our experience, the page that precedes the one in your example will likely need to be included as a scrapeable file, even if you don't intend to scrape any content from it. The reason is that the page in your example posts the __VIEWSTATE parameter.
The previous page sets __VIEWSTATE as a hidden form field, so you will need to scrape that page's content, extract the field's value, and send the extracted value as a POST parameter when requesting your example page.
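In screen-scraper you would typically do the extraction with an extractor pattern. Purely to illustrate the idea, here is a minimal plain-Java sketch that pulls a hidden field's value out of the previous page's HTML with a regular expression. It assumes the name attribute appears before value inside the <input> tag, as in typical ASP.NET output; the class and method names are hypothetical, and a real HTML parser would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HiddenFieldExtractor {

    // Returns the value of the <input> whose name attribute equals fieldName,
    // or null if the page has no such field. Assumes name="..." precedes
    // value="..." within the tag (an assumption, not guaranteed by HTML).
    static String extractHiddenField(String html, String fieldName) {
        Pattern p = Pattern.compile(
            "<input[^>]*name=\"" + Pattern.quote(fieldName) + "\"[^>]*value=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A shortened stand-in for the previous page's markup.
        String html = "<form><input type=\"hidden\" name=\"__VIEWSTATE\" "
                    + "id=\"__VIEWSTATE\" value=\"dDwtMzM2\" /></form>";
        System.out.println(extractHiddenField(html, "__VIEWSTATE")); // prints dDwtMzM2
    }
}
```

The closing quote in the pattern keeps a lookup for __VIEWSTATE from accidentally matching a longer name such as __VIEWSTATEENCRYPTED.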
You will need to make use of the [url=http://www.screen-scraper.com/support/docs/api_documentation.php#addHTTPParameter]addHTTPParameter method[/url].
Along with __VIEWSTATE, you may find that these other fields need to be treated similarly: __VIEWSTATEENCRYPTED, __EVENTTARGET, __EVENTARGUMENT, __EVENTVALIDATION, __PREVIOUSPAGE, and __LASTFOCUS.
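Extending the same idea, the hypothetical helper below collects whichever of these ASP.NET state fields are present on the previous page and joins them, together with your own form fields, into an application/x-www-form-urlencoded body. It is a plain-Java sketch (Java 10+ for URLEncoder.encode(String, Charset)) under the same name-before-value assumption; it is not screen-scraper's addHTTPParameter API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AspNetFormBuilder {

    // Hidden ASP.NET state fields that may need to be carried over between pages.
    static final List<String> STATE_FIELDS = List.of(
        "__VIEWSTATE", "__VIEWSTATEENCRYPTED", "__EVENTTARGET", "__EVENTARGUMENT",
        "__EVENTVALIDATION", "__PREVIOUSPAGE", "__LASTFOCUS");

    // Copies every state field that actually appears on the previous page,
    // preserving the order of STATE_FIELDS.
    static Map<String, String> extractStateFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : STATE_FIELDS) {
            Pattern p = Pattern.compile(
                "<input[^>]*name=\"" + Pattern.quote(name) + "\"[^>]*value=\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(html);
            if (m.find()) {
                fields.put(name, m.group(1));
            }
        }
        return fields;
    }

    // Joins name/value pairs into an application/x-www-form-urlencoded body.
    static String encodeForm(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String html = "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"dDw+/=\" />"
                    + "<input type=\"hidden\" name=\"__EVENTVALIDATION\" value=\"abc\" />";
        Map<String, String> params = extractStateFields(html);
        params.put("f:t33", "PULLMAN"); // a form field taken from the question's POST data
        // prints __VIEWSTATE=dDw%2B%2F%3D&__EVENTVALIDATION=abc&f%3At33=PULLMAN
        System.out.println(encodeForm(params));
    }
}
```

Because the map keeps insertion order and put() on an existing key only replaces the value, fields that the page's JavaScript normally fills in at submit time (such as __EVENTTARGET=Search in the question's POST data) can be overridden after extraction.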
Please verify that these values are being passed correctly, and reply with any additional questions.
Thank you,
Scott
-------------
[b]Setting Referers[/b]
In this particular case you may not need to set the referer explicitly; however, if you find that you need to in other situations, you can use a method in the API that is not yet documented:
scrapeableFile.setReferer( String url )
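When calling it, convert the address to a URL object as appropriate. For example:
import java.net.URL;

// Wrap the referer address in a java.net.URL and hand it to the scrapeable file.
URL url = new URL( "http://www.screen-scraper.com/" );
scrapeableFile.setReferer( url );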
Use this method in a script that is run before the scrapeable file is invoked.
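For comparison, outside screen-scraper the referer is simply the HTTP Referer request header. The plain-Java sketch below uses java.net.HttpURLConnection, not screen-scraper's API, to prepare a form-encoded POST that carries an explicit Referer. The class name and the example.com URLs are hypothetical placeholders, and the code only configures the request without sending it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RefererSketch {

    // Configures (but does not send) a form-encoded POST with an explicit
    // Referer header. No network connection is made until the caller writes
    // the body or reads the response.
    static HttpURLConnection preparePost(String target, String referer) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(target).toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            conn.setRequestProperty("Referer", referer);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical URLs standing in for the [yyy] addresses in the question.
        HttpURLConnection conn = preparePost(
            "http://example.com/search.aspx", "http://example.com/start.aspx");
        // prints "POST with Referer http://example.com/start.aspx"
        System.out.println(conn.getRequestMethod() + " with Referer "
            + conn.getRequestProperty("Referer"));
    }
}
```

To actually send the request, you would write the encoded body to conn.getOutputStream() and then check conn.getResponseCode().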