Urgent Help needed
Hello Everybody,
I am trying to scrape data from Shopzilla.
URL:
http://www.shopzilla.com/gen385822/search?absoluteMinPrice=21&absoluteMaxPrice=45&minPrice=21&maxPrice=45&priceRangeSubmit=&sort=priceAscending&show=20&zipcode=zip+code
I am trying to scrape the price along with the merchant name.
I am getting: Warning! Received a status code of: 405.
I don't know how to fix this.
Any help would be appreciated.
It's really urgent.
Thanks
Mankss
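(A note not in the original post: HTTP status 405 means "Method Not Allowed" — the usual cause is sending a POST request to a URL that only accepts GET, or vice versa. Separately, if you build a GET URL by hand, parameter values containing spaces — like `zipcode=zip+code` in the URL above — must be URL-encoded. A minimal sketch using the standard `java.net.URLEncoder`; the class name and parameter list here are illustrative, not part of the original thread:)

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryBuilder {
    // Build a GET query string from key/value pairs, URL-encoding each key and value.
    // URLEncoder turns spaces into '+', matching the "zip+code" form in the URL above.
    public static String buildQuery(String[][] params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(p[0], "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(p[1], "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String q = buildQuery(new String[][] {
            { "minPrice", "21" }, { "maxPrice", "45" }, { "zipcode", "zip code" }
        });
        System.out.println(q);
    }
}
```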
I couldn't get an error. Here is the scrape I made for it. You need to paste this code into a text editor, save it as "shopzilla.sss", and import it into screen-scraper.
<scraping-session use-strict-mode="true"><script-instances><owner-type>ScrapingSession</owner-type><owner-name>Shopzilla</owner-name></script-instances><name>Shopzilla</name><notes></notes><cookiePolicy>0</cookiePolicy><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>2</originator_edition><logging_level>1</logging_level><date_exported>July 28, 2011 09:05:16</date_exported><character_set>ISO-8859-1</character_set><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>http://www.shopzilla.com/gen385822/search</URL><last-request></last-request><name>File from Shopzilla</name><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text><li class="product-info-popup hide"
~@DATARECORD@~
</li></pattern-text><identifier>Products</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><identifier>DATARECORD</identifier></extractor-pattern-tokens><extractor-patterns sequence="2" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>"price_link">$~@PRICE@~<</pattern-text><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="true" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[\d\.(<span>)]*</regular-expression><identifier>PRICE</identifier></extractor-pattern-tokens><script-instances/></extractor-patterns><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>info-description">~@DESCR@~<</pattern-text><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[^<>]*</regular-expression><identifier>DESCR</identifier></extractor-pattern-tokens><script-instances/></extractor-patterns><script-instances><script-instances when-to-run="80" sequence="1" enabled="true"><script><script-text>line = "";
while (line.length()<90)
line += "=";
session.log(line);
session.log("Found product:");
session.log(dataRecord.get("DESCR"));
session.log(dataRecord.get("PRICE"));
session.log(line);</script-text><name>Shopzilla log</name><language>Interpreted Java</language></script></script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Products</owner-name></script-instances></extractor-patterns><HTTPParameters sequence="6"><key>sort</key><type>GET</type><value>priceAscending</value></HTTPParameters><HTTPParameters sequence="8"><key>zipcode</key><type>GET</type><value>zip code</value></HTTPParameters><HTTPParameters sequence="2"><key>absoluteMaxPrice</key><type>GET</type><value>45</value></HTTPParameters><HTTPParameters sequence="7"><key>show</key><type>GET</type><value>20</value></HTTPParameters><HTTPParameters sequence="1"><key>absoluteMinPrice</key><type>GET</type><value>21</value></HTTPParameters><HTTPParameters sequence="3"><key>minPrice</key><type>GET</type><value>21</value></HTTPParameters><HTTPParameters sequence="4"><key>maxPrice</key><type>GET</type><value>45</value></HTTPParameters><HTTPParameters sequence="5"><key>priceRangeSubmit</key><type>GET</type><value></value></HTTPParameters><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>File from Shopzilla</owner-name></script-instances></scrapeable-files></scraping-session>
Can I send you my .sss file?
Thanks for your reply Jason,
Here is the thing: I have around 200 SKUs sitting in a TXT file. I want to write some kind of script that goes through that file, takes each SKU, scrapes the data, and saves it to a spreadsheet.
Let me know if it's possible.
If you need it, I'll send you my .sss file.
Again, any help would be appreciated.
Regards
Mankss
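(The loop described above — read each SKU from a text file, scrape it, append a row to an output file — is what the screen-scraper session below implements. For reference, the same file handling can be sketched in plain Java; the file names and the CSV column layout here are placeholder assumptions, not from the original thread:)

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SkuBatch {
    // Read one SKU per line from the input file, skipping blank lines.
    public static List<String> readSkus(String path) throws IOException {
        List<String> skus = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) skus.add(line);
            }
        } finally {
            in.close();
        }
        return skus;
    }

    // Append one comma-separated row per scraped result.
    public static void writeRow(String path, String sku, String merchant, String price)
            throws IOException {
        FileWriter out = new FileWriter(path, true);
        try {
            out.write(sku + "," + merchant + "," + price + "\n");
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a sample SKU file, then loop over it the way the session would.
        FileWriter w = new FileWriter("skus.txt");
        w.write("SKU-001\nSKU-002\n");
        w.close();
        for (String sku : readSkus("skus.txt")) {
            // In screen-scraper, session.setVariable / session.scrapeFile would run here.
            writeRow("results.csv", sku, "merchant-here", "price-here");
        }
    }
}
```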
My copy of the .sss file:
<scraping-session use-strict-mode="true"><script-instances><owner-type>ScrapingSession</owner-type><owner-name>google</owner-name></script-instances><name>google</name><notes></notes><cookieHandling>1</cookieHandling><cookiePolicy>0</cookiePolicy><hTTPClient>0</hTTPClient><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>1</originator_edition><logging_level>1</logging_level><date_exported>July 28, 2011 15:29:56</date_exported><character_set>UTF-8</character_set><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><last-request></last-request><name>START SCRIPT</name><script-instances><script-instances when-to-run="40" sequence="1" enabled="true"><script><script-text>// Create a file object that will point to the file containing
// the search terms.
File inputFile = new File( "google.txt" );
// These two objects are needed to read the file.
FileReader in = new FileReader( inputFile );
BufferedReader buffRead1 = new BufferedReader( in );
// Read the file in line-by-line. Each line in the text file
// will contain a search term.
String searchTerm;
while( ( searchTerm = buffRead1.readLine() ) != null )
{
// Set a session variable corresponding to the search term.
session.setVariable( "SEARCH", searchTerm );
// Get search results for this particular search term.
session.scrapeFile( "GoogleScrape" );
}
// Close up the objects to indicate we're done reading the file.
in.close();
buffRead1.close();
</script-text><name>google</name><language>Interpreted Java</language></script></script-instances><owner-type>ScrapeableFile</owner-type><owner-name>START SCRIPT</owner-name></script-instances></scrapeable-files><scrapeable-files sequence="2" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>~#SEARCH#~</URL><BASICAuthenticationUsername></BASICAuthenticationUsername><last-request></last-request><name>GoogleScrape</name><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text><div class="grid_view_box boxSprite-~@DATARECORD@~</a></div>
</div></pattern-text><identifier>Untitled Extractor Pattern</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" null-session-variable="false" sequence="1"><identifier>DATARECORD</identifier></extractor-pattern-tokens><extractor-patterns sequence="2" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text><div class="price"><a href="~@URL@~">~@PRICE@~</a></div></pattern-text><extractor-pattern-tokens optional="true" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" null-session-variable="false" sequence="1"><identifier>URL</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="true" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" null-session-variable="false" sequence="2"><identifier>PRICE</identifier></extractor-pattern-tokens><script-instances/></extractor-patterns><extractor-patterns sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>class="merchantInfoTitle" target="_blank" onclick="">~@MERCHANT@~</a><a</pattern-text><extractor-pattern-tokens optional="true" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" null-session-variable="false" sequence="1"><identifier>MERCHANT</identifier></extractor-pattern-tokens><script-instances/></extractor-patterns><extractor-patterns sequence="3" 
automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text><div class="single_merchant_offer">Sold by<a class="~@URL1@~">~@PRICE2@~</a></pattern-text><extractor-pattern-tokens optional="true" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" null-session-variable="false" sequence="1"><identifier>URL1</identifier></extractor-pattern-tokens><extractor-pattern-tokens optional="true" save-in-session-variable="false" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" null-session-variable="false" sequence="2"><identifier>PRICE2</identifier></extractor-pattern-tokens><script-instances/></extractor-patterns><script-instances><script-instances when-to-run="80" sequence="1" enabled="true"><script><script-text>FileWriter out = null;
try
{
session.log( "Writing data to a file." );
// Open up the file to be appended to.
out = new FileWriter( "Google-RESULTS.txt", true );
// Write out the data to the file.
out.write( scrapeableFile.getCurrentURL() + "~" );
out.write( dataRecord.get( "MERCHANT" ) + "~" );
out.write( dataRecord.get( "PRICE" ) + "~" );
out.write( dataRecord.get( "PRICE2" ) + "~" );
out.write( "\n" );
}
catch( Exception e )
{
session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}
finally
{
// Close up the file even if an error occurred while writing.
if( out != null )
{
try { out.close(); } catch( Exception ignore ) { }
}
}</script-text><name>Write Google Products to a file1</name><language>Interpreted Java</language></script></script-instances><owner-type>ExtractorPattern</owner-type><owner-name>Untitled Extractor Pattern</owner-name></script-instances></extractor-patterns><HTTPParameters sequence="1"><key>URL</key><type>POST</type><value>~#SEARCH#~</value></HTTPParameters><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>GoogleScrape</owner-name></script-instances></scrapeable-files></scraping-session>