Require help/suggestions in scraping following sites
Hi,
Is it possible to scrape the following sites?
www.bananarepublic.com
www.gap.com
www.oldnavy.com
Here I need to fetch the store information.
I will read the required zip code parameters from a file and supply them.
The location that generates the store information is
http://206.231.92.100/StoreLocator/BRPrxResults.aspx?&GAD2=&GAD3=30301+(postal+code)%2c+Georgia%2c+United+States&GCITY=&GSTATE=&GZIP=30301&GAD4=USA&DSN=MapPoint.NA&LOC=33.6488279083249%3a-84.391583912819&IC=33.6488279083249%3a-84.391583912819%3a32%3a30301+(postal+code)%2c+Georgia%2c+United+States&NR=12&DBR=150&FC=B&FCT=Or&gapcountry=US
Now my problem is that the URL contains location parameters, and these parameters are generated on the server side.
Is there any way I could fetch the location details so that I can scrape the required store information?
Please let me know your suggestions.
Thanks,
Balaji
Require help/suggestions in scraping following sites
It sounds like you are trying to scrape http://www.bananarepublic.com/customerService/storeLocator.do for each zipcode in your file and that the problem is that you can't seem to create a scrapeable file that then scrapes all the detail information (under the "directions" links) for each store in the zipcode. The reason for this, I understand, is that the URLs for those links change based on some program internal to the server.
I looked at the URL under the "directions" link as represented in the raw HTML of the site and compared it with the link that my browser shows in the URL field and found that the raw HTML represented &'s with "&amp;".
I found that it works if you scrape the URL under the "directions" link, run it through a script that replaces every "&amp;" with "&", and then put the result into the URL of the scrapeable file that scrapes the page behind the "directions" link.
I included the test scraping session that I made to scrape the stores in the 95132 zipcode. It's in raw XML form, so all you need to do is cut and paste the XML below into an editor, rename it to "br.xml", and import it into your screen-scraper workbench.
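Outside of screen-scraper, the same replacement can be illustrated with a minimal Python sketch. The function name `clean_href` and the sample href are hypothetical and only shaped like the "directions" links; this is not the script from the scraping session.

```python
def clean_href(raw_href: str) -> str:
    """Turn an href copied from raw HTML into a URL that can be requested.

    Raw HTML writes each "&" in an attribute as "&amp;", so the entity has
    to be turned back into a plain "&" before the URL is used.
    """
    return raw_href.replace("&amp;", "&")


# Hypothetical href, shaped like a "directions" link in the page source.
raw = "BRPrxResults.aspx?GZIP=95132&amp;GAD4=USA&amp;NR=12"
print(clean_href(raw))
```

If the links ever contain other entities as well, Python's standard `html.unescape` is a broader alternative to the single `replace` call.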
-Alan
Require help/suggestions in scraping following sites
Alan,
The sequence of steps is as follows:
Open the site www.bananarepublic.com
Click on the store locator link at the bottom of the web page
This leads to the page
http://www.bananarepublic.com/customerService/storeLocator.do
Now in that page, give a zip code as input and submit the form
For the provided zip code, the site lists the store information
I need the information for all of these stores
On submitting the form, the store information is loaded within the frame by the following URL
http://206.231.92.100/StoreLocator/BRPrxResults.aspx?&GAD2=&GAD3=30301+(postal+code)%2c+Georgia%2c+United+States&GCITY=&GSTATE=&GZIP=30301&GAD4=USA&DSN=MapPoint.NA&LOC=33.6488279083249%3a-84.391583912819&IC=33.6488279083249%3a-84.391583912819%3a32%3a30301+(postal+code)%2c+Georgia%2c+United+States&NR=12&DBR=150&FC=B&FCT=Or&gapcountry=US
I would be giving the zip code as a parameter from my input file.
My input file has the master list of all zip codes available in the US.
I would read these values from the file and substitute them into the long URL.
Now, because of the location codes, I am not able to scrape automatically using the program, as these are generated on the server side.
Please let me know if you need any other information.
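To show which parts of the long URL depend on more than the zip code, the query string can be split into its parameters. This is a minimal Python sketch (Python and the helper name `split_params` are assumptions, not part of the screen-scraper setup); it only parses the URL quoted above and sends no request.

```python
from urllib.parse import parse_qs, urlsplit

# The results URL quoted above, split over several lines for readability.
RESULTS_URL = (
    "http://206.231.92.100/StoreLocator/BRPrxResults.aspx?&GAD2="
    "&GAD3=30301+(postal+code)%2c+Georgia%2c+United+States"
    "&GCITY=&GSTATE=&GZIP=30301&GAD4=USA&DSN=MapPoint.NA"
    "&LOC=33.6488279083249%3a-84.391583912819"
    "&IC=33.6488279083249%3a-84.391583912819%3a32%3a30301"
    "+(postal+code)%2c+Georgia%2c+United+States"
    "&NR=12&DBR=150&FC=B&FCT=Or&gapcountry=US"
)


def split_params(url: str) -> dict:
    """Return the decoded query parameters of a URL, keeping empty ones."""
    query = urlsplit(url).query
    parsed = parse_qs(query, keep_blank_values=True)
    # Each parameter occurs once in this URL, so keep the first value only.
    return {name: values[0] for name, values in parsed.items()}


params = split_params(RESULTS_URL)
for name, value in params.items():
    print(f"{name} = {value!r}")
```

In the output, GZIP holds the zip code itself, while LOC and IC appear to hold a latitude:longitude pair and GAD3 a place description. Those look like the values the server generates and that the zip code file cannot supply.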
Require help/suggestions in scraping following sites
Could you list the URLs in order that you're trying to scrape as well as any special parameters that each page needs? Also, it would help me to know the exact page you're trying to get data from and the data you're trying to get from that page.
Thanks,
Alan
Require help/suggestions in scraping following sites
Hi Alan, I have a file with the list of all possible zip codes.
I would be able to set them as parameters from the script.
But my problem is that some of the location codes are generated on the server side, and I cannot specify these parameters as input because they won't be available.
These are mandatory parameters to access the site.
Please let me know if I am not clear.
balaji
Require help/suggestions in scraping following sites
Hi,
If you are trying to scrape http://www.bananarepublic.com/customerService/storeLocator.do using a list of zipcodes that you have, I would suggest the following:
1) Create a scrapeable file from the file which lists all the zipcodes and put an extractor pattern in it to find all the zipcodes in the file.
2) Create a scrapeable file for http://www.bananarepublic.com/customerService/storeLocator.do and set it to be scraped every time the first scrapeable file finds a zipcode.
3) Input the zipcode found from the file into the parameters of the 2nd scrapeable file.
4) Set extractor patterns in the 2nd scrapeable file to get all the store information you need.
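The steps above could be sketched roughly as follows in Python rather than in screen-scraper itself. The form field name "zip" and the helper names are hypothetical (the real field name would have to be read from the store locator form), and the sketch only prepares one request per zipcode without sending anything.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

LOCATOR_URL = "http://www.bananarepublic.com/customerService/storeLocator.do"

# Plays the role of the extractor pattern in step 1: a line with five digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}")


def read_zipcodes(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that consists of exactly one five-digit zipcode."""
    for line in lines:
        code = line.strip()
        if ZIP_PATTERN.fullmatch(code):
            yield code


def build_requests(zipcodes: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Pair the locator URL with the form parameters for each zipcode (steps 2 and 3)."""
    for code in zipcodes:
        # "zip" is a hypothetical field name, not taken from the real form.
        yield LOCATOR_URL, {"zip": code}


# Stand-in for the lines of the zipcode file.
sample_lines = ["30301\n", "95132\n", "\n", "not-a-zip\n"]
for url, form in build_requests(read_zipcodes(sample_lines)):
    print(url, form)
```

Step 4, extracting the store information, would then run on each response, in the same way the extractor patterns run on the second scrapeable file.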
Hope that helps.
-Alan
Require help/suggestions in scraping following sites
Hi Alan,
You can access the following URL
http://www.bananarepublic.com/customerService/storeLocator.do
thanks,
Balaji
Require help/suggestions in scraping following sites
Can you give me the URL to the site you want to scrape? When I put the 206.231.92.100/ URL in my browser I get an error message, so could you tell me the exact URL of where you're trying to extract data?
Thanks,
Alan
Require help/suggestions in scraping following sites
"Hi Balaji,
I don't think I understand; could you be more specific? Do you need to scrape the first 3 sites that you mentioned or the last one with the long URL or both? I don't know if this is right, but it seems like you need to scrape the last one in order to scrape the first 3. Could you give more detail about how those sites are related and what you need to do?
-Alan"
Hi Alan,
I want to scrape the last one with the long URL.
I need to scrape the store information.
I would be giving the zip code as a parameter from my input file.
My input file has the master list of all zip codes available in the US.
I would read these values from the file and substitute them into the long URL.
Now, because of the location codes, I am not able to scrape automatically using the program.
Please let me know if you need any other information.
Require help/suggestions in scraping following sites
Hi,
I want to extract the URLs (of the individual details pages) of all the companies listed when I search for a particular company from here. For example, say I search for 'european investment corporation'. There are 7 companies listed, and each one is a link to a page that gives details about that particular company. What I want is the URL of that details page. The way the URL is constructed makes it impossible for me to get it. First off, there are cookies to set, but this can be done. Then there are a number of JavaScript functions that have a lot to do with random number generation. These make further calls to other functions, which obfuscate the parameters further. Is there any way I can work around these?
Thanks and regards,
hemanth
Require help/suggestions in scraping following sites
Hi Balaji,
I don't think I understand; could you be more specific? Do you need to scrape the first 3 sites that you mentioned or the last one with the long URL or both? I don't know if this is right, but it seems like you need to scrape the last one in order to scrape the first 3. Could you give more detail about how those sites are related and what you need to do?
-Alan