Problem with URL being changed (Variables are removed)
I'm having some problems with scraping Yahoo Hot Jobs. The URL isn't following the standard format when you search (where the search variables follow a '?').
For example, my search result is yielding this as the URL:
http://hotjobs.yahoo.com/job-search-l-Pomona-CA-k-pomona%20valley%20hospital%20medical%20center-c-Healthcare-m-0-d-FT-d-PT-j-PERM-j-CONT-n-Pomona%20Valley%20Hospital%20Medical%20Center-h-pomona%20valley%20hospital%20medical%20center;_ylt=AjnfZ0O0ZNJXhTMeW8c.NTb6Q6IX
And the job details page is this:
http://hotjobs.yahoo.com/job-JM8FUTHIC5D;_ylt=Ah5q_lGVVFYS7OctSo19wuH6Q6IX?search_url=%2Fjob-search-l-Pomona-CA-k-pomona%2520valley%2520hospital%2520medical%2520center-c-Healthcare-m-0-d-FT-d-PT-j-PERM-j-CONT-n-Pomona%2520Valley%2520Hospital%2520Medical%2520Center-h-pomona%20valley%20hospital%20medical%20center
When I tried to generate a scrapeable file, the program just sat there and never generated it. Does this have to do with the URL?
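For reference, here is a small Python sketch (not part of screen-scraper; the variable names are made up for illustration) that pulls the two URLs above apart with the standard library. It only shows where the search terms end up: the search URL has no query string at all, while the details URL carries the original search path in a `search_url` parameter.

```python
from urllib.parse import urlsplit, parse_qs

# The two URLs quoted above, copied verbatim.
SEARCH_URL = (
    "http://hotjobs.yahoo.com/job-search-l-Pomona-CA-k-pomona%20valley%20hospital"
    "%20medical%20center-c-Healthcare-m-0-d-FT-d-PT-j-PERM-j-CONT-n-Pomona%20Valley"
    "%20Hospital%20Medical%20Center-h-pomona%20valley%20hospital%20medical%20center"
    ";_ylt=AjnfZ0O0ZNJXhTMeW8c.NTb6Q6IX"
)
DETAILS_URL = (
    "http://hotjobs.yahoo.com/job-JM8FUTHIC5D;_ylt=Ah5q_lGVVFYS7OctSo19wuH6Q6IX"
    "?search_url=%2Fjob-search-l-Pomona-CA-k-pomona%2520valley%2520hospital"
    "%2520medical%2520center-c-Healthcare-m-0-d-FT-d-PT-j-PERM-j-CONT-n-Pomona"
    "%2520Valley%2520Hospital%2520Medical%2520Center-h-pomona%20valley%20hospital"
    "%20medical%20center"
)

search_parts = urlsplit(SEARCH_URL)
details_parts = urlsplit(DETAILS_URL)

# The search terms live in the path, so there is nothing after a '?'.
search_query = search_parts.query

# Drop the ';_ylt=...' suffix to get the bare job path.
job_path = details_parts.path.partition(";")[0]

# The details URL does have a query string; parse_qs decodes it once.
search_url_param = parse_qs(details_parts.query)["search_url"][0]

print(repr(search_query))
print(job_path)
print(search_url_param)
```

Both are still plain GET URLs, so nothing in their shape should stop them from being requested as-is.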
Problem with URL being changed (Variables are removed)
Hi,
I just tried this on my Mac and it seemed to work fine. I'm running 10.5.3 on Intel. I can't think why the hardware profile would make a difference, though. The data from the HTTP requests is not very large.
I suppose the JVM version could make a difference. My version is 1.5.0_13 (you can determine the version by opening Terminal and typing "java -version"). Would you mind checking your version? Also, have you had this problem with any other web sites, or just this one?
In this particular case, bear in mind that these are just GET requests, so as a workaround you can simply create a scrapeable file from the scraping session screen (instead of from the proxy), then paste the URL into the scrapeable file's URL field.
Todd
Problem with URL being changed (Variables are removed)
The owner uses a Mac, and tomorrow he'll test it to see if he can reproduce this problem. I'm using Ubuntu 8.05 and I can't reproduce it.
Problem with URL being changed (Variables are removed)
Mac OS 10.4.11
1.8GHz PPC G5
2GB RAM
I don't know how much memory is devoted to the program. I don't really do much else when using SS.
Problem with URL being changed (Variables are removed)
In answer to your question: yes, you should be able to scrape it if the variables are in your URL. You could try
http://hotjobs.yahoo.com/job-search-l-~#CITY#~-~#STATE#~-k-~#KEYWORDS#~-c-~#CATAGORY#~-m-0-d-FT-d-PT-j-PERM-j-CONT-n-~#COMPANY#~-h-~#KEYWORDS#~;_ylt=AjnfZ0O0ZNJXhTMeW8c.NTb6Q6IX
with the variables URL-encoded.
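To make "URL-encoded" concrete, here is a minimal Python sketch of what the finished URL should look like once the `~#NAME#~` tokens are filled in. This is not screen-scraper code: screen-scraper does the token substitution itself, and `fill_template` is a hypothetical helper written only for this illustration. The values are the ones from the original search.

```python
from urllib.parse import quote

# The template from the post above, copied verbatim.
TEMPLATE = (
    "http://hotjobs.yahoo.com/job-search-l-~#CITY#~-~#STATE#~-k-~#KEYWORDS#~"
    "-c-~#CATAGORY#~-m-0-d-FT-d-PT-j-PERM-j-CONT-n-~#COMPANY#~-h-~#KEYWORDS#~"
    ";_ylt=AjnfZ0O0ZNJXhTMeW8c.NTb6Q6IX"
)


def fill_template(template, variables):
    """Replace each ~#NAME#~ token with the percent-encoded value.

    Hypothetical helper for illustration only.
    """
    url = template
    for name, value in variables.items():
        # safe="" also encodes '/', so a value cannot break the path.
        url = url.replace("~#" + name + "#~", quote(value, safe=""))
    return url


url = fill_template(
    TEMPLATE,
    {
        "CITY": "Pomona",
        "STATE": "CA",
        "KEYWORDS": "pomona valley hospital medical center",
        "CATAGORY": "Healthcare",
        "COMPANY": "Pomona Valley Hospital Medical Center",
    },
)
print(url)
```

With these values the result is identical to the search URL in the first post: the spaces become `%20` and everything else is left alone.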
You could also try upgrading to the alpha version. Here is the FAQ:
http://screen-scraper.com/support/faq/faq.php#Upgrade2Alpha
Also, assuming it might be a problem with your machine, could you give us a few stats about it, such as:
OS running
total RAM
Memory devoted to screen-scraper
Problem with URL being changed (Variables are removed)
Both links. I'm using screen-scraper 4.0.
Regardless of that fact (assuming it's just a problem with me or my computer), will I be able to scrape that site if all the variables are written into the URL?
Problem with URL being changed (Variables are removed)
To clarify,
Are you having trouble generating a scrapeable file on the first URL, the second, or both?
And what version of screen-scraper are you running?