screen-scraper support for licensed users
Help with scraping website
Hello,
I'm trying to scrape a website, but its structure is a bit complicated for me. Please help me understand how I should set up the scraper.
The structure is:
-Search page (with pagination)
|-Product page (I need to scrape details)
|-Review tab on Product page, i.e. http://prod_link/reviews (also with pagination)
The structure is pretty much the same as on the YP website.
Please give me a tip on how I can scrape this. Should I create 1 scraping session and work with scrapeable files, or should I create 2 scraping sessions (1 for the Search page and 1 for the Review tab)?
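The nesting described above (paginated search, then product details, then paginated reviews) can be sketched as plain Python. This is only a tool-independent illustration and not screen-scraper's API: the in-memory FAKE_SITE, the URLs, and the field names ("products", "reviews", "next") are all made up. In screen-scraper terms, the single-session option would presumably mirror this nesting, with one scrapeable file per level.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = Dict[str, Any]

# Hypothetical in-memory site standing in for real HTTP responses.
FAKE_SITE: Dict[str, Page] = {
    "/search?page=1": {"products": ["/prod/a", "/prod/b"], "next": "/search?page=2"},
    "/search?page=2": {"products": ["/prod/c"], "next": None},
    "/prod/a": {"details": "Product A"},
    "/prod/b": {"details": "Product B"},
    "/prod/c": {"details": "Product C"},
    "/prod/a/reviews": {"reviews": ["good"], "next": "/prod/a/reviews?page=2"},
    "/prod/a/reviews?page=2": {"reviews": ["bad"], "next": None},
    "/prod/b/reviews": {"reviews": [], "next": None},
    "/prod/c/reviews": {"reviews": ["ok"], "next": None},
}


def fetch(url: str) -> Page:
    """Stand-in for an HTTP request plus parsing of the response."""
    return FAKE_SITE[url]


def paginate(start_url: str, fetch_page: Callable[[str], Page]) -> Iterator[Page]:
    """Yield pages by following each page's 'next' link until there is none."""
    url = start_url
    while url is not None:
        page = fetch_page(url)
        yield page
        url = page.get("next")


def scrape(start_url: str, fetch_page: Callable[[str], Page]) -> List[Dict[str, Any]]:
    """Walk search pages, then each product, then that product's review pages."""
    results = []
    for search_page in paginate(start_url, fetch_page):
        for product_url in search_page["products"]:
            details = fetch_page(product_url)["details"]
            reviews: List[str] = []
            for review_page in paginate(product_url + "/reviews", fetch_page):
                reviews.extend(review_page["reviews"])
            results.append({"url": product_url, "details": details, "reviews": reviews})
    return results


results = scrape("/search?page=1", fetch)
```

The same pagination helper is reused for both the search listing and the review tab, which is the main reason a single nested flow can be simpler than two separate ones.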
Multi-threading and performance
Hello guys,
Please help me understand some basics.
1) When I want to run scraping sessions concurrently (server mode, from PHP), let's say 5 sessions, should I create 5 scraping sessions with different names and their own scripts, like Shopping site1, Shopping site2, etc., or can I run 1 session 5 times with different init parameters? screen-scraper is running in server mode, and the initialization parameters are passed from PHP.
2) I'm trying to scrape a huge amount of data. The content is ordered like this:
Search Page:
- page title
- page title
- page title
Detail Page:
- desc
- website
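For question 1) above, the general idea of "one definition, run several times with different init parameters" can be sketched in Python. This is a generic concurrency illustration and says nothing about how screen-scraper's server mode behaves: run_session is a hypothetical stand-in for one scraping-session run, and the parameter names ("site", "search_term") are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_session(params: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical stand-in for one run of a scraping session.

    Each run reads only its own params, so concurrent runs do not
    share state or interfere with each other.
    """
    return {"site": params["site"], "status": "done: " + params["search_term"]}


def run_concurrently(param_sets: List[Dict[str, str]], max_workers: int = 5) -> List[Dict[str, str]]:
    """Run the same session logic once per parameter set, in parallel threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as param_sets.
        return list(pool.map(run_session, param_sets))


# Five runs of the same logic, differing only in their init parameters.
param_sets = [
    {"site": f"Shopping site{i}", "search_term": f"term{i}"} for i in range(1, 6)
]
results = run_concurrently(param_sets)
```

The key design point is that every run keeps its state in its own parameters and return value, which is what makes a single definition safe to run several times at once.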
Upgrade to Default User Agent
Any plans to upgrade the default User Agent used in screen-scraper? Currently it's IE6, and we've been running into a few sites that don't support that browser anymore, so we have to set a custom user agent.
Or perhaps if we could set our own default... no big deal, just a suggestion.
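As an illustration of the "set our own default" idea, here is a minimal Python sketch of a request builder with an overridable default User-Agent. It uses the standard urllib module rather than anything from screen-scraper, and the User-Agent string itself is a made-up placeholder.

```python
import urllib.request

# Hypothetical default; replace with whatever browser string a site expects.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; ExampleScraper/1.0)"


def build_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Build a request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Uses the default; pass user_agent=... to override it for a single site.
req = build_request("http://example.com/")
```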
Client SSL Certificates
Does screen-scraper support client SSL certificates?
Scraping ASP.NET sites and Going to Next Page
Hi,
I have completed all the tutorials and was very keen to run my first scraping project. However, I have hit a brick wall: the site I am trying to scrape (http://www.totaljobs.co.uk) is an ASP.NET site, and the button for the next page uses an ASPX postback / JavaScript.
Can someone please explain the steps I need to perform to go to the next page? I have read http://blog.screen-scraper.com/2008/06/04/scraping-aspnet-sites/ and other topics in the forum, but I am still struggling.
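For context, an ASP.NET WebForms "next" link normally calls __doPostBack(target, argument), which fills the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the form together with the page's other hidden fields such as __VIEWSTATE. A minimal Python sketch of rebuilding that POST follows. The sample HTML, the control name ctl00$pager$next, and the field values are invented for illustration and are not taken from the site mentioned above.

```python
from html.parser import HTMLParser
from typing import Dict
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html: str, event_target: str, event_argument: str = "") -> Dict[str, str]:
    """Return the form fields a __doPostBack(event_target, event_argument) call would submit."""
    parser = HiddenFieldParser()
    parser.feed(html)
    form = dict(parser.fields)
    form["__EVENTTARGET"] = event_target
    form["__EVENTARGUMENT"] = event_argument
    return form


# Hypothetical page fragment; real pages carry much longer __VIEWSTATE values.
SAMPLE_HTML = """
<form method="post" action="results.aspx">
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
<a href="javascript:__doPostBack('ctl00$pager$next','')">Next</a>
</form>
"""

form = build_postback(SAMPLE_HTML, "ctl00$pager$next")
body = urlencode(form)  # URL-encoded POST body for the next-page request
```

The hidden field values change on every response, so they have to be re-extracted from each page before requesting the one after it.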
how to use extractor/sub-extractor pattern data table
I am scraping a web page that contains both a known session variable and an unknown string that I need to assign to a session variable that can be used as a parameter in a later url request.
I am using DATARECORD as my extractor pattern and APP_ID_DESCR_PRS and DISPLAY_ID as my sub-extractor patterns.
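The extractor/sub-extractor idea can be approximated with regular expressions: an outer pattern isolates each record, and inner patterns pull named values out of that record. The sketch below only illustrates the concept and is not how screen-scraper implements it. The sample HTML, the regexes, and the example.com URL are assumptions; only the names DATARECORD, APP_ID_DESCR_PRS, and DISPLAY_ID come from the post.

```python
import re
from typing import Dict, List, Optional

# Hypothetical page fragment with two records.
SAMPLE = """
<tr class="rec"><td>ID: 1001</td><td>Desc: alpha</td></tr>
<tr class="rec"><td>ID: 1002</td><td>Desc: beta</td></tr>
"""

# Outer pattern: one match per record (plays the role of DATARECORD).
DATARECORD = re.compile(r'<tr class="rec">(.*?)</tr>', re.S)

# Inner patterns: applied only inside each matched record.
SUB_PATTERNS = {
    "DISPLAY_ID": re.compile(r"ID: (\d+)"),
    "APP_ID_DESCR_PRS": re.compile(r"Desc: (\w+)"),
}


def extract_records(html: str) -> List[Dict[str, Optional[str]]]:
    """Return one dict per record, with one entry per sub-pattern."""
    records = []
    for block in DATARECORD.findall(html):
        record: Dict[str, Optional[str]] = {}
        for name, pattern in SUB_PATTERNS.items():
            match = pattern.search(block)
            record[name] = match.group(1) if match else None
        records.append(record)
    return records


records = extract_records(SAMPLE)

# Keep an extracted value around and reuse it as a parameter in a later URL.
session_vars = {"DISPLAY_ID": records[0]["DISPLAY_ID"]}
next_url = "http://example.com/detail?id=" + session_vars["DISPLAY_ID"]
```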
Shift-JIS issue
Hi guys,
I was trying to pass these characters in the session: 東京都
but it isn't working. My character set is set to Shift-JIS, but that doesn't help.
It's happening on a Windows 7 laptop.
Cheers,
Radek
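One thing worth checking in a case like this is which bytes actually go over the wire. The Python sketch below shows that the same text percent-encodes differently under Shift-JIS and UTF-8, so a server expecting one will not recognize the other. It uses only the standard library and makes no claim about what screen-scraper does internally.

```python
from urllib.parse import quote, unquote

text = "東京都"

# Raw bytes of the text in Shift-JIS (two bytes per character here).
sjis_bytes = text.encode("shift_jis")

# Percent-encoded forms as they would appear in a URL or POST body.
encoded_sjis = quote(text, encoding="shift_jis")
encoded_utf8 = quote(text, encoding="utf-8")

# Decoding with the matching charset restores the original text.
decoded = unquote(encoded_sjis, encoding="shift_jis")
```

Comparing the encoded value with what a browser sends for the same form can help narrow down where the charset mismatch occurs.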
Getting to page 2
Somehow I am not able to get to the second page of the following URL: http://www.zo$$over.nl/servie/servie/belgrado (replace $$ with nothing, click 'volgende' at the bottom of the page).
Clicking 'volgende' executes a JavaScript function that connects to http://www.zo$$over.nl/Services/Endeca/EndecaQueryService.asmx/Query (replace $$ with nothing) with the following POST value:
Edit Token issue
Hi guys,
After the last patch, I think I still have a problem with tokens. When I have too many variables in my pattern, I can't edit the variables anymore. Here is a sample where I have this issue:
<td~@GARBAGE_TD@~> </td>
<td~@GARBAGE_TD@~><b>Resign Date</b></td>
<td~@GARBAGE_TD@~>~@MEMBERS_RESIGN_DATE@~</td>
</tr>
</table>
<table width="100%" border="1" cellspacing="0" cellpadding="0">
<tr>
<td~@GARBAGE_TD@~><b>Name</b></td>
<td~@GARBAGE_TD@~>~@MEMBERS_NAME@~</td>
<td~@GARBAGE_TD@~>~@MEMBERS_MIDDLE_NAME@~</td>
<td~@GARBAGE_TD@~>~@MEMBERS_SURNAME@~</td>
new patch issue
Hi guys,
Not sure what's happening, but every time I change something in my token, my variable disappears and I can't edit anything.
Can you please fix this ASAP?
Cheers,
Radek