screen-scraper support for licensed users
Problems with Website using Javascript
Hi,
I have been successfully scraping a website for some time now, but recently I get a message of the "Javascript has to be turned on..." type. Now, I know that SS does not run JS, and therefore I have started to analyse what is going on "behind the curtains". There are several JS files that are sent as responses and calculate cookies and so on. I am now busy trying to emulate all this using JS SS-scripts. So far without success.
504 Error only when scraping session run from cron
Hi.
A scraping session that is ready for production use works seamlessly on my local Mac. When I export it and try to run it on a Linux EC2 instance, this very strange thing happens:
If I remotely access the Linux server with ssh from a terminal window and run it directly with the following command, the scrape runs as it is supposed to, with no errors:
myscrapeshellscript.sh just contains:
jre/bin/java -jar screen-scraper.jar -s "My Scraping Session"
Tor & Polipo processes left running on memory
Hello.
I am using Tor and Polipo from screen-scraper (with the Java library that you guys kindly provided in another post) successfully.
scrapable files with same URL present different patterns in screen-scraper
We have run into an example where we created the extractor patterns for a scrapable file, but the patterns do not match when the same URL is called programmatically while running the scraping engine.
We can even cut and paste the URL (the page URL that is generated programmatically) from the logs into the first scrapable file and see that the original extractor patterns still work. However they don't work in the file scraped during the scraping session.
forum structure traversing question
We need to scrape a forum that has a forum list where each forum (in the list of all forums) has the following structure: www.siteaddress/forum_identifier.html. However, each page of threads in the individual forum (after the first page) has the structure www.siteaddress/forum_identifier_site_identifer_index_number.html.
check for string value of session variable
Hi.
Is there a problem with the if syntax below?
(having previously set the value of the STATUS variable to either "ON" or "OFF")
.. do this;
.. and do that;
}
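The condition itself is not shown in the post, so the following is only a sketch of one common pitfall rather than a diagnosis: in Java-style scripts, comparing strings with `==` tests object identity, while `equals` compares the text. The class name `StatusCheck`, the helper `isOn`, and the `Map` standing in for session variables are all hypothetical; inside a screen-scraper script the value would normally come from `session.getVariable("STATUS")`.

```java
import java.util.HashMap;
import java.util.Map;

public class StatusCheck {
    // Returns true only when the stored value is the string "ON".
    // Putting the literal first keeps the call safe when the value is null.
    static boolean isOn(Object status) {
        return "ON".equals(status);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for session variables, used so the sketch
        // runs outside screen-scraper.
        Map<String, Object> variables = new HashMap<>();
        variables.put("STATUS", "ON");

        if (isOn(variables.get("STATUS"))) {
            System.out.println("do this");
            System.out.println("and do that");
        }
    }
}
```

Writing the literal first (`"ON".equals(...)`) also covers the case where the variable was never set.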
problem with unwanted tags inside text extracted
Hi. I am scraping the posts in a discussion forum. Not the content, but the post titles, dates, users, etc. My extractor pattern looks roughly like this:
</td><td>
<a id="~@DUMMY@~" href="~@DUMMY@~">~@POSTTITLE@~</a>
</td><td style="white-space:nowrap;">
<a id="~@DUMMY@~" href="/boards/profilea.aspx?user=~@USERID@~">~@DUMMY@~</a>
</td><td style="white-space:nowrap;">~@DATEOFPOST@~</td>
</tr>
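If a token such as POSTTITLE comes back with markup inside it, one possible clean-up is to strip the tags from the extracted value in a script. This is a minimal sketch under that assumption; the class `TagStripper`, the method `stripTags`, and the sample title are hypothetical, and a regular expression like this is only suitable for simple inline tags, not for parsing arbitrary HTML.

```java
public class TagStripper {
    // Removes anything that looks like an HTML tag, then collapses the
    // whitespace left behind and trims the ends.
    static String stripTags(String extracted) {
        if (extracted == null) {
            return "";
        }
        return extracted
                .replaceAll("<[^>]*>", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        // Hypothetical extracted value containing unwanted inline tags.
        String raw = "<b>Re:</b> Need <i>advice</i> on dosage";
        System.out.println(stripTags(raw));
    }
}
```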
Read from file (large amount of data)
Hello,
Please help me understand how I should set up the scraper.
I have a .txt file with around 500,000 URLs I need to scrape.
I am thinking about the following: I make one scrape whose job is to read in the .txt input and loop, and then launch a separate scrape that will get the data for each line. All this using RunnableScrapingSession.
Is this a good solution for such a large amount of data? Or can you suggest something better?
Really appreciate your help.
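The read-and-loop part of this plan can be sketched as follows. It reads the file one line at a time, so the whole list never has to sit in memory. The class `UrlFileLoop`, the method `forEachUrl`, and the default file name `urls.txt` are hypothetical. The handler is the place where the per-URL scrape would be launched (for instance with the RunnableScrapingSession mentioned in the post); that call is left out here because its details are not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

public class UrlFileLoop {
    // Streams the file line by line and passes each non-blank line to the
    // handler. Returns the number of URLs handed over.
    static int forEachUrl(Path file, Consumer<String> handler) throws IOException {
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String url = line.trim();
                if (url.isEmpty()) {
                    continue; // skip blank lines
                }
                handler.accept(url);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input file; pass a different path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "urls.txt");
        // Placeholder handler: a real run would start the per-URL scrape here.
        int handled = forEachUrl(file, url -> System.out.println("would scrape: " + url));
        System.out.println(handled + " URLs handled");
    }
}
```

Keeping the per-URL work behind a handler makes it easy to swap the placeholder for a real scrape, or to add throttling, without touching the file-reading loop.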
navigating a multipage forum
I am attempting to scrape a forum (healingwell.com) that has multiple pages in some subforums - but no obvious cues that there are additional pages. In other words no "next" or "previous" patterns to use as cues to augment the page counter in the URL.
The best way to augment the page counter that I have come up with is the following:
Since each page in the subforum contains links to the next few pages and links to the last pages in the form of page numbers (i.e.
screen scraper permissions issue in Linux
Hello,
I recently installed screen scraper pro on an Ubuntu Amazon EC2 instance. I had managed to run it successfully a few times, but now when I connect to the desktop with Microsoft Remote Desktop and try to run the workbench I get a message box saying:
In order for screen-scraper to function properly, please ensure you have write access to the folder in which screen-scraper is installed, as well as all of its sub-folders.
For example, I found that you don't have write access to the following file(s):