screen-scraper public support
How can I match a token conditionally, or pre-parse the page to remove certain noise words, so that my extractor pattern matches?
This is my extractor pattern:
<td class="prd-img"><a href="~@COMPAREPRODUCTURL@~"><img src="~@PRODUCTIMG@~" alt="~@IGNORE2@~" height="~@HEIGHT@~" width="~@WIDTH@~" /></a> </td>
<td class="prd-details">
<p class="prd-name"><strong><a href="~@PRODUCTURL@~">LG ~@MODELNO@~ ~@IGNORE2@~</a></strong></p>
<p class="prd-description">~@PRODUCTTITLE@~</p>
<p class="prd-services"><img src="~@RESERVEANDCOLLECTIMG@~" alt="~@RESERVEANDCOLLECTALT@~" /></p>
5.0 Installation problem
After downloading version 5.0 of the basic edition and attempting to restart screen-scraper, a message box titled "Startup Error" appears.
It contains the text:
java.lang.NoClassDefFoundError: com/screenscraper/util/DataMain
at com.screenscraper.controller.ControllerMain.main(ControllerMain.java:544)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
Simple variable question
In a screen scraping session I capture a URL using ~@SITE_URL@~ in an extractor pattern.
I then invoke a script from that extractor pattern that executes "after each pattern match". Within the script, when I reference SITE_URL using getVariable, it shows SITE_URL as null. I would like to use the contents of SITE_URL as ~#SITE_URL#~ for my next scrapeable file URL, but can't because it is null, even though the original scrapeable file filled it in.
Thanks for the help.
choosing a category in a form
Hello,
I would like to post to a form containing a select field where I have to choose from different categories. The category will be different every time and depends on the subject I want to post to (it is a directory). I want to know whether it is possible to run a keyword-based search before choosing a category and posting the form.
If you have a hint or any code that could help, it would be awesome!
Thanks in advance
hanlin
How to pass multiple command-line parameters
The command-line tutorial (http://community.screen-scraper.com/Tutorial_3_Page_3_Using_the_Command_Line) says that to pass a parameter via the command line, you use the form:
jre/bin/java -jar screen-scraper.jar -s "Hello World" --params "TEXT_TO_SUBMIT=Hello+World"
My question is: how do you specify multiple parameters? What do you use as delimiters? Semicolons? Or do you pass multiple '--params'?
Is the following correct:
jre/bin/java -jar screen-scraper.jar -s "Hello World" --params "TEXT_TO_SUBMIT=Hello+World;TEXT2=Hi+World"
-or-
extractor Pattern doesn't work
I'm desperate! I am trying to scrape a simple page with an extractor pattern and a sub-extractor pattern, but nothing works!
Here is the link: http://www.lesinrocks.com/musique/concerts/detail-concert/concert/festival-all-stars/
What I want is the description part:
a script to auto-increment int value after every scrape?
Hi,
This might be more of a Java question than a screen-scraping one. I was wondering whether it is possible to have a script in Java that provides an auto-incrementing value for each row of scraped data. I am scraping product information and just need a simple product_id: the first product gets "1", the second gets "2", and so on.
Thank you very much for any suggestions in advance!
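A minimal sketch of the counter being asked about, in plain Java. The class name ProductIdCounter, the key "PRODUCT_ID", and the map are all hypothetical; the map only stands in for wherever the running value is kept between rows (in a real script that would presumably be a session variable, which is an assumption and not something this excerpt confirms).

```java
import java.util.HashMap;
import java.util.Map;

public class ProductIdCounter {
    // Hypothetical stand-in for storage that survives from one row to the next.
    private final Map<String, Object> variables = new HashMap<>();

    // Returns 1 on the first call, 2 on the second, and so on.
    public int nextProductId() {
        Object current = variables.get("PRODUCT_ID");
        int next = (current == null) ? 1 : ((Integer) current) + 1;
        variables.put("PRODUCT_ID", next);
        return next;
    }

    public static void main(String[] args) {
        ProductIdCounter counter = new ProductIdCounter();
        // Each scraped row would call nextProductId() once.
        for (String row : new String[] {"first product", "second product", "third product"}) {
            System.out.println(counter.nextProductId() + " " + row);
        }
    }
}
```

The idea is simply to read the last value, add one, and store it back each time a row is processed, so the counter must live somewhere that outlasts a single script run.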
Passing "%" in parameter
I want to make a call to a URL with parameters containing the percent symbol (%), like so:
http://www.someurl.com?id=%99%9D%9B%9C%98
If I try putting "%99%9D%9A%9A%9B" as-is in the Parameters tab, the '%' gets expanded so that the actual URL being called is:
http://www.someurl.com?id=%2599%259D%259B%259C%2598
If I try using something like java.net.URLDecoder.decode(id, "UTF-8") on the parameter prior to passing it to scraper, the actual URL changes to:
http://www.someurl.com?id=%EF%BF%BD%EF%BF%BD
What is the correct way of doing this?
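The two results above can be reproduced with the standard java.net classes, which may help narrow down what is happening. The sketch below is plain Java, not screen-scraper code; the class and method names are hypothetical. It shows that encoding an already-encoded value escapes each '%' as "%25", that decoding these particular bytes as UTF-8 yields replacement characters (which re-encode as "%EF%BF%BD"), and that ISO-8859-1 round-trips the bytes unchanged. Whether screen-scraper's parameter encoding can be pointed at ISO-8859-1 is an assumption that this excerpt does not confirm.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PercentEncodingDemo {
    // Thin wrappers that turn the checked exception into an unchecked one.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String id = "%99%9D%9B%9C%98";

        // Encoding an already-encoded value escapes every '%' as "%25".
        System.out.println(encode(id, "UTF-8"));

        // These bytes are not valid UTF-8, so decoding produces U+FFFD
        // replacement characters, and the original bytes are lost.
        System.out.println(encode(decode(id, "UTF-8"), "UTF-8"));

        // ISO-8859-1 maps every byte to one character, so the round trip is lossless.
        System.out.println(encode(decode(id, "ISO-8859-1"), "ISO-8859-1"));
    }
}
```

In other words, the value has to be decoded and re-encoded with the same single-byte character set, or be passed through without being encoded a second time.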
Extractor Pattern - Script id
I would like to extract specific values that are part of a script on a Web page.
A short version of the page is shown below:
< script id="script_1" type="text/javascript">
< !--
//configuration
var fallback = new Object();
var parameters = 'address=Main+56;zipcode=1234';
< /script>
[...]
(had to insert a space before the word script, to make the code visible on this forum)
These values are only stored in the script and are not visible as HTML on the page.
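Since the values sit in the raw page source, they can be pulled out with ordinary text matching. The following is a sketch in plain Java, assuming the line always has the form shown in the sample (a single-quoted string of key=value pairs separated by semicolons); the class name ScriptParameterParser is hypothetical and the code is not tied to any screen-scraper API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptParameterParser {
    // Captures the single-quoted value assigned to "var parameters".
    private static final Pattern PARAMETERS =
            Pattern.compile("var\\s+parameters\\s*=\\s*'([^']*)'");

    // Returns the key=value pairs in page order, or an empty map if the line is absent.
    static Map<String, String> parse(String pageSource) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = PARAMETERS.matcher(pageSource);
        if (!matcher.find()) {
            return result;
        }
        for (String pair : matcher.group(1).split(";")) {
            int equals = pair.indexOf('=');
            if (equals > 0) {
                result.put(pair.substring(0, equals), pair.substring(equals + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String page = "var fallback = new Object();\n"
                + "var parameters = 'address=Main+56;zipcode=1234';\n";
        System.out.println(parse(page));
    }
}
```

The values come back still URL-encoded ("Main+56"), so they would need decoding if the plain address is wanted.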
Is HTML Tidy Permanently Turned on in Basic Edition 5.0?
Is Tidy HTML permanently turned on in the screen-scraper basic edition 5.0?
I have just tried making a new scraping session, and even though I have disabled Tidy HTML under Options>Settings, the following line appears when I look at the last response from the server: