screen-scraper public support
reformatDate in session variable?
I've got a bunch of scraped dates that are headed for a MySQL table. However, they need to be reformatted before they're inserted.
Since I'm a noob O-O/Java type, I'm probably missing something, but I don't understand why this doesn't work:
session.reformatDate(session.getVariable("FILINGDATE1"), "MM/dd/yyyy", "yyyy-MM-dd");
Please post or e-mail me if you can suggest syntax that works!
TIA.
Dave Nuttall
San Antonio, TX
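For comparison, the conversion itself can be done in plain Java with `java.text.SimpleDateFormat`. This is a minimal sketch, not screen-scraper's own implementation. The `reformatDate` helper and the sample date are hypothetical. One thing worth checking in the line above: if `session.reformatDate` returns the converted string rather than changing the session variable in place (an assumption; confirm in the screen-scraper API docs), the result is thrown away unless it is stored, for example with `session.setVariable( "FILINGDATE1", ... )`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class ReformatDateExample {
    // Hypothetical helper: parses `value` using `fromPattern` and re-emits it using `toPattern`.
    static String reformatDate(String value, String fromPattern, String toPattern) {
        SimpleDateFormat in = new SimpleDateFormat(fromPattern);
        in.setLenient(false); // reject impossible dates such as 13/45/2007
        try {
            Date parsed = in.parse(value);
            return new SimpleDateFormat(toPattern).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }

    public static void main(String[] args) {
        String filingDate = "03/15/2007"; // hypothetical scraped value
        // The converted string is returned; the caller has to store it somewhere.
        String mysqlDate = reformatDate(filingDate, "MM/dd/yyyy", "yyyy-MM-dd");
        System.out.println(mysqlDate); // prints 2007-03-15
    }
}
```

MySQL accepts `yyyy-MM-dd` strings for `DATE` columns, so the returned value can go straight into the insert.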
Can't Edit Token
Hi, I've got several extractor patterns for a session I'm working on. For most of the tokens, I had no problem double-clicking them and adjusting the settings (e.g. Save session variable).
But for a handful, I double-click and nothing happens; screen-scraper doesn't seem to recognize the code as a token. Yet when I click "Apply Pattern to Last Data", it pulls the right data. Any thoughts?
Here are the tokens that screen-scraper doesn't recognize:
$50 to the first person able to do this with Screen-Scraper
Hi all!
I'm new to the community and I'm Italian, so sorry for my bad English...
I'm searching for the best web data scraping program, and I found this one, which seems very powerful.
As a test, I'd like to see whether it can do this:
I'd like it to crawl all the categories and subcategories of the little e-commerce site www.mikescomputershop.com, saving all the products and prices, along with their categories and subcategories, to a text file.
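screen-scraper would handle fetching and extracting each page; the part that is easiest to get wrong is the traversal. Below is a minimal sketch in plain Java, assuming each category page has already been scraped into a hypothetical `Category` object (name, products, subcategories). It does a depth-first walk and writes one tab-separated line per product, prefixed with its full category path. The class names, sample data, and `products.txt` filename are illustrative only and are not part of screen-scraper's API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CategoryCrawlSketch {
    // Hypothetical in-memory stand-ins for the pages a scraper would visit.
    record Product(String name, String price) {}
    record Category(String name, List<Product> products, List<Category> subcategories) {}

    // Depth-first walk: one "path<TAB>product<TAB>price" line per product.
    static void crawl(Category category, String parentPath, List<String> out) {
        String path = parentPath.isEmpty() ? category.name() : parentPath + " > " + category.name();
        for (Product p : category.products()) {
            out.add(path + "\t" + p.name() + "\t" + p.price());
        }
        for (Category sub : category.subcategories()) {
            crawl(sub, path, out); // recurse into each subcategory
        }
    }

    public static void main(String[] args) throws IOException {
        Category root = new Category("Hardware", List.of(), List.of(
            new Category("Monitors", List.of(new Product("17in LCD", "199.00")), List.of()),
            new Category("Storage", List.of(), List.of(
                new Category("Hard drives", List.of(new Product("80GB IDE", "59.00")), List.of())))));
        List<String> lines = new ArrayList<>();
        crawl(root, "", lines);
        Files.write(Path.of("products.txt"), lines); // hypothetical output file
        lines.forEach(System.out::println);
    }
}
```

Recursion keeps the code independent of how deep the category tree goes, which matters when a shop nests subcategories unevenly.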
Message not posted?
I posted a message in this forum yesterday but I still don't see it. Does this forum lose messages?
Are JavaScript-encrypted pages a problem?
Hi,
The site I'm scraping, http://portal.uspto.gov/external/portal/pair, uses JavaScript. I'm really not sure how to go about scraping the data from the transaction histories, since the URL doesn't follow a pattern.
These searches give me:
06/599,702
06/565,333
How do I scrape tables? I really need help. This is due soon.
How do I scrape tables from websites? Also, I have to get the table from the Transaction History tab of this website: http://portal.uspto.gov/external/portal/pair
However, there doesn't seem to be a direct URL to the Transaction History tab. How does the URL change? Please look into this. Thanks so much!
Here are some sample application numbers that you can search:
06/622,222
06/519,444
06/472,481
06/645,533
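Once the Transaction History page's HTML is in hand, turning its table into rows is the same idea an extractor pattern applied to each row expresses. The sketch below shows that general technique with plain Java regular expressions; it is not screen-scraper's extractor syntax, it does not solve the JavaScript/URL problem, and the sample HTML is hypothetical. Regular expressions like these only cope with simple, non-nested tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TableRowsSketch {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    // <tr> or <tr attr=...>, but not <track> etc.
    private static final Pattern ROW = Pattern.compile("<tr(?:\\s[^>]*)?>(.*?)</tr>", FLAGS);
    // <td>/<th> cells, but not <thead>
    private static final Pattern CELL = Pattern.compile("<t[dh](?:\\s[^>]*)?>(.*?)</t[dh]>", FLAGS);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns each table row as a list of cell texts with inner tags stripped.
    static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(TAG.matcher(cell.group(1)).replaceAll("").trim());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Date</th><th>Description</th></tr>"
            + "<tr><td>01-02-1985</td><td><b>Hypothetical event</b></td></tr></table>";
        for (List<String> r : extractRows(html)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```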
What is a RunnableScrapingSession?
I know how to implement one, etc., but I'm confused about its intent and behavior.
Is it just another scrape that kicks off via script but lives independently, or does the session from the "calling" scrape inherit the session (and its variables) from the RunnableScrapingSession?
Or is it meant just as a "kick this off and let it do its own thing" (like writing to a database)?
Sub-Extractor Pattern bug
The order of the sub-extractors changes when saving, exiting, and reloading screen-scraper. Reproduced several times.
dataSets: how do they work? :)
The first snippet is from the API docs and the second is one I'm trying to use.
// Loop through each of the data records.
for( i = 0; i < dataSet.getNumDataRecords(); i++ )
{
// Store the current data record in the variable myDataRecord.
myDataRecord = dataSet.getDataRecord( i );
// Output the "PRODUCT_NAME" value from the data record to the log.
session.log( "Product name: " + myDataRecord.get( "PRODUCT_NAME" ) );
}
vs
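As a runnable illustration of the API-doc snippet above: it treats a dataSet as an indexed list of data records, each mapping a token name (such as `PRODUCT_NAME`) to the value the extractor pattern matched. That reading is an assumption drawn from the method names in the snippet. The sketch models it with plain Java collections (a `List` of `Map`s) instead of screen-scraper's own classes, and the sample data and `productNames` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DataSetLoopSketch {
    // Same loop shape as the API-doc snippet, over a List<Map> stand-in for a dataSet.
    static List<String> productNames(List<Map<String, String>> dataSet) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < dataSet.size(); i++) {            // ~ getNumDataRecords()
            Map<String, String> myDataRecord = dataSet.get(i); // ~ getDataRecord( i )
            names.add(myDataRecord.get("PRODUCT_NAME"));       // ~ myDataRecord.get( "PRODUCT_NAME" )
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical records: one map per extractor-pattern match.
        List<Map<String, String>> dataSet = List.of(
            Map.of("PRODUCT_NAME", "Widget"),
            Map.of("PRODUCT_NAME", "Gadget"));
        for (String name : productNames(dataSet)) {
            System.out.println("Product name: " + name);
        }
    }
}
```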
File corruption bug
I haven't had time to try to reproduce this yet, but:
1) Scrape a page, save session variables
2) Write session variables to a file
3) session.pause( 5000 );
4) Stop the scrape before the 5 second pause is up
5) The session does not stop immediately and may continue onto the next records if in a loop.
Re-running the scraping session without restarting screen-scraper (SS) yields corrupt data.
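The report doesn't say where the corruption appears. If it is in the output file, one general mitigation (an assumption about the cause, and not a fix for the stop behavior itself) is to write each batch to a temporary file in the same directory and then rename it over the target, so an interrupted run never leaves a half-written file behind. A minimal plain-Java sketch; `writeAtomically` and `records.txt` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

class AtomicWriteSketch {
    // Writes all lines to a temp file beside the target, then moves it into place in one step.
    static void writeAtomically(Path target, List<String> lines) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "scrape-", ".tmp");
        try {
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            // ATOMIC_MOVE can throw AtomicMoveNotSupportedException on some file systems.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of("records.txt"); // hypothetical output file
        writeAtomically(out, List.of("FILINGDATE1=2007-03-15"));
        System.out.println(Files.readAllLines(out));
    }
}
```

Keeping the temp file in the target's own directory makes the final rename a same-file-system operation, which is what allows it to be atomic.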