screen-scraper support for licensed users

Questions and answers regarding the use of screen-scraper. Only licensed Professional and Enterprise Edition users can post; anyone can read. Licensed users, please contact support with your registered email address for access. This forum is monitored closely by screen-scraper staff. Posts are generally answered within one business day.

Problems parsing a date: thought this was simple, but somewhere it's causing grief

At the top of the file I have

import com.screenscraper.common.*;
import java.util.Date;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.*;
import java.io.*;

Then I am trying to do

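// Locale.US so English month abbreviations ("Jan", "Feb", ...) parse on any machine
SimpleDateFormat formatter = new SimpleDateFormat("MMM d, yyyy", Locale.US);
// parse() already returns a Date; the record value is an Object, so cast it to String
Date DRecord = formatter.parse((String) myDataRecord.get("ORDERCHANGEDATE"));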

And I am getting
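For reference, a minimal self-contained sketch of the corrected parse, assuming the field holds text such as "Jan 5, 2010". `SimpleDateFormat.parse` already returns a `Date` (there is no `Date(Date)` constructor), and a data record's `get` hands back an `Object`, so the value is turned into a `String` first. The `Map` is a hypothetical stand-in for screen-scraper's `DataRecord`, and `Locale.US` is an assumption so that English month abbreviations parse regardless of the machine's default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ParseOrderDate {
    // Parses text like "Jan 5, 2010"; Locale.US pins English month abbreviations.
    static Date parseOrderDate(Object raw) {
        SimpleDateFormat formatter = new SimpleDateFormat("MMM d, yyyy", Locale.US);
        formatter.setLenient(false); // reject impossible dates such as "Feb 30, 2010"
        try {
            // parse() already returns a Date, so no cast or new Date(...) is needed
            return formatter.parse(String.valueOf(raw).trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable order date: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a screen-scraper DataRecord
        Map<String, Object> myDataRecord = new HashMap<>();
        myDataRecord.put("ORDERCHANGEDATE", "Jan 5, 2010");
        Date dRecord = parseOrderDate(myDataRecord.get("ORDERCHANGEDATE"));
        System.out.println(new SimpleDateFormat("yyyy-MM-dd", Locale.US).format(dRecord));
    }
}
```

Wrapping `ParseException` in an unchecked exception keeps the calling script short while still failing loudly on a malformed date.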

How many scrapeable files do you have to make?

I am attempting to scrape a Fannie Mae website. It has a login page, and you must be logged in before you can access other pages. The sequence is: go to the Home page, then the Login page, then step through a few other pages until you reach the page you want to download. I set up scrapeable files for the Home and Login pages, and another one for the final page that I want to download. When I run the scraping session, it appears to go to the correct pages: first the Home page, then logging in with the correct credentials.
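What makes the later pages accessible is that the login response normally sets a session cookie, and every later request must send it back. As an illustration outside screen-scraper, here is a minimal sketch of that idea with Java's `HttpClient` and a `CookieManager`; the class name and form field names are hypothetical, and whether the intermediate pages also need to be requested depends on whether they set cookies or hidden form values the target page relies on.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class LoginSessionSketch {
    // One client per "session": its CookieManager stores the login cookie
    // and sends it on every later request made through the same client.
    static HttpClient newSessionClient() {
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        return HttpClient.newBuilder()
                .cookieHandler(cookies)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // URL-encodes form fields as a browser would for a POST to the login page.
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> login = new LinkedHashMap<>(); // keeps field order stable
        login.put("username", "alice");     // hypothetical field names
        login.put("password", "p@ss word");
        System.out.println(formBody(login));
        System.out.println(newSessionClient().cookieHandler().isPresent());
    }
}
```

The login POST, the intermediate pages and the final download would all be sent through the same client so that the cookie travels with each of them.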

Session Variables (Key and Value) in the Web Interface

I'm having trouble getting the session variables to work in the Web Interface. I'm using the following code to initialize the scraping session:

URL is not captured

Hi,

While scraping a site, my requirement is to scrape the URLs for all the records displayed after performing a search,
i.e., the URL that displays the details for a particular record when I click on it.

The cases where I cannot capture any URL are a JavaScript form submit such as onclick="this.form.submit();"

and a submit button whose name looks like name="MatchingApplications:DataGrid_ResultSet:_ctl11:_ctl0"

Is it possible to capture URLs in cases like these?
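A form submit has no link URL to capture: the browser POSTs the form's fields to its `action` URL, and a clicked submit button adds its own name=value pair to that body. The `_ctl11:_ctl0` naming suggests ASP.NET (an assumption), where hidden fields such as `__VIEWSTATE` must be posted back too. The sketch below reconstructs such a request from page HTML; the class and method names are hypothetical, and the regexes are deliberately naive (double-quoted attributes only, no entity decoding).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormPostSketch {
    private static final Pattern FORM_ACTION =
            Pattern.compile("<form[^>]*\\baction=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern HIDDEN_INPUT =
            Pattern.compile("<input[^>]*type=\"hidden\"[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern NAME = Pattern.compile("\\bname=\"([^\"]*)\"");
    private static final Pattern VALUE = Pattern.compile("\\bvalue=\"([^\"]*)\"");

    // The URL the form is submitted to.
    static String formAction(String html) {
        Matcher m = FORM_ACTION.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Hidden fields (e.g. __VIEWSTATE) that the browser would post back.
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tag = HIDDEN_INPUT.matcher(html);
        while (tag.find()) {
            Matcher n = NAME.matcher(tag.group());
            if (!n.find()) continue;
            Matcher v = VALUE.matcher(tag.group());
            fields.put(n.group(1), v.find() ? v.group(1) : "");
        }
        return fields;
    }

    // POST body for "clicking" one button: hidden fields plus the button's name=value.
    static String postBody(Map<String, String> hidden, String buttonName, String buttonValue) {
        Map<String, String> all = new LinkedHashMap<>(hidden);
        all.put(buttonName, buttonValue);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

So the "URL" for each record is really the pair (action URL, POST body); in a scraper, each record's button name would be extracted and used to build its own request.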

Thanks
Yogesh

How does screen-scraper simulate moving from one page to another?

Help me understand this: You create a proxy session and then it captures the movements from one page to another as you manually type text into text boxes and click on buttons. That is captured and saved into the proxy session. Then you create a scraping session and scrapeable files.
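A simplified model of that idea (not screen-scraper's internals): the proxy records each HTTP request the browser makes (method, URL, and form parameters), a scrapeable file roughly corresponds to one such request, and the scraping session re-sends them in order. The sketch below turns hypothetical recorded transactions into `java.net.http.HttpRequest` objects without sending anything; `Transaction` and `ReplaySketch` are illustrative names.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

class ReplaySketch {
    // Hypothetical record of one request seen by a proxy.
    record Transaction(String method, String url, String formBody) {}

    // Rebuilds a recorded transaction as a request that can be re-sent later.
    static HttpRequest toRequest(Transaction t) {
        HttpRequest.Builder b = HttpRequest.newBuilder(URI.create(t.url()));
        if ("POST".equalsIgnoreCase(t.method())) {
            b.header("Content-Type", "application/x-www-form-urlencoded")
             .POST(HttpRequest.BodyPublishers.ofString(t.formBody()));
        } else {
            b.GET(); // formBody is ignored for GETs
        }
        return b.build();
    }

    // Replaying = sending these, in recorded order, through one cookie-keeping client.
    static List<HttpRequest> toRequests(List<Transaction> recorded) {
        List<HttpRequest> out = new ArrayList<>();
        for (Transaction t : recorded) out.add(toRequest(t));
        return out;
    }

    public static void main(String[] args) {
        List<Transaction> recorded = List.of(
                new Transaction("GET", "https://example.com/", null),
                new Transaction("POST", "https://example.com/login", "user=a&pass=b"));
        for (HttpRequest r : toRequests(recorded)) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

Nothing needs to "move" between pages: each step is just the next request, and the server sees the same sequence a browser would have sent.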

Alpha bug?

Morning Scrapers,

I installed the new alpha release today and all my scrapes refused to run. Instead of a bunch of HTML, I got [Binary data] as the only response screen-scraper showed besides the HTTP headers.
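One possible cause of a "[Binary data]" body (an assumption; the post does not show the headers) is a compressed response, e.g. `Content-Encoding: gzip`, that was not decoded. A quick way to check a saved body is to look for the gzip magic bytes `0x1f 0x8b` and decompress with `GZIPInputStream`. Class and method names below are hypothetical, and the text is assumed to be UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipCheck {
    // Every gzip stream begins with the two magic bytes 0x1f 0x8b.
    static boolean looksGzipped(byte[] body) {
        return body.length >= 2 && (body[0] & 0xff) == 0x1f && (body[1] & 0xff) == 0x8b;
    }

    // Returns the body as text, decompressing it first if it is gzip data.
    static String decodeBody(byte[] body) {
        if (!looksGzipped(body)) return new String(body, StandardCharsets.UTF_8);
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper used only to produce sample gzip data for the demo.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray(); // closing the stream above wrote the gzip trailer
    }

    public static void main(String[] args) {
        byte[] sample = gzip("<html><body>hello</body></html>");
        System.out.println(looksGzipped(sample) + " " + decodeBody(sample));
    }
}
```

If the body does turn out to be gzip, the response headers should show a matching `Content-Encoding`, which is worth including in a bug report.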

I uninstalled that version and went back to the last stable release, but that feels like the Stone Age in comparison....

Are you familiar with this?

\ prepended to the generated file name

Hi,

I am facing one strange issue.
I imported my .sss files onto my Linux server. Everything worked fine, and my files were generated as follows: 25012010-CouncilName-25-01-2010.txt.

But when I ran the scraping sessions again on the Linux server two days later, without changing any settings, I noticed something strange.
The files were generated with a \ prepended to the file name above,

i.e., \25012010-CouncilName-25-01-2010.txt.

Any solutions for this?
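On Linux, `\` is an ordinary file-name character, not a path separator. One plausible cause (an assumption; the script is not shown) is a path built as a directory that already ends in `/` plus a hard-coded Windows `"\\"` plus the file name, which yields a file literally named `\25012010-...`. A minimal sketch of building the path portably with `java.nio.file`; the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class OutputPathSketch {
    // e.g. "/data/out/" + "\\" + name becomes a file named "\name" inside /data/out on Linux.
    // Paths.get joins with the platform's own separator instead.
    static Path outputFile(String dir, String fileName) {
        return Paths.get(dir, stripLeadingBackslashes(fileName));
    }

    // Removes any stray leading backslashes left over from Windows-style concatenation.
    static String stripLeadingBackslashes(String name) {
        return name.replaceFirst("^\\\\+", "");
    }

    public static void main(String[] args) {
        Path p = outputFile("/tmp/out", "\\25012010-CouncilName-25-01-2010.txt");
        System.out.println(p.getFileName());
    }
}
```

The same script then produces correct paths on both Windows and Linux, since neither separator is hard-coded.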

Thanks in advance.

How to test if a URL exists from a .NET program?

How can you tell if a URL exists from a .NET program? I tried using the WebBrowser control with the Navigate method. It works OK except when the file is a PDF. In that case, it invokes Adobe Reader to pop up the PDF. I don't actually want to open the PDF; I just want to know whether it exists, and popping up PDF files over and over won't work. Do you just remove the file type association under Internet Options > Programs? Have you guys done this with straight .NET? If not, how would you do it with screen-scraper and a .NET program?
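The usual approach is to skip the browser control entirely and send an HTTP `HEAD` request, which returns only the status line and headers, so no PDF is downloaded or opened. .NET's HTTP classes can send `HEAD` requests as well; to keep this thread's examples in one language, here is a minimal Java sketch. Class and method names are hypothetical, and some servers answer `HEAD` with 405, in which case a `GET` that discards the body is a fallback.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class UrlExists {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // HEAD asks for headers only, so the file body is never transferred.
    static HttpRequest headRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .timeout(Duration.ofSeconds(10))
                .build();
    }

    static boolean isSuccess(int status) {
        return status >= 200 && status < 300;
    }

    // True if the server answers the HEAD request with a 2xx status.
    static boolean urlExists(String url) {
        try {
            HttpResponse<Void> resp = CLIENT.send(headRequest(URI.create(url)),
                    HttpResponse.BodyHandlers.discarding());
            return isSuccess(resp.statusCode());
        } catch (IOException e) {
            return false; // unreachable host, refused connection, timeout, ...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length > 0) System.out.println(urlExists(args[0]));
    }
}
```

Because redirects are followed, a link that redirects to an existing file counts as existing; checking the final `Content-Type` header would additionally confirm it is a PDF.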

How do I create an extractor pattern that matches on parts of the text it is meant to extract?

I would like to extract a date and time from within a bunch of HTML. A lot of the HTML looks very similar, and it may change occasionally. This is the format of what I would like to extract: 1:15PM 1/21/10. I am searching for this field because it changes periodically. Is it possible to set up the extractor pattern to look for the colon, AM or PM, and the two slashes? The slashes will shift depending on whether the month or day has one or two digits, and it will show either "AM" or "PM" depending on the time of day.
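Matching on the shape of the value rather than on the surrounding HTML is what a regular expression does. A sketch of such an expression, tested here in plain Java: `\d{1,2}` absorbs the one-or-two-digit hour, month and day, `(AM|PM)` covers both halves of the day, and the year is assumed to be two digits as in the example. How to attach a regular expression to a token is a screen-scraper setting not shown here; `TimestampFinder` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimestampFinder {
    // h:mm, optional space, AM/PM, whitespace, then M/D/YY (1- or 2-digit month and day)
    static final Pattern STAMP = Pattern.compile(
            "\\b(\\d{1,2}):(\\d{2})\\s?(AM|PM)\\s+(\\d{1,2})/(\\d{1,2})/(\\d{2})\\b");

    // Returns the first timestamp found in the HTML, or null if there is none.
    static String find(String html) {
        Matcher m = STAMP.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("<td>Updated 1:15PM 1/21/10</td>"));
        System.out.println(find("<td>Updated 12:05AM 12/3/09</td>"));
    }
}
```

The capture groups also split out hour, minute, AM/PM, month, day and year in case they are needed separately.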

How do I test an extractor pattern on web pages too big for the scrapeable file to display?

(This post appears normal in edit mode, but not in display mode. It contains HTML.)

I can't get screen-scraper to return a DataSet that contains the data I am trying to extract. First of all, the scraping session won't display the data I need. I allow it to display an unlimited number of lines by blanking out the max line count, but at the bottom of the screen it says: "The response exceeded the maximum length and was truncated. If you'd like to view the full response, click the 'Display Response in Browser' button, then view the source in your web browser."
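Once the full source has been saved from the browser, a pattern can be checked against it offline. A minimal Java sketch, assuming a regular-expression approximation of the extractor pattern (screen-scraper's own token syntax is different); the file path argument, the `<td class="price">` pattern and the class name are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OfflinePatternTest {
    // Returns capture group 1 of every match, i.e. what the pattern would extract.
    static List<String> extractAll(String source, Pattern pattern) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(source);
        while (m.find()) hits.add(m.group(1));
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java OfflinePatternTest saved-page.html");
            return;
        }
        // Page source saved via the browser's "view source" (path supplied by the user)
        String source = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        Pattern p = Pattern.compile("<td class=\"price\">([^<]*)</td>"); // hypothetical
        List<String> hits = extractAll(source, p);
        System.out.println(hits.size() + " matches");
        hits.forEach(System.out::println);
    }
}
```

Seeing the match count against the complete page makes it clear whether the pattern fails or the data simply sat in the truncated part of the display.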