Can't get 3.0 scrape to work in 4.0

Hi,

I'm trying to scrape a file that looks essentially like this:

...
24-March-2008 DATA1 1000
24-March-2008 DATA2 500
24-March-2008 DATA3 5
23-March-2008 DATA1 1200
23-March-2008 DATA2 400
23-March-2008 DATA3 13
22-March-2008 DATA1 900
...

In version 3.0, I was able to set up a variable for a specific date before scraping the file:

session.setVariable ("DATE1", "24-March-2008");
session.scrapeFile ("SCRAPE");

and then use the variable in the extractor pattern:

~#DATE1#~ DATA1 ~@VALUE1@~
~#DATE1#~ DATA2 ~@VALUE2@~
~#DATE1#~ DATA3 ~@VALUE3@~

There are about 100 different dates, and this was quite helpful because otherwise the extractor pattern would time out.

However, this doesn't seem to be working anymore in version 4.0. Do I have to re-think how I do this scrape, or is there something I'm missing due to the upgrade?

Thanks!

Can't get 3.0 scrape to work in 4.0

Robert,

You're right on all fronts. Your solution would be my recommendation and, yeah, it would be more expensive than how you were doing it. I don't know the details of why, but I'm told that removing that functionality in later versions improved the overall performance of all extractor patterns.

If it makes sense to do so, you may want to take advantage of the feature under the Advanced tab for an extractor pattern that says, "This extractor pattern will be invoked manually from a script". The advantage would be that you get to control when you use that extractor pattern rather than having it fall in sequence with the other extractor patterns.

Use it in conjunction with the [url=http://screen-scraper.com/support/docs/api_documentation.php#extractData]extractData[/url] method.

-Scott

Can't get 3.0 scrape to work in 4.0

One scenario:

There is a dropdown with a bunch of years in it. I need to get the code for the specific year I'm looking for. So I use

And it gives me the code for the year I'm looking for.

Without this technique I have to match on all options and write a script to compare the value of the option text with the value I'm looking for, then get the code from there. So instead of one simple pattern, I have a more broadly matching pattern plus a script, which seems more expensive.
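To make the "broad pattern plus a script" workaround concrete, here is a minimal standalone Java sketch of the comparison step. It is not screen-scraper code: the `<option>` markup, the class name `YearCodeLookup`, and the method `codeForYear` are all assumptions made up for illustration, and inside a real scraping session the broad match would come from an extractor pattern rather than a regular expression.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the screen-scraper API.
public class YearCodeLookup {
    // Broad pattern: matches every <option>, capturing its value attribute
    // and its visible text. The markup shape is assumed for this sketch.
    private static final Pattern OPTION =
        Pattern.compile("<option\\s+value=\"([^\"]*)\"\\s*>([^<]*)</option>");

    // Returns the value attribute of the first option whose text equals
    // wantedYear, or an empty Optional if no option matches.
    public static Optional<String> codeForYear(String html, String wantedYear) {
        Matcher m = OPTION.matcher(html);
        while (m.find()) {
            if (m.group(2).trim().equals(wantedYear)) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample markup; the codes are invented.
        String html =
            "<select name=\"year\">"
            + "<option value=\"a17\">2006</option>"
            + "<option value=\"b42\">2007</option>"
            + "<option value=\"c03\">2008</option>"
            + "</select>";
        System.out.println(codeForYear(html, "2007").orElse("not found")); // prints b42
    }
}
```

The extra cost is one pass over all the options instead of a single targeted match, which is usually small for a dropdown.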

Can't get 3.0 scrape to work in 4.0

Robert,

I'd have to see an example to give you a suggestion for a workaround.

-Scott

Can't get 3.0 scrape to work in 4.0

I just re-read this -- this means we can't use a variable as a portion of an extraction pattern match? That impacts quite a few of my scrapes -- I occasionally use a taxyear as a match so that I get the right data extracted.

What's the best way to rework a scrape that uses that sort of matching?

:(

Can't get 3.0 scrape to work in 4.0

Scott,

I kind of assumed that this might be the case, but was looking for confirmation so I could stop wondering about what I was doing wrong.

On the plus side, the fix wasn't as bad as I had expected. The copying/pasting of extractor patterns option is a much appreciated feature. :D

Thanks for the quick reply,
Joshua

Can't get 3.0 scrape to work in 4.0

Joshua,

I'm sorry to say, you've been making use of an undocumented feature that we removed for performance reasons. We're assuming that few people are approaching their extractor patterns within scrapeable files in this way, and since it impacts the performance of every scraping session, we decided to remove the feature.

I'm sorry that this will require a rewrite of your scraping session. But, on the flip side you may notice an improvement in the performance along with everyone else. :)
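For anyone facing the same rewrite, here is a rough standalone Java sketch of one way to get the same result without a variable in the pattern: match every row broadly, then keep only the rows for the date you want. It uses the sample rows from the first post. The class name `DateFilter` and the method `valuesForDate` are made up for illustration and are not part of screen-scraper; in a real session the filtering would live in a script that runs on the extractor pattern's matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the screen-scraper API.
public class DateFilter {
    // Broad pattern with no variable in it: date, label, value on one line.
    private static final Pattern ROW =
        Pattern.compile("(?m)^(\\S+) (\\S+) (\\S+)$");

    // Collects label -> value for every row whose date equals the given date,
    // preserving the order in which the rows appear.
    public static Map<String, String> valuesForDate(String text, String date) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            if (m.group(1).equals(date)) {
                values.put(m.group(2), m.group(3));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "24-March-2008 DATA1 1000",
            "24-March-2008 DATA2 500",
            "24-March-2008 DATA3 5",
            "23-March-2008 DATA1 1200",
            "23-March-2008 DATA2 400",
            "23-March-2008 DATA3 13",
            "22-March-2008 DATA1 900");
        // prints {DATA1=1000, DATA2=500, DATA3=5}
        System.out.println(valuesForDate(text, "24-March-2008"));
    }
}
```

In a script, each entry of the returned map could then be stored with session.setVariable, so the rest of the session sees the same kind of values the old pattern produced.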

Thanks,
Scott

I was looking at the v4.5 release notes, and it looks like the functionality is back in. If so, can I assume it's here to stay this time?

I suppose so :P I think the lead programmer put that back in and didn't tell most of us. For the performance issues noted in this thread, it still may be a good idea to not rely on it too frequently, but as a feature, it is extremely nice :)