How to test extractor pattern for web pages too big for scrapeable file to display?

(This post appears normal in edit mode, but not in display mode. It contains HTML.)

I can't get screen-scraper to return a DataSet that contains the data I am trying to extract. First of all, the scraping session won't display the data I need. I allowed it to display an unlimited number of lines by blanking out the max line count, but at the bottom of the screen it says: "The response exceeded the maximum length and was truncated. If you'd like to view the full response, click the "Display Response in Browser" button, then view the source in your web browser."

I see the following in the browser. I want to extract the date and time, which in this case is 1:15PM 1/21/10.

<td style='display:none;'><td  colspan=2 class='s4'><td style='display:none;'><td  colspan=4 class='s5'>1:15PM 1/21/10<td style='display:none;'><td style='display:none;'><td style='display:none;'><td  class='s6'>

I set up an extractor pattern that looks like this:

<td style='display:none;'><td  colspan=2 class='s4'><td style='display:none;'><td  colspan=4 class='s5'>~@DateTimeStamp@~<td style='display:none;'><td style='display:none;'><td style='display:none;'><td  class='s6'>

It returns nothing. I set up the same extractor pattern to extract data near the top of what displayed in the browser, and it returned data OK. But I can't test the extractor pattern because I can't see how screen-scraper adjusts the HTML: it won't display the whole page in the "Last Response" tab of the scrapeable file.

Any idea why it doesn't match? How do you test extractor patterns for web pages that are too large to display in a scrapeable file?

Gary Frank on 01/21/2010 at 3:58 pm

screen-scraper support for licensed users

If you click "display response in browser" and then view the source of the page that pops up, it will include all of the tidying that screen-scraper has done to the HTML so you should be able to make your extractor from that.

jason on 01/21/2010 at 5:59 pm

"display response in browser" button WORKS

Yes, that solved the problem. In the scrapeable file, I clicked the Last Response tab, then clicked the "display response in browser" button, and the ENTIRE source of the page appeared as it is after screen-scraper cleans it up. I configured the extractor pattern according to what I saw there and voila! The scrapeable file found what I was looking for and returned it to the .NET program.

Gary Frank on 01/22/2010 at 9:58 am
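
For anyone who wants to sanity-check a pattern outside the workbench, here is a rough Python sketch that tries a pattern against the source copied from the "display response in browser" window. It is not screen-scraper's own matching logic, only an approximation: the helper names (pattern_to_regex, _literal) are made up for this example, each ~@TOKEN@~ is assumed to match any text non-greedily, and runs of whitespace in the pattern are assumed to match any run of whitespace in the page.

```python
import re

# Matches tokens of the form ~@Name@~ (assumes names are valid identifiers).
TOKEN = re.compile(r"~@(\w+)@~")


def _literal(text: str) -> str:
    """Escape literal pattern text, letting any whitespace run match any whitespace run."""
    chunks = re.split(r"\s+", text)
    return r"\s+".join(re.escape(chunk) for chunk in chunks)


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn an extractor-style pattern into a regex with one named group per token."""
    parts = []
    pos = 0
    for match in TOKEN.finditer(pattern):
        parts.append(_literal(pattern[pos:match.start()]))
        parts.append(f"(?P<{match.group(1)}>.*?)")  # non-greedy capture (assumption)
        pos = match.end()
    parts.append(_literal(pattern[pos:]))
    return re.compile("".join(parts), re.DOTALL)


# The pattern and HTML below are the ones from the original post.
pattern = (
    "<td style='display:none;'><td  colspan=2 class='s4'>"
    "<td style='display:none;'><td  colspan=4 class='s5'>~@DateTimeStamp@~"
    "<td style='display:none;'><td style='display:none;'>"
    "<td style='display:none;'><td  class='s6'>"
)
page_source = (
    "<td style='display:none;'><td  colspan=2 class='s4'>"
    "<td style='display:none;'><td  colspan=4 class='s5'>1:15PM 1/21/10"
    "<td style='display:none;'><td style='display:none;'>"
    "<td style='display:none;'><td  class='s6'>"
)

found = pattern_to_regex(pattern).search(page_source)
print(found.group("DateTimeStamp") if found else "no match")
```

If this prints "no match" against the source you saved from the browser, the literal text around the token differs from what the page actually contains (attribute quoting, tag order, extra tags), and that is the part of the pattern to adjust.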
