How do I ignore information that appears only occasionally?
Hello! I'm trying to scrape a [url=http://www.seetickets.com/see/event.asp?startno=120&e%7Cartist=&re%7Ceventtype=1&RE%7Ceventtype%7C1=2&RE%7Ceventtype%7C2=5&RE%7Ceventtype%7C3=6&RE%7Ceventtype%7C4=14&RE%7Ceventtype%7C5=17&RE%7Ceventtype%7C6=18&RE%7Ceventtype%7C7=19&RE%7Ceventtype%7C8=20&RE%7Ceventtype%7C9=21&RE%7Ceventtype%7C10=22&RE%7Ceventtype%7C11=23&RE%7Ceventtype%7C12=24&RE%7Ceventtype%7C13=25&RE%7Ceventtype%7C14=26&RE%7Ceventtype%7C15=27&RE%7Ceventtype%7C16=3&filler1=see&resultsperpage=20] music event listing site[/url] and seem to be having an issue when the site sometimes inserts some extra info in front of the artist's name:
My extraction code is:
~@DATARECORD@~
Normally, the html I'm interested in for the artist name would look like this:
But occasionally it has extra data before the artist name:
Hi,
There are two ways you could handle this. The first would be to use sub-extractor patterns so that you match just the individual fields in each row. Based on the HTML you included, though, it looks as though this may not be a good approach, simply because the fields are so similar. The other approach would be to use two separate extractor patterns: one for the case where the extra data doesn't appear before the artist's name, and one for the case where it does.
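As a rough illustration of the two-pattern idea, here is a minimal sketch in Python regular expressions rather than screen-scraper's own extractor pattern syntax. The row markup, the `artist` class, and the names `WITH_EXTRA`, `WITHOUT_EXTRA`, and `extract_artist` are all made up for the example, since the real HTML isn't shown here, so treat it as a sketch of the logic only.

```python
import re

# Hypothetical row markup; the real page's HTML is not shown in this thread.
PLAIN_ROW = '<td class="artist"><a href="/event/1">The Examples</a></td>'
EXTRA_ROW = ('<td class="artist"><b>Extra date added</b> '
             '<a href="/event/2">The Examples</a></td>')

# Pattern for rows where extra data appears before the artist name.
WITH_EXTRA = re.compile(
    r'<td class="artist"><b>(?P<extra>[^<]*)</b>\s*'
    r'<a href="[^"]*">(?P<artist>[^<]*)</a></td>'
)

# Pattern for rows where the artist name appears on its own.
WITHOUT_EXTRA = re.compile(
    r'<td class="artist"><a href="[^"]*">(?P<artist>[^<]*)</a></td>'
)


def extract_artist(row_html):
    """Try the more specific pattern first, then fall back to the plain one."""
    for pattern in (WITH_EXTRA, WITHOUT_EXTRA):
        match = pattern.search(row_html)
        if match:
            return match.group("artist").strip()
    return None  # Neither pattern matched this row.
```

With these made-up rows, both `extract_artist(PLAIN_ROW)` and `extract_artist(EXTRA_ROW)` yield the same artist name, and the occasional extra text is captured separately instead of leaking into the artist field.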
Kind regards,
Todd Wilson