City/State/Zip Parsing..
I am trying to scrape and parse this address information, but a two-word city does not parse correctly because the city name itself contains a space, as shown in the samples below.
How can I change the extraction so it works for both cases (a one-word or a two-word city)?
~@name@~ ~@address@~ ~@city@~ ~@state@~ ~@zip@~ |
Thanks
Burlington Branch 2623 Alamance Rd. Burlington NC 27215 |
Black Mountain Outpost 313 W. State St. Black Mountain NC 28711 |
City/State/Zip Parsing..
karl,
The best way to handle this is to apply a regular expression to each of the extractor tokens. Double-click each of the tokens "city", "state", and "zip" in turn. The "Edit Token" window should pop up (if it doesn't, highlight the text between the @'s, right-click, and choose "Edit token"). In that window, go to the "Regular Expression" tab, where you'll see a selection of pre-installed choices. For the city token I would recommend the "non-html" expression, for the state choose "State abreviation", and for the zip choose "5-digit U.S. zip code". Because the state and zip tokens are then restricted to specific shapes, the city token is free to match as many words as it needs, which should take care of multi-word cities.
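To see why constraining the state and zip tokens helps, here is a rough sketch of the same idea in plain Python. The patterns are my own guesses at what a "state abbreviation" and "5-digit zip" expression might look like, not the actual pre-installed ones, and the function name parse_city_state_zip is just made up for illustration. It also assumes the name and address have already been split off, so only the city/state/zip text is left.

```python
import re

# Assumed patterns (not the tool's real presets):
#   city  = any text, matched lazily so it stops as soon as the rest fits
#   state = exactly two uppercase letters
#   zip   = exactly five digits
CITY_STATE_ZIP = re.compile(
    r"^(?P<city>.+?)\s+(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})$"
)

def parse_city_state_zip(text):
    """Return (city, state, zip), or None if the text does not match."""
    match = CITY_STATE_ZIP.match(text.strip())
    if match is None:
        return None
    return match.group("city"), match.group("state"), match.group("zip")

for sample in ("Burlington NC 27215", "Black Mountain NC 28711"):
    print(parse_city_state_zip(sample))
# ('Burlington', 'NC', '27215')
# ('Black Mountain', 'NC', '28711')
```

Because the state must be two capital letters and the zip must be five digits, the city group has to grow until the rest of the line fits, so "Black Mountain" comes out as one city rather than being split at the space.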
If you take a crash course on regular expressions you'll find that you can exercise considerably more control in your scraping session than without them. In fact, as you may have discovered, some things are not possible without them.
http://www.regular-expressions.info/
-Scott