Parse out Duplicates?
I have a piece of code that I need to parse out.
onmouseover="ddrivetip('Herold Herold Assistant*
Jun 2009
7.17','#F5E7AF')" onmouseout="hideddrivetip()" />
I need to retrieve the data into two variables:
Company: Herold
Product: Herold Assistant*
Since "Herold" is a duplicate word, how do I go about extracting these two variables?
That one would be hard even for a human to split into company and product without already knowing the answer. Are there any rules you can cling to, like companies always being one word? Do you have a static list of possible companies?
Companies are not always one word; however, I do have a list of the companies that will be in the dataset.
Okay, that is possible. First you will need a way to compare your list to what is scraped. If the list is small enough, I would just save it in an array with a script at the beginning of the scrape.
String[] companies = {"ABC Corp", "XYZ Ltd", "ACME"};
session.setVariable("COMPANIES", companies);
Then you would extract the company block when you find it.
onmouseover="ddrivetip('~@COMPANY_AND_PRODUCT@~
Finally, you need a script that will check for the company in the extracted data, and this will use a lot of String manipulation.
// Local references to variables
companies = session.getVariable("COMPANIES");
scraped = dataRecord.get("COMPANY_AND_PRODUCT");
// Iterate array of possible companies
for (i = 0; i < companies.length; i++)
{
    company = companies[i];
    if (scraped.startsWith(company))
    {
        // The scraped string starts with this company name
        foundCompany = scraped.substring(0, company.length());
        // Skip the space after the company; the rest is the product
        product = scraped.substring(company.length() + 1);
        break;
    }
}
Of course, that's off the top of my head, and will need some refinement, but should get you on the right path.
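As one possible refinement, here is a minimal standalone sketch of the same prefix-matching idea in plain Java, outside of any scraping session. The class name CompanyProductSplitter and the method split are hypothetical names made up for illustration. Two assumptions go beyond the script above: longer company names are tried first, so a company whose name begins with another company's name is not cut short, and a match only counts if the company is followed by whitespace or the end of the string.

```java
import java.util.Arrays;

class CompanyProductSplitter {

    /**
     * Splits text into {company, product} using a list of known companies.
     * Returns null when no known company starts the text.
     * (Hypothetical helper, not part of any scraping API.)
     */
    static String[] split(String text, String[] companies) {
        String trimmed = text.trim();

        // Assumption: try longer names first so "ACME Holdings" wins over "ACME".
        String[] byLength = companies.clone();
        Arrays.sort(byLength, (a, b) -> Integer.compare(b.length(), a.length()));

        for (String company : byLength) {
            if (!trimmed.startsWith(company)) {
                continue;
            }
            String rest = trimmed.substring(company.length());
            // Accept only a whole-word match: the company must be followed
            // by whitespace or by the end of the string.
            if (rest.isEmpty() || Character.isWhitespace(rest.charAt(0))) {
                return new String[] { company, rest.trim() };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] companies = { "ABC Corp", "XYZ Ltd", "Herold" };
        String[] parts = split("Herold Herold Assistant*", companies);
        System.out.println("Company: " + parts[0]);
        System.out.println("Product: " + parts[1]);
    }
}
```

Because the company is matched only as a prefix, the repeated "Herold" is not a problem: the first occurrence is taken as the company and everything after it, including the second "Herold", becomes the product. Returning null for an unknown company lets the caller decide whether to skip or log the record.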
Thank you! That worked perfectly.