Output File Formatting

Hi I am trying to format my output file the file to meet my specs.

My first problem is some of my variables have a 40-80 character limit. How would I "trim" my variable to 40 characters?
For example: out.write( dataRecord.get( "TITLE" ) + "\t" );

"Title" often exceeds 40 characters. Is there anyway to cut it to less than 40 before it is written to my .txt file?

Also the "write to file" that I use from the tutorial will by default enter a value of "null" in my .txt file if it is is not assigned. Is there any way to not write anything to my file if the variable is not assigned?

Lastly if I decideded to use a .csv file rather than .txt what do I need to change?

For Example:
out.write( dataRecord.get( "TITLE" ) + "\t" );

I am using the Basic version of screen-scraper.

Thanks for your help/patience.

- Ben

Hey thanks justin I have no

Hey thanks justin

I have no idea why it wasn't working the first time but I finally got it to work! I used some of the code from this thread: http://community.screen-scraper.com/node/2175

Now one of my fears has come true. Some of my data contains commas for example image source urls so of course it interprets those commas as separators. I've done some researching and found that generally what we do to avoid this is surround each cells data with quotes ("). I've played around with the Out.write feature and can't seem to find anything that will write some quotes to my file.

How do I go about doing this? Hopefully this will solve my problem once and for all.

thanks again for everything
Ben

Hey thanks justin I have no

Hey thanks justin

I have no idea why it wasn't working the first time but I finally got it to work! I used some of the code from this thread: http://community.screen-scraper.com/node/2175

Now one of my fears has come true. Some of my data contains commas for example image source urls so of course it interprets those commas as separators. I've done some researching and found that generally what we do to avoid this is surround each cells data with quotes ("). I've played around with the Out.write feature and can't seem to find anything that will write some quotes to my file.

How do I go about doing this? Hopefully this will solve my problem once and for all.

thanks again for everything
Ben

I'll try to answer...

Hi Ben,

I'll try to answer your questions, but I'm going to assume that you are using Java to output to your file (as per the tutorials):

1. Trimming Text: Probably the easiest way to do this is to use Java's subString functionality. If memory serves me correctly, your code would look something like this:

String fullTitle_STR = dataRecord.get( "TITLE" ); //this retrieves the full title from the datarecord
String trimmedTitle_STR = fullTitle_STR.subString(0,40); //this trims the title to 40 characters starting from the Left side

2. Handling Null values: Screenscraper includes a number of very handy utilities in the sutil functions, one of which is called nullToEmptyString (see http://community.screen-scraper.com/documentation/api/sutil/nullToEmptyString) This particular function examines the contents of a variable to see if it contains a null value; if so, it replaces it with an empty string, which sounds exactly like what you want. Your code might look something like this:

out.write(sutil.nullToEmptyString(session.getVariable("your variable name goes here")) + "\t" );

3. Txt vs. CSV output: I know that SS includes a CSVwriter utility (see http://community.screen-scraper.com/documentation/api/utilities/CsvWriter), which might be useful but I'm not familiar with it. That said, my understanding of CSV files is that they are simply text files where the data elements on each line are separated by commas (or some other symbol), so you could just simply replace the "\t" with a "," in your out.write strings. Using the above code, this would look something like:

out.write(sutil.nullToEmptyString(session.getVariable("your variable name goes here")) + "," );

HTH! FYI, I'm still hacking around with Java so if Jason or Tim reply, trust their comments above mine.

Regards,
Justin

Thanks

Thanks for the quick reply.

1. While using the code you gave me. I'm receiving the following error:

ERROR--new: An error occurred while processing the script: Items
new: The error message was: class bsh.EvalError (line 3): fullTitle_STR .subString ( 0 , 40 ) -- Typed variable declaration : Error in method invocation: Method subString( int, int ) not found in class'java.lang.String'

I've researched other methods to trim in Java I just haven't found a solution.

2. Handling Null Values Works perfectly Thanks

3. Works great but Now i'm having a new problem/question/challenge. One of my variables contains html and consequently often contains commas (,) this is making it randomly move to the next cell whenever it comes across one of these characters. Is there any way around that?

Thanks for the quick response
Ben

Oops...substring, not subString

Hi Ben,

Oops - my apologies! My memory isn't the greatest these days...I think the code should work as is if you make substring all lower-case. You can find the Java specification for the substring method at: http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#substring(int, int)

As for the comma issue, what/where/how is the CSV file being used for? (e.g., is it being opened in Excel or added to a database?) FWIW, I think you can make your delimiter (i.e., the character separating your data elements) anything you like, so you could try using a semicolon (;), a tilde (~) or even a double comma (,,) as a delimiter. The CSV file generated from Screen-scraper won't be affected by symbol you use for the delimiter, but you will need to indicate the particular delimiter to whatever application is using the CSV file on the other end. Otherwise, the receiving application won't be able to read the individual data elements because it's expecting a different symbol than what you're using.

HTH!
Justin

Hey Justin,It is part of a

Hey Justin,

It is part of a file that I am going to upload to a web server. It's a list of products i'm selling and the web pages. The server requires a CSV file so I have been using a Macro triggered by screen scraper that uses Excel to edit the document to the web-servers specs before SS uploads the file to my web site. However I thought it would be much easier and stream lined if I were able to keep the entire process within SS.

So with your idea about using other characters rather than the comma wouldn't work because I have no control over the way the server sees the file...right?

BTW the trim function you gave me works great now thanks...Is there a way to add an exception for the times when the variable is already less than 40 characters? I think that is why sometimes I get this error:
Processing script: "Items"
ERROR--new: An error occurred while processing the script: Items
new: The error message was: NullPointerException (line 5): String trimmedwebsite_STR = fullwebsite_STR .substring ( 0 , 40 ) -- Typed variable declaration : Null Pointer in Method Invocation

Again thank you for your time.

Ben

Fix for under-40 characters

Hi Ben,

Thanks for the info.
1. CSV file: Now I see why you would want to keep the commas - otherwise, the web server won't understand the CSV file that you'll be giving it...this is a tough one and you might want to ask Jason or Tim about it. But in the meantime, would it be possible for you to provide a sample URL with commas so we can see what the output looks like?

2. 40 characters: Ah, that's a good point - I had assumed that all of your output would be 40 characters or more, but we all know what "assume" means. There are 2 ways to handle this: you either use an if statement to detect cases of less-than 40 characters or you trap the NullPointer error. I'll demonstrate both and you can choose which one you want to use:
i. If statement

//a. initialize your variables
String fullTitle_STR;
String trimmedTitle_STR;
int fullTitleLength_INT = 0;
//b. Retrieve the title and check it's length
fullTitle_STR = dataRecord.get( "TITLE" ); //this retrieves the full title from the datarecord
fullTitleLength_INT = fullTitle_STR.length();
//c. Evaluate the length of the title and set the trimmed title appropriately
if (fullTitleLength_INT =>40)
{
  //if the title is equal to or greater than 40 characters, trim it...
  trimmedTitle_STR = fullTitle_STR.subString(0,40); //this trims the title to 40 characters starting from the Left side
} else {
  //...otherwise, leave the title as is since it's less than 40 characters
  trimmedTitle_STR = fullTitle_STR;
}

ii. Try-Catch statement

try{
//a. first we'll try to retrieve the title and generate a 40-character substring
  String fullTitle_STR = dataRecord.get( "TITLE" ); //this retrieves the full title from the datarecord
  String trimmedTitle_STR = fullTitle_STR.subString(0,40); //this trims the title to 40 characters starting from the Left side
} catch (NullPointerException e) {
//b. if this didn't work and the NullPointer error got thrown (meaning there are less than 40 chars)
//   we'll just set the trimmed title equal to the entire title token
  String trimmedTitle_STR = dataRecord.get( "TITLE" ); //this retrieves the full title from the datarecord
}

HTH!

Thanks

Hey thanks so much justin I was able to use the If/else statement. When you say a sample url do you mean an online version of my csv file? I don't have one online yet but i'm sure i could make that happen if you think it would help. I'm so close to not having to use a macro before i upload my file so i'm hoping i can figure this out.

I'm trying to figure out what exactly Excel is doing to my file when I change it from .txt to .csv and why i can't do the same thing manually without confusing the HTML code as comma separators.

Thanks
Ben

No problem

Hi Ben,

Glad to be of help. When I asked for a sample URL, I was curious to see what the URL that is going to be written to the CSV/text file looks like, and more specifically, the commas that were screwing up the output.

I've also seen Excel do funny things to text/CSV files sometimes when it tries to open them automatically. To bypass this, I try to import it via the File > Open menu, where I specify 'Text Files' from one of the drop-down menus. When I do this, Excel usually opens a wizard that allows me to tell it the character that's the delimiter (i.e., character that separates the data fields). FYI, once you find a setup that works, you can implement this programmatically via Visual Basic for Applications (VBA) so that all subsequent CSV/text files could be opened the same way.

HTH,
Justin

Last Idea

I had one last idea, I'm trying to use the replace function to rid the string of commas (,).

DESCRIPTION = DESCRIPTION.replaceAll(",", "");

However my result is that it will only replace commas that have leading and following spaces ( , ) most of my commas are next to a character (blah, blah)(blah,blah) etc.

Any way around this?

Thanks, Ben

RegEx might be an answer but beware...

Hi Ben,

That's weird - I've used the replaceAll to do exactly the same thing and it replaced all of the commas regardless of the character preceding/following it. One thing to be aware of is that the replaceAll method tries to match the characters exactly as you write them (including spaces), so " ," ", " and " , " are all different.

Just to confirm, your code is exactly as follows?

DESCRIPTION = DESCRIPTION.replaceAll(",", "");

(i.e., there's no space before/after the comma)

If this is the case and it's not working you might have to look at including some regular expressions (AKA RegEx) in your replaceAll method. You could try something as follows:

DESCRIPTION = DESCRIPTION.replaceAll("\\w,.?+"), "");

What this does is includes RegEx code that tells the replace all function to replace commas preceded by an alphanumeric character (\\w) that may be followed by one or more characters that might be alphanumeric or non-alphanumeric, like a space (.?+)

One thing that I'll caution you is that RegEx looks simple but is hard to do well: once it's working, it's REALLY powerful, but getting to that point may take a while... If you need more help on this, you might want to try contacting Jason or Tim as I've been bashing my head on it for about 6-9 months and I still don't completely understand it.

HTH,
Justin

Hey thanks justin I have no

Hey thanks justin

I have no idea why it wasn't working the first time but I finally got it to work! I used some of the code from this thread: http://community.screen-scraper.com/node/2175

Now one of my fears has come true. Some of my data contains commas for example image source urls so of course it interprets those commas as separators. I've done some researching and found that generally what we do to avoid this is surround each cells data with quotes ("). I've played around with the Out.write feature and can't seem to find anything that will write some quotes to my file.

How do I go about doing this? Hopefully this will solve my problem once and for all.

thanks again for everything
Ben

The easy way to wrap with

The easy way to wrap with quotes is

out.write("\"" + dataRecord.get("YOUR_DATA") + "\",");

Or I like to
// Fix format issues, and return wrapped in quotes
String fixString(String value)
{
        if (value != null)
        {
                value = sutil.convertHTMLEntities(value);
                value = value.replaceAll("\"", "\'");
                value = value.replaceAll("\\s{2,}", " ");
                value = value.trim();
        }
        return "\"" + (value==null ? "" : value) + "\"";
}

// Set name of file to write to.
outputFile = "output/" + session.getName() + ".csv";

// Set headers
// Will look for tokens of same name, using usual naming convention
String[] names = {
        "Dealer",
        "Address1",
        "Address2",
        "City",
        "State",
        "Post code",
        "Country",
        "Phone",
        "Fax"
};

// Error catching
try
{
        File file = new File(outputFile);
        fileExists = file.exists();
       
        // Open up the file to be appended to
        out = new FileWriter(outputFile, true);
        session.log("Writing data to a file");
        if (!fileExists)
        {
                // Write headers
                for (i=0; i<names.length; i++)
                {
                        out.write(names[i]);
                        if (i<names.length-1)
                                out.write(",");
                }
                out.write("\n");
        }
               
        // Write columns.
        for (i=0; i<names.length; i++)
        {
                var = names[i];
                var = var.toUpperCase();
                var = var.replaceAll("\\s", "_");
                out.write(fixString(dataRecord.get(var)));
                if (i<names.length-1)
                        out.write(",");
        }
        out.write( "\n" );

        // Close up the file
        out.close();
       
        // Add to controller
        session.addToNumRecordsScraped(1);
}

catch( Exception e )
{
        session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}

Nice!!! :)

Thanks for posting this. I always learn something from your replies, esp. when they contain code samples!

Regards,
Justin

I think Justin got the right

I think Justin got the right of it ... seems that 'hacking around' is going well.