sutil

Overview

The sutil class provides general functions used to manipulate and work with extracted data. It also allows you to get information regarding screen-scraper such as its memory usage or version.

Images

Overview

In the course of a scrape you might want to gather images associated with the other information being gathered. These methods are provided not only to download the images but also to gather size information and resize them to your desired size.

These methods are only available to enterprise edition users.

getImageHeight

int sutil.getImageHeight ( String imagePath ) (enterprise edition only)

Description

Get the height of an image.

Parameters

  • imagePath File path to the image, as a string.

Return Values

Returns the height in pixels of the image file, as an integer. If the file doesn't exist or is not an image an error will be thrown and -1 will be returned.

Change Log

Version Description
5.0 Moved from session to sutil.
4.5 Available for enterprise edition.

Examples

Write Image Height to Log

 // Output the height of the image to the log.
 session.log( "Image height: " + sutil.getImageHeight( "C:/my_image.jpg" ) );
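
Read Image Height in Plain Java (Sketch)

The following is a minimal plain-Java sketch of how an image's height can be read. It is not screen-scraper's implementation. It assumes the standard javax.imageio API, and the class ImageSizeSketch and method heightOf are hypothetical names used only for illustration. Like the method described above, the sketch yields -1 when the file is missing or is not an image.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of sutil.
class ImageSizeSketch {

    // Returns the image height in pixels, or -1 if the file is
    // missing, unreadable, or not a recognized image format.
    static int heightOf(String imagePath) {
        try {
            BufferedImage image = ImageIO.read(new File(imagePath));
            // ImageIO.read returns null when no registered reader understands the file.
            return image == null ? -1 : image.getHeight();
        } catch (IOException e) {
            // Thrown, for example, when the file does not exist or cannot be read.
            return -1;
        }
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "C:/my_image.jpg";
        System.out.println("Image height: " + heightOf(path));
    }
}
```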

getImageWidth

int sutil.getImageWidth ( String imagePath ) (enterprise edition only)

Description

Get the width of an image.

Parameters

  • imagePath File path to the image, as a string.

Return Values

Returns the width in pixels of the image file, as an integer. If the file doesn't exist or is not an image an error will be thrown and -1 will be returned.

Change Log

Version Description
5.0 Moved from session to sutil.
4.5 Available for enterprise edition.

Examples

Write Image Width to Log

 // Output the width of the image to the log.
 session.log( "Image width: " + sutil.getImageWidth( "C:/my_image.jpg" ) );

resizeImage

Overview

Internally, only one function is used to resize all images; however, to facilitate the resizing of images, we have provided you with three methods. Each method will help you specify what measurement is most important (width or height) and whether the image should retain its aspect ratio.

  1. resizeImageFixHeight() [sutil] - Resize image, retaining aspect ratio, based on specified height.
  2. resizeImageFixWidth() [sutil] - Resize image, retaining aspect ratio, based on specified width.
  3. resizeImageFixWidthAndHeight() [sutil] - Resize image to a specified size (will not check aspect ratio).
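
The aspect-ratio arithmetic behind the first two methods can be sketched in plain Java as follows. This is an illustration under assumptions, not screen-scraper's code: the class AspectRatioSketch and its methods are hypothetical, and rounding to the nearest pixel is an assumed choice.

```java
// Hypothetical helper illustrating the arithmetic; not part of sutil.
class AspectRatioSketch {

    // Width that preserves the original aspect ratio when the height is fixed.
    static int widthForHeight(int origWidth, int origHeight, int newHeight) {
        return (int) Math.round((double) origWidth * newHeight / origHeight);
    }

    // Height that preserves the original aspect ratio when the width is fixed.
    static int heightForWidth(int origWidth, int origHeight, int newWidth) {
        return (int) Math.round((double) origHeight * newWidth / origWidth);
    }

    public static void main(String[] args) {
        // An 800x600 image resized to a height of 100 pixels.
        System.out.println(widthForHeight(800, 600, 100) + "x100");
        // The same image resized to a width of 100 pixels.
        System.out.println("100x" + heightForWidth(800, 600, 100));
    }
}
```

Fixing both width and height skips this calculation entirely, which is why the third method can distort the image.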

resizeImageFixHeight

void sutil.resizeImageFixHeight ( String originalFile, String newFile, int newHeightSize, boolean deleteOriginalFile ) (enterprise edition only)

Description

Resize image, retaining aspect ratio, based on specified height.

Parameters

  • originalFile File path of the image to be resized, as a string.
  • newFile File path where the new image should be created, as a string.
  • newHeightSize The height of the resized image in pixels, as an integer.
  • deleteOriginalFile Whether the originalFile should be deleted after the image is resized, as a boolean.

Return Values

Returns void. If an error is encountered it will be thrown.

Change Log

Version Description
5.0 Moved from session to sutil.
4.5 Available for enterprise edition.

Examples

Resize Image to Specified Height

 // Resizes a JPG to 100 pixels high, maintaining the
 // aspect ratio. After the image is resized, the original
 // will be deleted.

 sutil.resizeImageFixHeight( "C:/my_image.jpg", "C:/my_image_thumbnail.jpg", 100, true );

resizeImageFixWidth

void sutil.resizeImageFixWidth ( String originalFile, String newFile, int newWidthSize, boolean deleteOriginalFile ) (enterprise edition only)

Description

Resize image, retaining aspect ratio, based on specified width.

Parameters

  • originalFile File path of the image to be resized, as a string.
  • newFile File path where the new image should be created, as a string.
  • newWidthSize The width of the resized image in pixels, as an integer.
  • deleteOriginalFile Whether the originalFile should be deleted after the image is resized, as a boolean.

Return Values

Returns void. If an error is encountered it will be thrown.

Change Log

Version Description
5.0 Moved from session to sutil.
4.5 Available for enterprise edition.

Examples

Resize Image to Specified Width

 // Resizes a JPG to 100 pixels wide, maintaining the
 // aspect ratio. After the image is resized, the original
 // will be deleted.

 sutil.resizeImageFixWidth( "C:/my_image.jpg", "C:/my_image_thumbnail.jpg", 100, true );

resizeImageFixWidthAndHeight

void sutil.resizeImageFixWidthAndHeight ( String originalFile, String newFile, int newWidthSize, int newHeightSize, boolean deleteOriginalFile ) (enterprise edition only)

Description

Resize image to a specified size.

Parameters

  • originalFile File path of the image to be resized, as a string.
  • newFile File path where the new image should be created, as a string.
  • newWidthSize The width of the resized image in pixels, as an integer.
  • newHeightSize The height of the resized image in pixels, as an integer.
  • deleteOriginalFile Whether the originalFile should be deleted after the image is resized, as a boolean.

Return Values

Returns void. If an error is encountered it will be thrown.

Change Log

Version Description
5.0 Moved from session to sutil.
4.5 Available for enterprise edition.

This method can distort the image if the aspect ratios of the original and target images differ.

Examples

Resize Image to Specified Size

 // Resizes a JPG to 100x100 pixels.
 // After the image is resized, the original
 // will be deleted.

 sutil.resizeImageFixWidthAndHeight( "C:/my_image.jpg", "C:/my_image_thumbnail.jpg", 100, 100, true );

DecodedImage

Overview

To be used in conjunction with the ImageDecoder class.

This class represents decoded images. The objects can be queried for the text that was in the image, as well as any error that occurred while the image was being decoded. When the returned text is incorrect, there is a method that can be used to report it as bad. This can be used for sites like decaptcher.com, where refunds are given for incorrectly interpreted images.

getError

String getError ( )

Description

Gets any error message, or returns null if there was no error

Parameters

This method takes no parameters

Return Value

The error message returned

Error messages

  • OK Nothing went wrong
  • BALANCE_ERROR Insufficient funds with paid service
  • NETWORK_ERROR General network error (timeout, lost connection, server busy, etc...)
  • INVALID_LOGIN Credentials are invalid
  • GENERAL_ERROR General error, possibly image was bad or the site couldn't resolve it. See the error message for details
  • UNKNOWN Unknown error

Change Log

Version Description
5.5.29a Available in all editions.

Examples

Convert an image to text

 import com.screenscraper.util.images.*;

 // Assuming an ImageDecoder was created in a different location and saved in "IMAGE_DECODER"
 ImageDecoder decoder = session.getVariable("IMAGE_DECODER");
 DecodedImage result = decoder.decodeFile("someFile.jpg");

 if(result.wasError())
 {
   session.logWarn("Error converting image to text: " + result.getError());
 }
 else
 {
   session.log("Decoded Text: " + result.getResult());
 }

 // If the result was bad
 result.reportAsBad();

getResult

Object getResult ( )

Description

Gets the result from decoding the image. Most likely this will be a String, but each implementation could return a specific object type.

Parameters

This method takes no parameters

Return Value

The text extracted from the image, or null if there was an error

Change Log

Version Description
5.5.29a Available in all editions.

Examples

Convert an image to text

 import com.screenscraper.util.images.*;
 
 // Assuming an ImageDecoder was created in a different location and saved in "IMAGE_DECODER"
 ImageDecoder decoder = session.getVariable("IMAGE_DECODER");
 DecodedImage result = decoder.decodeFile("someFile.jpg");
 
 if(result.wasError())
 {
   session.logWarn("Error converting image to text: " + result.getError());
 }
 else
 {
   session.log("Decoded Text: " + result.getResult());
 }

 // If the result was bad
 result.reportAsBad();

reportAsBad

void reportAsBad ( )

Description

Handles an incorrectly resolved image. Some types of decoders won't have anything here

Parameters

This method takes no parameters

Return Value

This method returns void.

Change Log

Version Description
5.5.29a Available in all editions.

Examples

Convert an image to text

 import com.screenscraper.util.images.*;

 // Assuming an ImageDecoder was created in a different location and saved in "IMAGE_DECODER"
 ImageDecoder decoder = session.getVariable("IMAGE_DECODER");
 DecodedImage result = decoder.decodeFile("someFile.jpg");

 if(result.wasError())
 {
   session.logWarn("Error converting image to text: " + result.getError());
 }
 else
 {
   session.log("Decoded Text: " + result.getResult());
 }

 // If the result was bad
 result.reportAsBad();

wasError

boolean wasError ( )

Description

Returns true if there was an error, false otherwise. Also returns false if the image has not been resolved yet

Parameters

This method takes no parameters

Return Value

True if there was an error, false otherwise

Change Log

Version Description
5.5.29a Available in all editions.

Examples

Convert an image to text

 import com.screenscraper.util.images.*;

 // Assuming an ImageDecoder was created in a different location and saved in "IMAGE_DECODER"
 ImageDecoder decoder = session.getVariable("IMAGE_DECODER");
 DecodedImage result = decoder.decodeFile("someFile.jpg");

 if(result.wasError())
 {
   session.logWarn("Error converting image to text: " + result.getError());
 }
 else
 {
   session.log("Decoded Text: " + result.getResult());
 }

 // If the result was bad
 result.reportAsBad();

ImageDecoder

Overview

Class to convert images to text for interacting with CAPTCHA challenges. There are currently two implementations:

  • ManualDecoder: Creates a pop-up window for a user to enter in the text they read from the image
  • DecaptcherDecoder: Interface for the paid service decaptcher.com

When a reference to an image is passed to an instance of this class, it returns a DecodedImage object that can be queried for the resulting text, errors, and can report an image as poorly converted.

See example attached.

DecaptcherDecoder

void DecaptcherDecoder (ScrapingSession session, String username, String password, int port)
void DecaptcherDecoder (ScrapingSession session, String username, String password, String port)
void DecaptcherDecoder (ScrapingSession session, String username, String password, String port, String apiUrl)
void DecaptcherDecoder (ScrapingSession session, String username, String password, int port, String apiUrl)

Description

Requires an account with decaptcher.com.

Type of ImageDecoder in the com.screenscraper.util.images package that uses the decaptcher.com service to convert images to text. As of version 5.5.40a the constructor also requires the port assigned to your account, for example DecaptcherDecoder(ScrapingSession session, String username, String password, int port) or DecaptcherDecoder(ScrapingSession session, String username, String password, int port, String apiUrl).

Parameters

  • session The currently running scraping session.
  • username Username used to log in to decaptcher.com service.
  • password Password used to log in to decaptcher.com service.
  • port The port given by De-captcher.com to access your account on their site.
  • apiUrl (optional) URL used to access decaptcher.com service. This setting will override the default URL.

Return Values

Returns void. If it runs into any problems accessing the decaptcher.com service an error will be thrown.

Change Log

Version Description
5.5.29a Available in all editions
5.5.40a Added the port parameter. The service now requires the correct port in order to authenticate.

Examples

Initialization script

import com.screenscraper.util.images.*;

ImageDecoder decoder;

decoder = new DecaptcherDecoder(session, "username", "password", 12345, "http://api.de-captcher.com");

session.setVariable("IMAGE_DECODER", decoder);

ManualDecoder

void ManualDecoder (ScrapingSession session)

Description

Type of ImageDecoder in the com.screenscraper.util.images package that uses a popup window prompting the user to enter the text read from an image. Useful for debugging purposes, as the input text should always be correct (so long as it is typed correctly). Helpful during testing to avoid costs associated with paid-for CAPTCHA decoding services such as decaptcher.com.

Parameters

  • session The currently running scraping session.

Return Values

Returns void. If it runs into any problems decoding an image an error will be thrown.

Change Log

Version Description
5.5.29a Available in all editions

Examples

Initialize script

import com.screenscraper.util.images.*;

ImageDecoder decoder;

decoder = new ManualDecoder(session);

session.setVariable("IMAGE_DECODER", decoder);

decodeFile

DecodedImage decodeFile ( String file )
DecodedImage decodeFile ( File file )

Description

Converts the image given to a DecodedImage that will handle it. Does not delete the file.

Parameters

  • file The image file

Return Value

A DecodedImage used to get the text, errors, and possibly report a result as bad.

Change Log

Version Description
5.5.29a Available in all editions.

Examples

image = decoder.decodeFile("path to the image file");

decodeURL

DecodedImage decodeURL ( String url )

Description

Converts the image at the given URL to a DecodedImage that will handle it. Temporarily saves the file in the screen-scraper root folder, but deletes it once it has been decoded. By default, this will use the scraping session's HttpClient to request the URL.

Parameters

  • url The url to the image

Return Value

A DecodedImage used to get the text, errors, and possibly report a result as bad.

Change Log

Version Description
5.5.29a Available in all editions.

Examples

DecodedImage image = decoder.decodeURL(dataRecord.get("IMAGE_URL"));

applyXPathExpression

convertDateToString

String sutil.convertDateToString ( Date date ) (professional and enterprise editions only)
String sutil.convertDateToString ( Date date, String format ) (professional and enterprise editions only)

Description

Converts the given Date to a string in the specified format, or in the "MM/dd/yyyy HH:mm:ss.SS zzz" format if no format is given.

Parameters

  • date The date to convert
  • format (optional) A String representation (as a SimpleDateFormat) for the output

Return Values

A String representing the date given

Change Log

Version Description
5.5.26a Available in all editions.

Examples

// Log the current time
Date now = new Date();
session.log(sutil.convertDateToString(now, "MM/dd/yyyy HH:mm:ss zzz"));
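
Format a Date in Plain Java (Sketch)

Because the format parameter follows SimpleDateFormat, the conversion can be approximated in plain Java as sketched below. This is an assumption-laden illustration rather than screen-scraper's implementation: the class DateToStringSketch, its format method, and the explicit TimeZone argument are hypothetical, and only the default pattern string is taken from the description above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of sutil.
class DateToStringSketch {

    // Default pattern quoted in the method description.
    static final String DEFAULT_FORMAT = "MM/dd/yyyy HH:mm:ss.SS zzz";

    // Formats the date with the given pattern, falling back to the default when the pattern is null.
    static String format(Date date, String pattern, TimeZone zone) {
        SimpleDateFormat formatter =
                new SimpleDateFormat(pattern == null ? DEFAULT_FORMAT : pattern);
        formatter.setTimeZone(zone);
        return formatter.format(date);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(format(new Date(), null, utc));
        System.out.println(format(new Date(), "yyyy-MM-dd", utc));
    }
}
```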

convertHTMLEntities

String sutil.convertHTMLEntities ( String value )

Description

Decode HTML Entities.

Parameters

  • value String whose HTML Entities will be converted to characters.

Return Values

Returns string with decoded HTML entities.

Change Log

Version Description
5.0 Added for all editions.

Examples

Decode HTML Entities

 // Returns Angela's Room
 sutil.convertHTMLEntities( "Angela&#39;s Room" );


convertStringToDate

Date sutil.convertStringToDate ( String dateString, String format ) (professional and enterprise editions only)

Description

Converts a String to a Date object using the given format. If null is given as a format, "MM/dd/yyyy HH:mm:ss.SS zzz" is used

Parameters

  • dateString The date string
  • format The format of the date, following SimpleDateFormat formatting.

Return Values

The Date object matching the date given in the String, or null if it couldn't be parsed with the given format

Change Log

Version Description
5.5.26a Available in all editions.

Examples

// Convert an input value to a date for later use
Date lastUpdate = sutil.convertStringToDate(session.getVariable("LAST_RUN_DATE"), "yyyy-MM-dd");

if(lastUpdate == null)
{
  session.logError("No last run specified, stopping scrape");
  session.stopScraping();
}
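
Parse a Date String in Plain Java (Sketch)

The null-on-failure behavior described above can be pictured with the plain-Java sketch below. It is not screen-scraper's code: the class StringToDateSketch and its parse method are hypothetical, and returning null for a null input is an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper for illustration; not part of sutil.
class StringToDateSketch {

    // Parses dateString with the given SimpleDateFormat pattern.
    // Returns null when the string cannot be parsed with that pattern.
    static Date parse(String dateString, String pattern) {
        if (dateString == null) {
            return null; // Assumed behavior for a missing value.
        }
        String effectivePattern = pattern == null ? "MM/dd/yyyy HH:mm:ss.SS zzz" : pattern;
        try {
            return new SimpleDateFormat(effectivePattern).parse(dateString);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2012-02-15", "yyyy-MM-dd"));
        System.out.println(parse("not a date", "yyyy-MM-dd"));
    }
}
```

Checking the result for null before using it, as the example above does, avoids a NullPointerException later in the script.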

convertUTFWhitespace

String sutil.convertUTFWhitespace (String input ) (enterprise edition only)

Description

Replaces the UTF variants of whitespace with a regular space character.

Parameters

  • input The input string.

Return Values

Returns the converted string.

Change Log

Version Description
6.0.55a Available in all editions.

Examples

Tidying a string from a site that has non-uniform ways of returning strings.

    // useful when tidying a string
    String cleanedInput = sutil.convertUTFWhitespace(input);
    cleanedInput = cleanedInput.replaceAll("\\s{2,}", " ").trim();
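
To make the idea concrete, here is a rough plain-Java sketch of this kind of replacement. The exact set of characters that screen-scraper converts is not documented here, so the sketch assumes the Unicode "space separator" category (Zs), which includes the no-break space and the em space. The class WhitespaceSketch and its normalize method are hypothetical.

```java
// Hypothetical helper for illustration; not part of sutil.
class WhitespaceSketch {

    // Replaces every Unicode space separator (category Zs), such as
    // U+00A0 (no-break space) or U+2003 (em space), with a regular space.
    static String normalize(String input) {
        return input.replaceAll("\\p{Zs}", " ");
    }

    public static void main(String[] args) {
        String scraped = "Price:\u00A0100\u2003USD";
        System.out.println(normalize(scraped));
    }
}
```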

dateIsWithinDays

boolean sutil.dateIsWithinDays ( Date date1, Date date2, int days ) (professional and enterprise editions only)

Description

Checks to see if one date is within a certain number of days of another.

Parameters

  • date1 The first date.
  • date2 The second date.
  • days The maximum number of days that can be between the two dates.

Return Values

  • True if the dates are no more than the given number of days apart, false otherwise.

Change Log

Version Description
5.5.13a Available in all editions.

Examples

Check the proximity of one date to another

date1 = sutil.convertStringToDate( "2012-02-15", "yyyy-MM-dd" );
date2 = sutil.convertStringToDate( "2012-02-24", "yyyy-MM-dd" );

days = 5;
session.log( "First date is within 5 days of second date: " + sutil.dateIsWithinDays( date1, date2, days ) );

days = 15;
session.log( "First date is within 15 days of second date: " + sutil.dateIsWithinDays( date1, date2, days ) );
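
Compare Two Dates in Plain Java (Sketch)

One plausible reading of "within a certain number of days" is that the absolute difference between the two dates does not exceed the given number of 24-hour days. The sketch below illustrates that reading only. The class DateDistanceSketch and its method are hypothetical, and screen-scraper's handling of boundaries and daylight-saving shifts may differ.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of sutil.
class DateDistanceSketch {

    // True when the two dates are no more than the given number of 24-hour days apart.
    static boolean isWithinDays(Date date1, Date date2, int days) {
        long differenceMillis = Math.abs(date1.getTime() - date2.getTime());
        return differenceMillis <= TimeUnit.DAYS.toMillis(days);
    }

    public static void main(String[] args) {
        Date first = new Date(0L);
        Date second = new Date(TimeUnit.DAYS.toMillis(9)); // Nine days later.
        System.out.println(isWithinDays(first, second, 5));
        System.out.println(isWithinDays(first, second, 15));
    }
}
```

Under this interpretation, two dates nine days apart are not within 5 days of each other but are within 15.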

equalsIgnoreCase

boolean sutil.equalsIgnoreCase ( String stringOne, String stringTwo )

Description

Compare two strings ignoring case.

Parameters

  • stringOne First string.
  • stringTwo Second string.

Return Values

Returns true if the values of the two strings are equal when case is not considered; otherwise, it returns false.

Change Log

Version Description
5.0 Added for all editions.

Examples

Compare Two Strings (Case Insensitive)

 // Compare strings without regard to case
 sutil.equalsIgnoreCase( "aBc123","ABc123" );

formatNumber

String sutil.formatNumber ( String number ) (professional and enterprise editions only)
String sutil.formatNumber ( String number, int decimals, boolean padDecimals ) (professional and enterprise editions only)

Description

Returns a number formatted in such a way that it could be parsed as a Float, such as xxxxxxxxx.xxxx. It attempts to figure out if the number is formatted as European or American style, but if it cannot determine which it is, it defaults to American. If the number is something with a k on the end, it will convert the k to thousand (as 000). It will also try to convert m for million and b for billion. It also assumes that you won't have a number like 3.123k or 3.765m, however 3.54m is fine. It figures if you wanted all three of those digits you would have specified it as 3765k or 3,765k

Parameters

  • number String containing the number.
  • decimals (optional) The maximum number of decimal places to include in the result. When this value is omitted, any decimals are retained, but none are added
  • padDecimals (optional) Sets whether or not to pad the decimals (convert 5.1 to 5.10 if 2 decimals are specified)

Return Values

Returns the number as a String formatted so that it could be parsed as a Float, such as 3750.00, or null if the input was null.

Change Log

Version Description
5.5.26a Available in all editions.

Examples

Format a scraped abbreviated number as a dollar amount

 // Format a number to two decimal places
 String dollars = sutil.formatNumber("3.75k", 2, true);
 // This would set dollars to the String "3750.00"

 // Format the amount without cents.
 String dollarsNoCents = sutil.formatNumber("3.75m");
 // This would set dollarsNoCents to the String "3750000"

Format a European number to be inserted in a MySQL statement

 String number = sutil.formatNumber("3.275,10", 2, false);
 // number would now be "3275.1"
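
Expand a k, m, or b Suffix in Plain Java (Sketch)

The suffix handling described above can be illustrated with a small plain-Java sketch. It covers only American-style input with an optional k, m, or b suffix and always pads to the requested number of decimals. European-style detection is deliberately left out. The class SuffixNumberSketch and its expand method are hypothetical and are not screen-scraper's implementation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for illustration; not part of sutil.
class SuffixNumberSketch {

    // Expands a trailing k, m, or b and returns the value with a fixed number of decimals.
    // Assumes American-style input: commas group thousands, a period marks decimals.
    static String expand(String number, int decimals) {
        String text = number.trim().toLowerCase().replace(",", "");
        long multiplier = 1L;
        if (text.endsWith("k")) {
            multiplier = 1_000L;
        } else if (text.endsWith("m")) {
            multiplier = 1_000_000L;
        } else if (text.endsWith("b")) {
            multiplier = 1_000_000_000L;
        }
        if (multiplier != 1L) {
            // Drop the suffix character before parsing the digits.
            text = text.substring(0, text.length() - 1).trim();
        }
        BigDecimal value = new BigDecimal(text).multiply(BigDecimal.valueOf(multiplier));
        return value.setScale(decimals, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(expand("3.75k", 2));
        System.out.println(expand("3.75m", 0));
    }
}
```

BigDecimal is used instead of double so that values such as 3.75 are scaled without binary floating-point error.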

formatUSPhoneNumber

String sutil.formatUSPhoneNumber ( String number ) (professional and enterprise editions only)

Description

Converts a String to a US formatted phone number, as +1 (123) 456-7890x2. Expects a 7 digit or 10+ digit phone number. The extension is optional, and will be any digits found after an x. This allows for extensions listed as ext, x, or extension.

Parameters

  • number String containing the phone number. The only digits in this String should be the digits of the phone number.

Return Values

Returns a String formatted as a phone number, such as +1 (123) 456-7890x2, or null if the input was null

Change Log

Version Description
5.5.26a Available in all editions.

Examples

Format a scraped phone number

 // Formats the phone number extracted
 String phone = sutil.formatUSPhoneNumber(dataRecord.get("PHONE_NUMBER"));
 
 // If the extracted value had been "13334445678 ext. 23", the returned value would be "+1 (333) 444-5678x23"

formatUSZipCode

String sutil.formatUSZipCode ( String zip ) (professional and enterprise editions only)

Description

Formats and returns a US style zip code as 12345-6789. If the given zip code isn't 5 or 9 digits, a warning is logged, but the first 5 digits are still placed before the - and any remaining digits after it.

Parameters

  • zip String to format as a zip code, either 5 or 9 digits

Return Values

Zip code formatted String, such as 12345-6789 or 12345

Change Log

Version Description
5.5.26a Available in all editions.

Examples

 // Format a number to a nicer looking zip code
 String zip = sutil.formatUSZipCode(" 001011458");
 
 // zip would be "00101-1458"
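
Format a Zip Code in Plain Java (Sketch)

The splitting rule described above (five digits, a hyphen, then any remaining digits) can be sketched in plain Java as follows. The class ZipCodeSketch and its format method are hypothetical, and stripping non-digit characters first is an assumption based on the example input containing a leading space. The warning that screen-scraper logs for unexpected lengths is omitted.

```java
// Hypothetical helper for illustration; not part of sutil.
class ZipCodeSketch {

    // Keeps only the digits, then places a hyphen after the fifth digit
    // when more than five digits are present.
    static String format(String zip) {
        String digits = zip.replaceAll("\\D", "");
        if (digits.length() <= 5) {
            return digits;
        }
        return digits.substring(0, 5) + "-" + digits.substring(5);
    }

    public static void main(String[] args) {
        System.out.println(format(" 001011458"));
        System.out.println(format("12345"));
    }
}
```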

getCurrentDate

String sutil.getCurrentDate ( String format )

Description

Returns the current date in a specified format, or uses the "MM/dd/yyyy HH:mm:ss.SS zzz" format if null is given. Uses the session's timezone.

Parameters

  • format The format for the output string

Return Values

A String representing the date and time this method was invoked

Change Log

Version Description
5.5.26a Available in all editions.

Examples

 // Log the current time
 session.log(sutil.getCurrentDate(null));

getInstallDir

String sutil.getInstallDir ( )

Description

Retrieve the file path of the screen-scraper installation.

Parameters

This method does not receive parameters.

Return Values

Returns the installation directory file path, as a string.

Change Log

Version Description
5.0 Added for all editions.

Examples

Download to screen-scraper Directory

 url = "http://www.foo.com/imgs/puppy_image.gif";

 // Get installation file path
 path = sutil.getInstallDir() + "images/puppy.gif";

 // Download to screen-scraper directory
 session.downloadFile( url, path );

getMemoryUsage

int sutil.getMemoryUsage ( ) (enterprise edition only)

Description

Get memory usage of screen-scraper.

Parameters

This method does not receive any parameters.

Return Values

Returns the average percentage of memory used by screen-scraper over the past 30 seconds, as an integer.

Change Log

Version Description
5.0 Moved from session to sutil.
4.5 Available for enterprise edition.

For tips on optimizing screen-scraper's memory usage so that it can run faster, see our FAQ on optimization.

Examples

Stop Scrape on Memory Leak

 // Stop scrape if memory is low
 if( sutil.getMemoryUsage() > 98 )
 {
     session.log( "Memory is critically low. Stopping the scraping session." );
     session.stopScraping();
 }
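
Estimate JVM Memory Usage in Plain Java (Sketch)

For a sense of what a memory-usage percentage means in a Java application, the sketch below computes an instantaneous figure from the standard Runtime API. It is only an approximation of the idea: the method described above reports an average over the past 30 seconds, whereas this hypothetical MemoryUsageSketch class takes a single reading.

```java
// Hypothetical helper for illustration; not part of sutil.
class MemoryUsageSketch {

    // Percentage of the JVM's maximum heap that is currently in use (single reading).
    static int percentUsed() {
        Runtime runtime = Runtime.getRuntime();
        long usedBytes = runtime.totalMemory() - runtime.freeMemory();
        return (int) (usedBytes * 100 / runtime.maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("Memory in use: " + percentUsed() + "%");
    }
}
```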

getMimeType

String sutil.getMimeType ( String path )

Description

Get the mime-type of a local file.

Parameters

  • path File path to the local file, as a string.

Return Values

Returns the mime-type of the file, as a string.

Change Log

Version Description
5.0 Added for all editions.

Examples

Get File Mime Type

 // Get mime-type
 sutil.getMimeType( "c:/image/puppy.gif" );

getNumRunnableScrapingSessions

int sutil.getNumRunnableScrapingSessions ( ) (enterprise edition only)

Description

Get the number of runnable scraping sessions.

Parameters

This method does not receive any parameters.

Return Values

Returns the number of runnable scraping sessions in this instance of screen-scraper, as an integer.

Change Log

Version Description
5.0 Added for all editions.

Examples

Get the Number of Runnable Scrapes

 // Write the number of runnable scrapes to the log
 session.log( "Number of Runnable Scrapes: " + sutil.getNumRunnableScrapingSessions() );

getNumRunningScrapingSessions

int sutil.getNumRunningScrapingSessions ()
int sutil.getNumRunningScrapingSessions ( String scrapingSessionName )

Description

Gets the number of scraping sessions that are currently being run.

Parameters

  • scrapingSessionName Narrows the scope to a given scraping session, if this parameter is passed in.

Return Values

An int representing the number of running scraping sessions.

Change Log

Version Description
5.5.42a Available in Enterprise edition.

Examples

session.log( "Num running scraping sessions: " + sutil.getNumRunningScrapingSessions( session.getName() ) );
if( sutil.getNumRunningScrapingSessions( session.getName() ) > 1 )
{
        session.log( "SESSION ALREADY RUNNING." );
        session.stopScraping();
        return;
}

getOptionSet

DataSet sutil.getOptionSet ( String options ) (professional and enterprise editions only)
DataSet sutil.getOptionSet ( String options, String ignoreLabel, boolean tidyRecords ) (professional and enterprise editions only)
DataSet sutil.getOptionSet ( String options, String[] ignoreLabels, boolean tidyRecords ) (professional and enterprise editions only)
DataSet sutil.getOptionSet ( String options, Collection<String> ignoreLabels, boolean tidyRecords ) (professional and enterprise editions only)

Description

Gets a DataSet containing each of the elements of a <select> tag. The returned DataRecords will contain a key for the text found between the tags (possibly with html tags removed), a value indicating if it was the selected option, and the value to submit for the specific option. Note that this only looks for option tags, and as such passing in text containing more than a single select tag will produce false output.

Parameters

  • options The text containing the options HTML from the select tag
  • ignoreLabels (or ignoreLabel) (optional) Text value(s) to ignore in the output set. Usually this would include the strings like "Please select a category"
  • tidyRecords (optional) Should the TEXT be tidied before being stored in the resulting DataRecords

Return Values

A DataSet with one record per option. Values extracted will be stored in
VALUE : The value the browser would submit for this option
TEXT : The text that was between the tags
SELECTED : A boolean that is true if this option was selected by default

Change Log

Version Description
5.5.26a Available in all editions.

Examples

Search each option from a dropdown menu

 String options = dataRecord.get("ITEM_OPTIONS");
 
 // We don't want the value for "Select an option" because that doesn't go to a search results page
 DataSet items = sutil.getOptionSet(options, "Select an option", true);
 
 for(int i = 0; i < items.getNumDataRecords(); i++)
 {
   DataRecord next = items.getDataRecord(i);
   session.setVariable("ITEM_VALUE", next.get("VALUE"));
   session.log("Now scraping results for " + next.get("TEXT"));
   session.scrapeFile("Search Results");
 }
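
Parse Option Tags in Plain Java (Sketch)

To show the kind of VALUE, TEXT, and SELECTED records described above, here is a simplified plain-Java sketch that pulls them out of option tags with regular expressions. It is not screen-scraper's parser. The class OptionParsingSketch, its parse method, and the SAMPLE markup are hypothetical, only double-quoted value attributes are handled, and no tidying of the text is attempted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of sutil.
class OptionParsingSketch {

    // Made-up markup used by main and by the checks.
    static final String SAMPLE =
            "<option value=\"\">Select an option</option>"
            + "<option value=\"1\" selected>Books</option>"
            + "<option value=\"2\">Music</option>";

    private static final Pattern OPTION = Pattern.compile(
            "<option\\b([^>]*)>(.*?)</option>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern VALUE = Pattern.compile(
            "\\bvalue\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern SELECTED = Pattern.compile(
            "\\bselected\\b", Pattern.CASE_INSENSITIVE);

    // Returns one map per option, skipping any option whose text equals ignoreLabel.
    static List<Map<String, Object>> parse(String html, String ignoreLabel) {
        List<Map<String, Object>> records = new ArrayList<>();
        Matcher option = OPTION.matcher(html);
        while (option.find()) {
            String attributes = option.group(1);
            String text = option.group(2).trim();
            if (text.equals(ignoreLabel)) {
                continue;
            }
            Matcher value = VALUE.matcher(attributes);
            Map<String, Object> record = new LinkedHashMap<>();
            // Browsers submit the option text when no value attribute is present.
            record.put("VALUE", value.find() ? value.group(1) : text);
            record.put("TEXT", text);
            // Simplification: any occurrence of the word "selected" in the attributes counts.
            record.put("SELECTED", SELECTED.matcher(attributes).find());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        for (Map<String, Object> record : parse(SAMPLE, "Select an option")) {
            System.out.println(record);
        }
    }
}
```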

getRadioButtonSet

DataSet sutil.getRadioButtonSet ( String buttons, String buttonName ) (professional and enterprise editions only)
DataSet sutil.getRadioButtonSet ( String buttons, String buttonName, String ignoreLabel ) (professional and enterprise editions only)
DataSet sutil.getRadioButtonSet ( String buttons, String buttonName, Collection<String> ignoreLabels ) (professional and enterprise editions only)
DataSet sutil.getRadioButtonSet ( String buttons, String buttonName, Collection<String> ignoreLabels, boolean tidyRecords ) (professional and enterprise editions only)

Description

Gets all the options from a radio button group. The values are returned in a DataSet, one record per button. Any labels that are to be ignored will not be included in the returned set. Not every button will have a label: radio buttons do not require one, and without a label tag it is difficult for a regular expression to determine exactly what to extract as the label.

Parameters

  • buttons The text containing the buttons
  • buttonName The name of the buttons that should be grabbed, as a Regex pattern
  • ignoreLabels (or ignoreLabel) (optional) Any labels that should be excluded from the resulting set
  • tidyRecords (optional) Should the TEXT be tidied before being stored in the resulting DataRecords

Return Value

DataSet containing one record for each of the extracted radio buttons. Values will be stored in
VALUE : The value the browser would submit for this radio button
TEXT : The text that represents this button, or null if no label could be found for it
SELECTED : A boolean that is true if this button was selected by default
ID : The ID of the radio button, or null if no ID was found

Change Log

Version Description
5.5.29a Available in all editions.

Examples

Search each radio button from a radio button group

 String options = dataRecord.get("RADIO_BUTTONS");
 
 // Get all the radio buttons with the name attribute "selection"
 DataSet items = sutil.getRadioButtonSet(options, "selection");
 
 for(int i = 0; i < items.getNumDataRecords(); i++)
 {
   DataRecord next = items.getDataRecord(i);
   session.setVariable("BUTTON_VALUE", next.get("VALUE"));
   session.log("Now scraping results for " + next.get("TEXT"));
   session.scrapeFile("Search Results");
 }

getRandomReferrer

String sutil.getRandomReferrer ( )

Description

Gets a random referrer page from a list of many different search engine web sites and a few other pages.

Parameters

This method does not receive any parameters.

Return Values

Returns a random referrer URL.

Change Log

Version Description
6.0.1a Introduced for all editions.

getRandomUserAgent

String sutil.getRandomUserAgent ( )

Description

Returns a random User Agent. The list isn't closely monitored, so it may not include newer user agents, and may include extremely old ones as well.

Parameters

This method does not receive any parameters.

Return Values

Returns a random user agent.

Change Log

Version Description
6.0.1a Introduced for all editions.

getScreenScraperEdition

String sutil.getScreenScraperEdition ( )

Description

Get edition of screen-scraper instance.

Parameters

This method does not receive any parameters.

Return Values

Returns the edition name, as a string.

Change Log

Version Description
5.0 Added for all editions.

Examples

Write Edition to Log

 // Write the current edition to log.
 session.log("Current edition: " + sutil.getScreenScraperEdition());

getScreenScraperVersion

String sutil.getScreenScraperVersion ( )

Description

Get version of screen-scraper instance.

Parameters

This method does not receive any parameters.

Return Values

Returns the version number, as a string.

Change Log

Version Description
5.0 Added for all editions.

Examples

Write Version to Log

 // Write the current version to log.
 session.log("Current version: " + sutil.getScreenScraperVersion());

isInt

boolean sutil.isInt ( String string )

Description

Determine if the value of a string is an integer.

Parameters

  • string String to be tested for containing an integer.

Return Values

Returns true if the string is an integer; otherwise, it returns false. If it is passed an object that is not a string, including an integer, an error will be thrown.

Change Log

Version Description
5.0 Added for all editions.

Examples

Check String Value

 // Does the GUESTS variable contain an integer
 if ( !sutil.isInt( session.getv( "GUESTS" ) ) )
 {
     session.logWarn( "Could not get the number of guests!" );
 }
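
Check for an Integer in Plain Java (Sketch)

A common way to perform this kind of check in plain Java is to attempt a parse and catch the failure, as sketched below. Whether screen-scraper uses this approach, and how it treats null or surrounding whitespace, is not stated here. The class IntCheckSketch is hypothetical, and returning false for null is an assumption.

```java
// Hypothetical helper for illustration; not part of sutil.
class IntCheckSketch {

    // True when the string can be parsed as a 32-bit integer.
    static boolean isInt(String value) {
        if (value == null) {
            return false; // Assumed behavior for a missing value.
        }
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isInt("42"));
        System.out.println(isInt("4.2"));
    }
}
```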

isNullOrEmptyString

boolean sutil.isNullOrEmptyString ( Object object )

Description

Determine if an object's value is null or empty.

Parameters

  • object The object whose value will be tested.

Return Values

Returns true if the value of the object is null or an empty string; otherwise, it returns false.

Change Log

Version Description
5.0 Added for all editions.

Examples

Warning for Empty Variable

 // Give warning and stop scrape if variable is empty
 if ( sutil.isNullOrEmptyString( session.getv( "NAME" ) ) )
 {
     session.log( "The NAME variable was blank." );
     session.stopScraping();
 }

isPlatformLinux

boolean sutil.isPlatformLinux ( )

Description

Determine if operating system is a Linux platform.

Parameters

This method does not receive parameters.

Return Values

Returns true if the operating system is Linux; otherwise, it returns false.

Change Log

Version Description
5.0 Added for all editions.

Examples

Check Linux Platform

 url = "http://www.foo.com/imgs/puppy_image.gif";

 // Determine download location based on platform
 if ( sutil.isPlatformLinux() )
 {
     session.downloadFile( url, "/home/user/images/puppy.gif" );
 }
 else if ( sutil.isPlatformMac() )
 {
     session.downloadFile( url, "/Volumes/Documents/images/puppy.gif" );
 }
 else if ( sutil.isPlatformWindows() )
 {
     session.downloadFile( url, "c:/images/puppy.gif" );
 }

isPlatformMac

boolean sutil.isPlatformMac ( )

Description

Determine if operating system is a Mac platform.

Parameters

This method does not receive parameters.

Return Values

Returns true if the operating system is Mac; otherwise, it returns false.

Change Log

Version Description
5.0 Added for all editions.

Examples

Check Mac Platform

 url = "http://www.foo.com/imgs/puppy_image.gif";

 // Determine download location based on platform
 if ( sutil.isPlatformLinux() )
 {
     session.downloadFile( url, "/home/user/images/puppy.gif" );
 }
 else if ( sutil.isPlatformMac() )
 {
     session.downloadFile( url, "/Volumes/Documents/images/puppy.gif" );
 }
 else if ( sutil.isPlatformWindows() )
 {
     session.downloadFile( url, "c:/images/puppy.gif" );
 }

isPlatformWindows

boolean sutil.isPlatformWindows ( )

Description

Determine if operating system is a Windows platform.

Parameters

This method does not receive parameters.

Return Values

Returns true if the operating system is Windows; otherwise, it returns false.

Change Log

Version Description
5.0 Added for all editions.

Examples

Check Windows Platform

 url = "http://www.foo.com/imgs/puppy_image.gif";

 // Determine download location based on platform
 if ( sutil.isPlatformLinux() )
 {
     session.downloadFile( url, "/home/user/images/puppy.gif" );
 }
 else if ( sutil.isPlatformMac() )
 {
     session.downloadFile( url, "/Volumes/Documents/images/puppy.gif" );
 }
 else if ( sutil.isPlatformWindows() )
 {
     session.downloadFile( url, "c:/images/puppy.gif" );
 }
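
For comparison, a platform check of this kind can be sketched in plain Java by inspecting the os.name system property. The class and method names below are hypothetical, and the string matching is an assumption made for illustration; it is not necessarily how the sutil methods decide. Unlike the if/else chain above, the sketch also returns a fallback value for platforms that match none of the three checks.

```java
import java.util.Locale;

// Hypothetical helper; the matching rules are assumptions, not sutil internals.
class PlatformSketch {
    // Classifies an os.name value such as "Windows 10", "Mac OS X" or "Linux".
    static String platformOf(String osName) {
        String name = osName == null ? "" : osName.toLowerCase(Locale.ROOT);
        if (name.startsWith("windows")) {
            return "windows";
        }
        if (name.startsWith("mac")) {
            return "mac";
        }
        if (name.contains("linux")) {
            return "linux";
        }
        // Anything else (Solaris, BSD, ...) needs its own handling.
        return "other";
    }

    // Convenience wrapper for the JVM the code is running on.
    static String currentPlatform() {
        return platformOf(System.getProperty("os.name"));
    }
}
```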

makeGETRequest

String sutil.makeGETRequest ( String url )

Description

Retrieve the response contents of a GET request.

Parameters

  • url URL encoded version of page request, as a string. Java provides a URLEncoder to aid in URL encoding of a string.

Return Values

Returns contents of the response, as a string.

Change Log

Version Description
5.0 Added for all editions.

This method will use any proxy settings that have been specified in the Settings dialog box.

Examples

Retrieve Page Contents

 // Returns contents resulting from
 // request to "http://www.screen-scraper.com"

 pageContents = sutil.makeGETRequest("http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+World");
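
Because the url parameter must already be URL encoded, a query string is usually built with java.net.URLEncoder before the request is made. The following standalone sketch shows one way to do that; the class and method names are hypothetical, and UTF-8 is assumed as the encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building an encoded GET URL.
class QueryEncodingSketch {
    // Appends a single URL-encoded name=value pair to the base URL.
    static String buildUrl(String base, String name, String value) {
        try {
            return base + "?" + URLEncoder.encode(name, "UTF-8")
                    + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting string can then be passed to sutil.makeGETRequest in place of a hand-encoded URL.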

makeGETRequestNoSessionProxy

String sutil.makeGETRequestNoSessionProxy ( String urlString )

Description

Makes a GET request and returns the result as a string. This method will use the proxy settings indicated in the "Settings" dialog box, if any.

Parameters

  • urlString The URL to request, as a string.

Return Values

Returns the contents of the response, as a string.

Throws

  • java.lang.Exception If an error occurs while making the request.

Change Log

Version Description
6.0.6a Introduced for all editions.

makeGETRequestUseSessionProxy

String sutil.makeGETRequestUseSessionProxy ( String urlString )

Description

Makes a GET request and returns the result as a string. This method will use the proxy settings attached to the current scraping session.

Parameters

  • urlString The URL to request, as a string.

Return Values

Returns the contents of the response, as a string.

Throws

  • java.lang.Exception If an error occurs while making the request.

Change Log

Version Description
6.0.6a Introduced for all editions.

makeHEADRequest

String[][] sutil.makeHEADRequest ( String url )

Description

Retrieve the response header contents.

Parameters

  • url URL encoded version of page request, as a string. Java provides a URLEncoder to aid in URL encoding of a string.

Return Values

Returns contents of the response, as a two-dimensional array.

Change Log

Version Description
5.0 Added for all editions.

This method will use any proxy settings that have been specified in the Settings dialog box.

Examples

Retrieve Response Headers

 // Log HEAD contents

 // Get head contents
 headerArray = sutil.makeHEADRequest("http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+World");

 // Loop through HEAD contents
 for (int i=0; i<headerArray.length; i++)
 {
     // Write header to log
     session.log(headerArray[i][0] + ": " + headerArray[i][1]);
 }

 /* Example Log:
 Date: Fri, 04 Jun 2010 17:18:11 GMT
 Server: Apache/2.2.3 (CentOS)
 X-Powered-By: PHP/5.1.6
 Connection: close
 Content-Type: text/html; charset=UTF-8
 */
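
When only one or two headers are of interest, it can be convenient to copy the returned array into a map first. The sketch below assumes, as in the example above, that each row holds a header name at index 0 and its value at index 1. The class and method names are hypothetical, and a repeated header name keeps only its last value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; works on any String[][] of name/value rows.
class HeaderLookupSketch {
    // Copies name/value pairs into a map whose keys ignore case.
    static Map<String, String> toMap(String[][] headers) {
        Map<String, String> map =
                new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
        for (String[] header : headers) {
            // Skip malformed rows and rows without a header name.
            if (header == null || header.length < 2 || header[0] == null) {
                continue;
            }
            map.put(header[0], header[1]);
        }
        return map;
    }
}
```

With the map in hand, a lookup such as map.get( "content-type" ) succeeds regardless of how the server capitalized the header name.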

mergeDataRecords

DataRecord sutil.mergeDataRecords ( DataRecord first, DataRecord second, boolean saveNonEmptyString ) (professional and enterprise editions only)

Description

Merges two data records by copying the values from the second record over the values of the first and returning the result as a new DataRecord. Neither original record is modified.

Parameters

  • first The first DataRecord, into which the values from the second record will be copied.
  • second The second DataRecord, whose values will be copied into the first.
  • saveNonEmptyString True if blank values should not overwrite non-blank values, whether the non-blank value is in the first or second record. If both records contain a value that is not blank for the same key, the value in the first record is saved and the value in the second record discarded. If false, all values in the second record will overwrite any values in the first record.

Return Values

A new DataRecord with the merged values

Change Log

Version Description
5.5.26a Available in all editions.

Examples

Combine values from the current dataRecord with a previous one

 DataRecord previous = session.getVariable("_DATARECORD");
 
 session.setVariable("_DATARECORD", sutil.mergeDataRecords(previous, dataRecord));
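
The effect of the saveNonEmptyString flag can be illustrated with ordinary maps. The following standalone sketch mirrors the rules described under Parameters; it is an approximation rather than the sutil implementation. The class name is hypothetical, plain Map objects stand in for DataRecord, and "blank" is assumed to mean null or an empty string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the documented merge rules using plain maps.
class MergeSketch {
    // Assumption: a value counts as blank when it is null or an empty string.
    static boolean isBlank(Object value) {
        return value == null || value.toString().isEmpty();
    }

    // Returns a new map; neither input map is modified.
    static Map<String, Object> merge(Map<String, Object> first,
                                     Map<String, Object> second,
                                     boolean saveNonEmptyString) {
        Map<String, Object> merged = new HashMap<String, Object>(first);
        for (Map.Entry<String, Object> entry : second.entrySet()) {
            // When the flag is set, a non-blank value from the first record wins.
            boolean keepFirst = saveNonEmptyString
                    && !isBlank(merged.get(entry.getKey()));
            if (!keepFirst) {
                merged.put(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }
}
```

With the flag set to true, a first record of NAME=Ann, CITY=(empty) merged with a second record of NAME=Bob, CITY=Provo keeps Ann and takes Provo; with the flag set to false, both values come from the second record.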

nullToEmptyString

String sutil.nullToEmptyString ( Object object )

Description

Get an object in string format.

Parameters

  • object Object to be returned in string format.

Return Values

Returns an empty string if the value of the object is null; otherwise, returns the value of the toString method of the object.

Change Log

Version Description
5.0 Added for all editions.

Examples

Get String Value of Variable

 // Always Specify Suffix even if not selected
 suffix = sutil.nullToEmptyString( session.getv( "SUFFIX" ) );
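
The return rule described above amounts to a null check followed by toString. The standalone sketch below restates it in plain Java under a hypothetical class name; it is an illustration, not the sutil source. The main benefit is in string concatenation, where a null reference would otherwise appear as the text "null".

```java
// Hypothetical restatement of the documented return rule.
class NullToEmptySketch {
    // Returns "" for null, otherwise the object's toString() value.
    static String nullToEmptyString(Object value) {
        return value == null ? "" : value.toString();
    }
}
```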

parseName

Name sutil.parseName ( String name ) (pro and enterprise editions only)

Description

Attempts to parse a string to a name. The parser is not perfect and works best on English formatted names (for example, "John Smith Jr." or "Guerrero, Antonio K"). This uses standard settings for the parser. To get more control over how the name is parsed, use the EnglishNameParser class.

Parameters

  • name The name to be parsed.

Return Values

Returns the parsed name, as a Name object.

Change Log

Version Description
6.0.59a Available for professional and enterprise editions.

Examples

How to use the name parser

    String nameRaw = "John Fred Doe";
    DataRecord dr = new DataRecord();

    log.debug( "Name raw: " + nameRaw );
    if( nameRaw!=null )
    {
        try
        {
            Name name = sutil.parseName( nameRaw );
            log.debug( "First name: " + name.getFirstName() );
            log.debug( "Middle name: " + name.getMiddleName() );
            log.debug( "Last name: " + name.getLastName() );
            //log.debug( "Suffix: " + name.getSuffix() );

            dr.put( "FIRST_NAME", name.getFirstName() );
            dr.put( "MIDDLE_NAME", name.getMiddleName() );
            dr.put( "LAST_NAME", name.getLastName() );
            //dr.put( "SUFFIX", name.getAllSuffixString() );
        }
        catch( Exception e )
        {
            // The parser may throw an exception if it can't
            // parse the name.  If this occurs we want to know about it.
            log.warn( "Error parsing name: " + e.getMessage() );
        }
    }

parseUSAddress

Address sutil.parseUSAddress ( String address ) (pro and enterprise editions only)

Description

Attempts to parse a string to an address. The parser is not perfect and works best on US addresses. Most likely other address formats can be parsed with the USAddressParser class by providing different constraints in the builder. This method is here for convenience in working with US addresses.

Parameters

  • address The address to be parsed.

Return Values

Returns the parsed address, as an Address object.

Change Log

Version Description
6.0.59a Available for professional and enterprise editions.

Examples

How to use the address parser

    import com.screenscraper.util.parsing.address.Address;

    // Sample value for illustration; substitute the address you extracted.
    String addressRaw = "123 Main St, Springfield, IL 62704";

    DataRecord dr = new DataRecord();

    try
    {
        Address address = sutil.parseUSAddress( addressRaw );
        log.debug( "Street: " + address.getStreet() );
        log.debug( "Suite or Apartment: " + address.getSuiteOrApartment() );
        log.debug( "City: " + address.getCity() );
        log.debug( "State: " + address.getState() );
        log.debug( "Zip: " + address.getZipCode() );

        // if all of these four are blank then save only the raw address
        // else save what we can
        if(
            sutil.isNullOrEmptyString( address.getStreet() )
            &&
            sutil.isNullOrEmptyString( address.getState() )
            &&
            sutil.isNullOrEmptyString( address.getCity() )
            &&
            sutil.isNullOrEmptyString( address.getZipCode() )
        )
        {
            dr.put( "ADDRESS", addressRaw );
        }
        else
        {
            dr.put( "ADDRESS", address.getStreet() );
            dr.put( "ADDRESS2", address.getSuiteOrApartment() );
            dr.put( "STATE", address.getState() );
            dr.put( "CITY", address.getCity() );
            dr.put( "ZIP", address.getZipCode() );
        }
        session.setv( "DR_ADDRESS", dr );
    }
    catch( Exception e )
    {
        // If there was a parsing error, notify so it can be dealt with
        log.warn( "Exception parsing address: " + e.getMessage() );
    }

pause

void sutil.pause ( long milliseconds ) (professional and enterprise editions only)

Description

Pause scraping session.

Parameters

  • milliseconds Length of the pause, in milliseconds.

Return Values

Returns void.

Change Log

Version Description
5.0 Moved from session to sutil.
4.5 Available for professional and enterprise editions.

Pausing the scraping session also pauses the execution of the scripts including the one that initiates the pause.

Examples

Pause Scrape on Server Overload

 // It should be noted that a status code of 503 is not
 // always a temporary overloading of a server.

 // Check status code
 if (scrapeableFile.statusCode() == 503)
 {
     // Pause Scraping for 5 seconds
     sutil.pause( 5000 );

     // Continue/Rescrape file
     ...
 }
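
The pause-and-retry pattern hinted at above is usually bounded so that a server that stays overloaded does not stall the scrape indefinitely. The following standalone sketch shows the control flow only. The class name, the IntSupplier standing in for "request the file again and return its status code", and the use of Thread.sleep in place of sutil.pause are all assumptions made so the example runs outside screen-scraper.

```java
import java.util.function.IntSupplier;

// Hypothetical retry loop; inside a script, sutil.pause would replace Thread.sleep.
class RetrySketch {
    // Repeats the request while it returns 503, up to maxAttempts tries in total.
    static int requestWithRetry(IntSupplier request, int maxAttempts, long pauseMillis) {
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == 503; attempt++) {
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up on further retries.
                Thread.currentThread().interrupt();
                break;
            }
            status = request.getAsInt();
        }
        return status;
    }
}
```

The caller still has to decide what to do when the final status is 503, for example logging a warning and skipping the page.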

randomPause

void sutil.randomPause ( long min, long max ) (professional and enterprise editions only)

Description

Pauses for a random amount of time. This is also set up to stop immediately if the stop scrape button is clicked, and to allow breakpoints to be triggered while it is pausing.

Parameters

  • min The minimum duration of the pause, in milliseconds
  • max The maximum duration of the pause, in milliseconds

Return Values

Returns void.

Change Log

Version Description
5.5.29a Available in professional and enterprise editions.

Examples

Wait for between 2 and 4 seconds

 sutil.randomPause(2000, 4000);
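
To make the min and max parameters concrete, the sketch below picks a duration between the two bounds in plain Java. The class and method names are hypothetical, and treating both bounds as inclusive is an assumption; the documentation does not say whether sutil.randomPause includes the max value.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper showing how a random duration could be chosen.
class RandomPauseSketch {
    // Picks a duration in [min, max] milliseconds, both ends inclusive.
    static long pickDuration(long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        if (min == max) {
            return min;
        }
        // nextLong's upper bound is exclusive, hence max + 1.
        return ThreadLocalRandom.current().nextLong(min, max + 1);
    }
}
```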

reformatDate

String sutil.reformatDate ( String date, String dateFormatFrom, String dateFormatTo ) (professional and enterprise editions only)
String sutil.reformatDate ( String date, String dateFormatTo ) (enterprise edition only)

Description

Change a date format.

Parameters

  • date Date that is being reformatted, as a string.
  • dateFormatFrom (optional) The format of the date that is being reformatted. The date format follows Sun's SimpleDateFormat.
  • dateFormatTo The format that the date is being changed to. If dateFormatFrom is being used this should also follow Sun's SimpleDateFormat. If dateFormatFrom is left off then the date format should follow PHP's date format. In the latter method you can also use timestamp as the value of this parameter and it will return the timestamp corresponding to the date. Note also how PHP treats dashes and dots: "Dates in the m/d/y or d-m-y formats are disambiguated by looking at the separator between the various components: if the separator is a slash (/), then the American m/d/y is assumed; whereas if the separator is a dash (-) or a dot (.), then the European d-m-y format is assumed."

Return Values

Returns formatted date according to the specified format, as a string.

Change Log

Version Description
5.0 Moved from session to sutil.
4.5 Available for professional and enterprise editions. Unspecified source format available for enterprise edition.

The date formats are not the same for the two methods. Read carefully.

Examples

Reformat Date from Specified Format

 // Reformats the date shown to the format "2010-01-01".
 // This uses Sun's Date Formats

 sutil.reformatDate( "01/01/2010", "dd/MM/yyyy", "yyyy-MM-dd" );

Reformat Date from Unspecified Format

 // Reformats the date shown to the format "2010-01-01".
 // This uses PHP's Date Formats

 sutil.reformatDate( "01/01/2010", "Y-m-d" );
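
The three-argument form takes SimpleDateFormat patterns for both the source and target formats, so the choice of source pattern decides how an ambiguous date is read. The standalone sketch below illustrates this with java.text.SimpleDateFormat directly. It covers only the three-argument form, the class and method names are hypothetical, and the strict (non-lenient) parsing is an assumption rather than documented sutil behavior.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical equivalent of the three-argument form, built on SimpleDateFormat.
class ReformatDateSketch {
    // Parses the date with the "from" pattern and formats it with the "to" pattern.
    static String reformat(String date, String from, String to) {
        SimpleDateFormat parser = new SimpleDateFormat(from, Locale.US);
        // Reject impossible dates such as 31/02/2010 instead of rolling them over.
        parser.setLenient(false);
        try {
            Date parsed = parser.parse(date);
            return new SimpleDateFormat(to, Locale.US).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Cannot parse '" + date + "' as " + from, e);
        }
    }
}
```

For instance, "03/04/2010" becomes "2010-04-03" when read as dd/MM/yyyy but "2010-03-04" when read as MM/dd/yyyy.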

sendMail

void sutil.sendMail ( String subject, String body, String recipients ) (enterprise edition only)
void sutil.sendMail ( String subject, String body, String recipients, String attachments, String headers ) (enterprise edition only)
void sutil.sendMail ( String subject, String body, String recipients, String contentType, String attachments, String headers ) (enterprise edition only)

Description

Send an email using SMTP mail server specified in the settings.

Parameters

  • subject Subject line of the email, as a string.
  • body The content of the email, as a string.
  • recipients Comma-delimited list of email addresses to which the email will be sent, as a string.
  • contentType The content type as a valid MIME type.
  • attachments Comma-delimited list of local file paths to files that should be attached, as a string.
    If you do not have any attachments the value of null should be used.
  • headers Tab-delimited SMTP headers to be used when sending the email, as a string. If you don't have
    any headers to send use the value null.

Return Values

Returns void. If it runs into any problems while attempting to send the email an error will be thrown.

Change Log

Version Description
6.0.35a Now supports alternate content types.
5.0 Moved from session to sutil.
4.5 Available for enterprise edition.

Examples

Send Email at End of Scrape

 // In script called "After scraping session ends"

 // Sends an email message with the parameters shown.
 String message = "The '" + session.getName() + "' scrape is now finished.";
 // The recipient below is a placeholder address.
 sutil.sendMail( "Status Report: Scrape Finished", message, "recipient@example.com", null, null );

sortSet

List sutil.sortSet ( Set set )
List sutil.sortSet ( Set set, boolean ignoreCase )
List sutil.sortSet ( Set set, Comparator comparator )

Description

Sorts the elements in a set into an ordered list.

Parameters

  • set The set whose elements should be sorted
  • ignoreCase (optional) True if case is irrelevant when sorting strings
  • comparator (optional) The Comparator used to compare objects in the set to determine order

Return Values

This method returns a sorted list of elements that are in the set.

Change Log

Version Description
5.5.26a Available in all editions.

Examples

Output all the values in a DataRecord in alphabetical order

 // Generally when a sorted set or map is needed, a data structure should be chosen that stores the values
 // in a sorted way, such as TreeSet or TreeMap.  However, sometimes the set or map is returned by a library
 // and may not have sorted values, although sorted values are needed.
 
 List keys = sutil.sortSet(dataRecord.keySet(), true);
 
 for(int i = 0; i < keys.size(); i++)
 {
   key = keys.get(i);
   session.log(key + " : " + dataRecord.get(key));
 }
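
The difference the ignoreCase flag makes can be seen with the standard collections classes. The following standalone sketch sorts a set of strings both ways; it illustrates the idea and is not the sutil implementation. The class and method names are hypothetical, and the sketch is limited to sets of strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of case-sensitive versus case-insensitive sorting.
class SortSetSketch {
    // Copies the set into a list and sorts it, optionally ignoring case.
    static List<String> sortStrings(Set<String> set, boolean ignoreCase) {
        List<String> sorted = new ArrayList<String>(set);
        if (ignoreCase) {
            Collections.sort(sorted, String.CASE_INSENSITIVE_ORDER);
        } else {
            // Natural ordering places all uppercase letters before lowercase ones.
            Collections.sort(sorted);
        }
        return sorted;
    }
}
```

For the keys cherry, apple and Banana, ignoring case gives apple, Banana, cherry, while the case-sensitive order is Banana, apple, cherry.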

startsWithUpperCase

boolean sutil.startsWithUpperCase ( String start, String string )

Description

Determine if one string is the start of another, without regard to case.

Parameters

  • start Value to be checked as the start, as a string.
  • string Value to be searched in, as a string.

Return Values

Returns true if string starts with start when case is not considered; otherwise, it returns false.

Change Log

Version Description
5.0 Added for all editions.

Examples

Does String Start With Another String (Case Insensitive)

 // Check for RTMP URLs
 sutil.startsWithUpperCase( "rtmp", session.getv( "URL" ) );
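
Note the argument order: the prefix comes first and the string being searched comes second. A plain Java sketch of an equivalent check is shown below. The class and method names are hypothetical, and returning false for null arguments is an assumption; the documentation does not describe how sutil.startsWithUpperCase handles null.

```java
// Hypothetical equivalent using String.regionMatches.
class StartsWithSketch {
    // Returns true if "string" begins with "start", ignoring case.
    static boolean startsWithIgnoreCase(String start, String string) {
        if (start == null || string == null) {
            return false;
        }
        // regionMatches returns false when "start" is longer than "string".
        return string.regionMatches(true, 0, start, 0, start.length());
    }
}
```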

stringToFloat

float sutil.stringToFloat ( String str ) (professional and enterprise editions only)

Description

Parse string into a floating point number.

Parameters

  • str String to be transformed into a float.

Return Values

Returns the string's value as a floating point number.

Change Log

Version Description
5.0.1a Introduced for professional and enterprise editions.

Examples

Parse a String into a Float

 // Parse Float from String
 gpa = sutil.stringToFloat( session.getv( "GPA" ) );

stripHTML

String sutil.stripHTML ( String content ) (enterprise edition only)

Description

Strips HTML from a string, replacing some tags with corresponding text-only equivalents.

Parameters

  • content The content to be stripped.

Return Values

Returns the stripped content.

Change Log

Version Description
6.0.20a Available in only the Enterprise edition.

Examples

Strip HTML from a String

    // input is assumed to hold the HTML content to be stripped.
    String cleanedInput = sutil.stripHTML( input );

tidyDataRecord

DataRecord sutil.tidyDataRecord ( DataRecord record ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, Map<String, Boolean> settings ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, Map<String, Boolean> settings, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, Map<String, Boolean> settings ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, Map<String, Boolean> settings, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)

Description

Tidies the DataRecord by performing actions based on the values of the settings map given. If no settings are given, the values obtained from sutil.getDefaultTidySettings() are used. Each value in the record that is a string will be tidied. Keys are not modified. The record given will not be modified; a new record with the tidied values is returned.

Parameters

  • record The DataRecord to tidy (values in the record will not be overwritten with the tidied values)
  • scrapeableFile (optional) The current ScrapeableFile, used for resolving relative URLs when tidying links
  • settings (optional) The operations to perform when tidying, using a Map<String, Boolean>

    The tidy settings and their default values are given below. If a key is missing in the settings map, that operation will not be performed.

    Map Key | Default Value | Description of operation performed
    trim | true | Trims whitespace from values
    convertNullStringToLiteral | true | Converts the string 'null' (without quotes) to the null literal (unless it has quotes around it, such as "null")
    convertLinks | true | Preserves links by converting <a href="link">text</a> to text (link), will try to resolve urls if scrapeableFile isn't null. Note that if there isn't a start and end <a> tag, this will do nothing
    removeTags | true | Remove HTML tags, and attempts to convert line break HTML tags such as <br> to a new line in the result
    removeSurroundingQuotes | true | Remove quotes from values surrounded by them -- "value" becomes value
    convertEntities (professional and enterprise editions only) | true | Convert HTML entities
    removeNewLines | false | Remove all new lines from the text. Replaces them with a space
    removeMultipleSpaces | true | Convert multiple spaces to a single space, and preserve new lines
    convertBlankToNull | false | Convert blank strings to null literal

  • ignoreLowerCaseKeys (optional) True if values with keys containing lowercase characters should be ignored

Return Values

A new DataRecord containing all the tidied values and any values that were not Strings in the original record. The values that were Strings but were not tidied as well as the DATARECORD value will not be in the returned record.

Change Log

Version Description
5.5.26a Available in all editions.
5.5.28a Now uses a Map for the settings, rather than bit flags.

Examples

Tidy all values in an extracted DataRecord

 DataRecord tidied = sutil.tidyDataRecord(dataRecord);
 
 // Run code here to save the tidied record

tidyString

String sutil.tidyString ( String value ) (professional and enterprise editions only)
String sutil.tidyString ( String value, Map<String, Boolean> settings ) (professional and enterprise editions only)
String sutil.tidyString ( ScrapeableFile scrapeableFile, String value ) (professional and enterprise editions only)
String sutil.tidyString ( ScrapeableFile scrapeableFile, String value, Map<String, Boolean> settings ) (professional and enterprise editions only)

Description

Tidies the string by performing actions based on the values of the settings map.

Parameters

  • value The String to tidy
  • settings (optional) The operations to perform when tidying, using a Map<String, Boolean>

    The tidy settings and their default values are given below. If a key is missing in the settings map, that operation will not be performed.

    Map Key | Default Value | Description of operation performed
    trim | true | Trims whitespace from values
    convertNullStringToLiteral | true | Converts the string 'null' (without quotes) to the null literal (unless it has quotes around it, such as "null")
    convertLinks | true | Preserves links by converting <a href="link">text</a> to text (link), will try to resolve urls if scrapeableFile isn't null. Note that if there isn't a start and end <a> tag, this will do nothing
    removeTags | true | Remove HTML tags, and attempts to convert line break HTML tags such as <br> to a new line in the result
    removeSurroundingQuotes | true | Remove quotes from values surrounded by them -- "value" becomes value
    convertEntities (professional and enterprise editions only) | true | Convert HTML entities
    removeNewLines | false | Remove all new lines from the text. Replaces them with a space
    removeMultipleSpaces | true | Convert multiple spaces to a single space, and preserve new lines
    convertBlankToNull | false | Convert blank strings to null literal

  • scrapeableFile (optional) The current ScrapeableFile, used for resolving relative URLs when tidying links

Return Values

The tidied string

Change Log

Version Description
5.5.26a Available in all editions.
5.5.28a Now uses a Map for the settings, rather than bit flags.

Examples

Tidy a comment extracted from a website

Assuming the extracted text's HTML code was:
&nbsp;&nbsp;<a href="http://www.somelink.com">This</a> was great because of these reasons:<br />
1 - Some reason<br />
2 - Another reason<br />
3 - Final reason

 String comment = sutil.tidyString(scrapeableFile, dataRecord.get("COMMENT"));

The output text would be:

This (http://www.somelink.com) was great because of these reasons:
1 - Some reason
2 - Another reason
3 - Final reason

Run only specific operations

 Map settings = new HashMap();
 settings.put("convertEntities", true);
 settings.put("trim", true);
 String text = sutil.tidyString("&nbsp;A String to tidy", settings);
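
The rule that an operation runs only when its key is present in the settings map can be sketched in plain Java. The example below implements just three of the operations (trim, removeMultipleSpaces and removeSurroundingQuotes) as a rough approximation. The class name, the order in which the operations are applied, and the exact regular expression are assumptions; the real tidyString may differ on all three.

```java
import java.util.Map;

// Hypothetical, simplified model of settings-driven tidying.
class TidySketch {
    // An operation runs only if its key is present and mapped to true.
    static boolean enabled(Map<String, Boolean> settings, String key) {
        return Boolean.TRUE.equals(settings.get(key));
    }

    static String tidy(String value, Map<String, Boolean> settings) {
        if (value == null) {
            return null;
        }
        String result = value;
        if (enabled(settings, "removeMultipleSpaces")) {
            // Collapse runs of spaces and tabs but leave line breaks alone.
            result = result.replaceAll("[ \\t]+", " ");
        }
        if (enabled(settings, "trim")) {
            result = result.trim();
        }
        if (enabled(settings, "removeSurroundingQuotes")
                && result.length() >= 2
                && result.startsWith("\"") && result.endsWith("\"")) {
            result = result.substring(1, result.length() - 1);
        }
        return result;
    }
}
```

Passing an empty map leaves the value untouched, which matches the note that missing keys disable their operations.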

unzipFile

void sutil.unzipFile ( String zippedFile )

Description

Unzip a zipped file. Contents will appear in the same directory as the zipped file.

Parameters

  • zippedFile File path to the zipped file, as a string.

Return Values

Returns void. If a file input/output error is experienced it will be thrown.

Change Log

Version Description
5.0 Added for all editions.

Examples

Unzip File

 // Unzips contents of "c:/mydir/myzip.zip"
 // to "c:/mydir/"

 sutil.unzipFile( "c:/mydir/myzip.zip" );

writeValueToFile

void sutil.writeValueToFile ( Object value, String file, String charSet )

Description

Write to a file.

Parameters

  • value The string to be written.
  • file File path where the value should be created/written, as a string. If the file already exists it will be overwritten.
  • charSet Character set of the file, as a string. Java provides a list of supported character sets in its documentation.

Return Values

Returns void.

Change Log

Version Description
5.0 Added for all editions.

Examples

Write To File

 // Writes "abc",123 to file myfile.csv using character set UTF-8
 sutil.writeValueToFile( "\"abc\",123", "myfile.csv", "UTF-8" );

Write To File Using Default Character Set

 // Writes "abc",123 to file myfile.csv
 // using screen-scraper's character set

 sutil.writeValueToFile("\"abc\",123","myfile.csv", null);
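
For readers who want to see what the charSet parameter controls, the standalone sketch below writes a value with java.nio.file.Files, overwriting any existing file, and reads it back. It is an approximation only: the class and method names are hypothetical, and when charSet is null the sketch falls back to the JVM default character set, whereas sutil.writeValueToFile uses screen-scraper's own character set setting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in built on java.nio; not the sutil implementation.
class WriteValueSketch {
    // Assumption: a null name means "use the JVM default character set".
    static Charset resolve(String charSet) {
        return charSet == null ? Charset.defaultCharset() : Charset.forName(charSet);
    }

    // Creates the file, or replaces its contents if it already exists.
    static void write(Object value, String file, String charSet) {
        try {
            byte[] bytes = String.valueOf(value).getBytes(resolve(charSet));
            Files.write(Paths.get(file), bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file back using the same character set rules.
    static String read(String file, String charSet) {
        try {
            return new String(Files.readAllBytes(Paths.get(file)), resolve(charSet));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```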