sUtil

Scraping Util

applyXPathExpression

parseUSAddress

Address sutil.parseUSAddress ( String address ) (pro and enterprise editions only)

Description

parseName

Name sutil.parseName ( String name ) (pro and enterprise editions only)

Description

makeGETRequestUseSessionProxy

String sutil.makeGETRequestUseSessionProxy ( String urlString )

Description

Makes a GET request and returns the result as a string. This method will use the proxy settings attached to the current scraping session.

Parameters

  • urlString The URL to request.

makeGETRequestNoSessionProxy

String sutil.makeGETRequestNoSessionProxy ( String urlString )

Description

Makes a GET request and returns the result as a string. This method will use the proxy settings indicated in the "Settings" dialog box, if any.

Parameters

  • urlString The URL to request.

getRandomUserAgent

String sutil.getRandomUserAgent ( )

Description

Returns a random User Agent. The list isn't closely monitored, so it may not include newer user agents, and may include extremely old ones as well.

Parameters

This method does not receive any parameters.

Return Values

Returns a random user agent.
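
The list behind sutil.getRandomUserAgent is not published here, so the following is only a minimal sketch of the general idea in plain Java: picking one entry at random from a fixed pool. The class name, the pickRandom helper, and the placeholder user agent strings are all hypothetical and are not part of the sutil API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; this is not how sutil is implemented.
class RandomUserAgentSketch {

    // Placeholder values, not real browser user agent strings.
    static final List<String> USER_AGENTS = Arrays.asList(
            "ExampleBrowser/1.0 (hypothetical)",
            "ExampleBrowser/2.0 (hypothetical)",
            "OtherBrowser/5.1 (hypothetical)");

    // Returns one entry of the pool, chosen uniformly at random.
    static String pickRandom(List<String> pool, Random rng) {
        return pool.get(rng.nextInt(pool.size()));
    }

    public static void main(String[] args) {
        System.out.println(pickRandom(USER_AGENTS, new Random()));
    }
}
```

Inside a screen-scraper script the equivalent call is simply sutil.getRandomUserAgent(), with no pool to maintain.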

getRandomReferrer

String sutil.getRandomReferrer ( )

Description

Gets a random referrer page from a list of many different search engine web sites and a few other pages.

Parameters

This method does not receive any parameters.

Return Values

Returns a random referrer URL.

convertUTFWhitespace

String sutil.convertUTFWhitespace ( String input ) (enterprise edition only)

Description

Replaces Unicode variants of whitespace with a regular space character.

Parameters

  • input The input string.

Return Values

Returns the converted string.
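
To make the idea concrete, here is a rough plain-Java sketch of this kind of normalization. It assumes that "whitespace variants" means Unicode space separators such as the no-break space (U+00A0) or the em space (U+2003); the exact set of characters that sutil.convertUTFWhitespace handles is not documented here, and the class and method names below are hypothetical.

```java
// Hypothetical stand-in for illustration; not the sutil implementation.
class WhitespaceSketch {

    // Replaces every Unicode space, line, or paragraph separator with a
    // plain ASCII space. Tabs and newlines are control characters, so
    // Character.isSpaceChar leaves them untouched.
    static String convertWhitespace(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (Character.isSpaceChar(cp)) {
                out.append(' ');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input contains a no-break space and an em space.
        System.out.println(convertWhitespace("New\u00A0York\u2003City"));
    }
}
```

This kind of cleanup is typically useful before comparing or splitting scraped text, since a no-break space looks like a space but does not match one.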

DecodedImage

Overview

To be used in conjunction with the ImageDecoder class.

This class represents decoded images. The objects can be queried for the text that was in the image, as well as any error that occurred while the image was being decoded. When the returned text is incorrect, there is a method that can be used to report it as bad. This can be used for sites like decaptcher.com, where refunds are given for incorrectly interpreted images.

getHTTPResponseHeaderSection

String scrapeableFile.getHTTPResponseHeaderSection ( ) (professional and enterprise editions only)

Description

Gets the header section of the HTTP response.

Parameters

This method does not receive any parameters.

Return Values

Returns a String containing the HTTP response headers.
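
Because the headers come back as a single string, a script will often want to split them into name/value pairs. The following plain-Java sketch shows one way to do that. It assumes the returned text uses the usual layout of a status line followed by "Name: value" lines; that layout is an assumption, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of the scrapeableFile API.
class HeaderSectionSketch {

    // Splits a raw header section into a map keyed by lower-cased header
    // name. Lines without a colon (the status line, blank lines) are
    // skipped. Repeated headers are joined with ", ", which is a
    // simplification (it is not ideal for Set-Cookie, for example).
    static Map<String, String> parseHeaders(String headerSection) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : headerSection.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.merge(name, value, (a, b) -> a + ", " + b);
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServer: example\r\n";
        System.out.println(parseHeaders(raw));
    }
}
```

In a script, the input would be the value of scrapeableFile.getHTTPResponseHeaderSection() rather than a hard-coded string.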