Input from CSV

This script reads inputs from a CSV list. For example, if you wanted to use all 50 state abbreviations as input parameters for a scrape, this script would cycle through them all. It also shows how an Initialize script can serve as a looping mechanism.

This particular example uses a CSV of streets in Bristol, RI, with only one street per line. The "while" loop at the bottom of the example retrieves streets one by one until the buffered reader runs out of lines. Each street is stored in a session variable named STREET and used as an input later on. Each time the buffered reader brings in a new street, it overwrites the previous value of the STREET session variable.

import java.io.*;

//Point the input file at the right location. This is a relative path to an input folder in the location where you installed Screen-scraper.
session.setVariable("INPUT_FILE", "input/BRISTOL-STREETS.csv");

//This buffered reader reads the CSV one line at a time. Your CSV needs to be separated into lines as well, with one entity per line.
BufferedReader buffer = new BufferedReader(new FileReader(session.getVariable("INPUT_FILE")));

//For this scrape the city was BRISTOL and the state was RI, so these are set as session variables to be used later as inputs.
session.setVariable("CITY", "BRISTOL");
session.setVariable("STATE", "RI");

//This is the loop that I was referring to earlier. As long as the line from the buffered reader is not null, it sets the line as the STREET session variable
//and calls the "Search Results" scrapeable file.
String line;
while ( (line = buffer.readLine()) != null ){
    session.setVariable("STREET", line);
    session.log("***Beginning street " + session.getVariable("STREET"));

    session.scrapeFile("Search Results");
}

buffer.close();
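
One thing the loop above does not guard against is a blank line in the input file (for example, a trailing newline at the end of the CSV), which would trigger a scrape with an empty value. The following is a minimal standalone sketch of one way to filter those out. It is plain Java rather than a screen-scraper script; the class name InputLines, the helper readInputs, and the sample street values are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class InputLines {
    // Hypothetical helper: returns every non-blank line, trimmed, in file order.
    static List<String> readInputs(Reader source) {
        List<String> inputs = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader buffer = new BufferedReader(source)) {
            String line;
            while ((line = buffer.readLine()) != null) {
                String value = line.trim();
                if (!value.isEmpty()) {
                    inputs.add(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return inputs;
    }

    public static void main(String[] args) {
        // Sample data standing in for the input file; note the blank line and stray spaces.
        Reader sample = new StringReader("HOPE ST\n\nHIGH ST\n  THAMES ST  \n");
        for (String street : readInputs(sample)) {
            // In a screen-scraper script, this is where the session variable
            // would be set and the scrapeable file called.
            System.out.println("Beginning street " + street);
        }
    }
}
```

Inside an actual Initialize script, the same trim-and-skip check could sit at the top of the existing "while" loop instead of in a separate helper.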

Reading from a CSV is not the only way to drive a loop. For information on how to use an array for inputs, please see "Moderate Initialize -- Input from Array".

The next script (below) deals with input CSV files that have more than one piece of information per row (more than one column).

import java.io.*;

////////////////////////////////////////////
session.setVariable("INPUT_FILE", "input/streets_towns.csv");
////////////////////////////////////////////

BufferedReader buffer = new BufferedReader(new FileReader(session.getVariable("INPUT_FILE")));
String line = "";

while (( line = buffer.readLine()) != null ){
    String[] lineParts = line.split(",");

    // Set the variables with the parts from the line
    session.setVariable("CITY", lineParts[1]);
    session.setVariable("STREET", lineParts[0]);

    // Output to the log
    session.log("Now scraping city: " + session.getVariable("CITY") + " and street: " + session.getVariable("STREET"));

    // Scrape next scrapeable file
    session.scrapeFile("MyScrape--2 Search Results");
}

buffer.close();
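
The script above assumes every row has at least two columns; a short or blank row would make lineParts[1] fail. Below is a small standalone sketch of one way to check a row before using it. It is plain Java, not a screen-scraper script, and the class name RowSplitter, the helper splitRow, and the sample rows are assumptions made for illustration.

```java
class RowSplitter {
    // Hypothetical helper: splits one CSV row on commas and trims each field.
    // Returns null when the row has fewer than expectedColumns fields.
    static String[] splitRow(String line, int expectedColumns) {
        // A limit of -1 keeps trailing empty fields, so "MAIN ST," still yields two parts.
        String[] parts = line.split(",", -1);
        if (parts.length < expectedColumns) {
            return null;
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up sample rows: one good, one with an empty city, one malformed.
        String[] rows = { "HOPE ST, BRISTOL", "MAIN ST,", "BROKEN ROW" };
        for (String row : rows) {
            String[] parts = splitRow(row, 2);
            if (parts == null) {
                System.out.println("Skipping malformed row: " + row);
            } else {
                System.out.println("street=" + parts[0] + " city=" + parts[1]);
            }
        }
    }
}
```

Like the script above, this splits on every comma, so it is only suitable for files whose fields contain no commas or quotes.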

Comments

CSV With Quotes

If you need this script, you may also need the other example I found on this site, which handles an Excel CSV file that has quotes around each field. That was a very good script, and should probably be included in this section of the site!

I can't seem to find that original link, so I'm including the source:

String[] parseCSVLine(String line, int index, int columnsToGet){
    int START_STATE = 0;
    int FIRST_QUOTE = 1;
    int SECOND_QUOTE = 2;
    int IN_WORD = 3;
    int IN_WORD_WITHOUT_QUOTES = 4;
    int state = START_STATE;
    String word = "";
    ArrayList lines = new ArrayList();
    char[] chars = line.toCharArray();

    for (int i = 0; i < chars.length; i++){
        char c = chars[i];

        if (c == '"'){
            if (state == START_STATE){
                state = FIRST_QUOTE;
            }
            else if ((state == FIRST_QUOTE) || (state == IN_WORD)){
                state = SECOND_QUOTE;
            }
            else if (state == SECOND_QUOTE){
                word += ("" + c);
                state = IN_WORD;
            }
        }
        else if (c == ','){
            if ((state == SECOND_QUOTE) || (state == IN_WORD_WITHOUT_QUOTES)){
                state = START_STATE;

                lines.add(word);
                if (lines.size() == columnsToGet) break;
                word = "";
            }
            else if (state == START_STATE){
                state = START_STATE;
                lines.add(word.replaceAll("\"\"", "\""));
            }
            else{
                word += ("" + c);
                state = IN_WORD;
            }
        }
        else{
            // Any other character belongs to the current field, quoted or not.
            if (state == START_STATE) state = IN_WORD_WITHOUT_QUOTES;
            else if (state != IN_WORD_WITHOUT_QUOTES) state = IN_WORD;
            word += ("" + c);
        }
    }
    if (lines.size() < columnsToGet){
        if ((state == SECOND_QUOTE) || (state == IN_WORD_WITHOUT_QUOTES))
            lines.add(word.replaceAll("\"\"", "\""));
    }
    String[] linesArray = new String[lines.size()];

    for (int i = 0; i < lines.size(); i++){
        linesArray[i] = (String) lines.get(i);
    }

    return linesArray;
}

// File from which to read.
File inputFile = new File( "C:/SS/vars.csv" );

FileReader in = new FileReader( inputFile );
BufferedReader buffRead = new BufferedReader( in );

// Read the file in line-by-line.
int index = 0;
while( ( searchTerm = buffRead.readLine() )!=null){
    // Don't read header row
    if (index>0){
        // Parse the line into an array
        line = parseCSVLine(searchTerm, index, 5);

        // Get the values
        var1 = line[0];
        var2 = line[1];

        // Set the needed values as session variables
        session.setVariable("VAR1", var1);
        session.setVariable("VAR2", var2);

        // Scrape for those values
        session.scrapeFile("MyScrapeFile");
    }
    index++;
}

// Close up the file. Closing the BufferedReader also closes the underlying FileReader.
buffRead.close();
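
For comparison, quoted fields can also be handled with a single pass and one "inside quotes" flag. The sketch below is not the parseCSVLine function posted above but a compact alternative written as standalone Java; the class name QuotedCsv, the helper splitQuoted, and the sample line are assumptions made for illustration. It treats a doubled quote inside a quoted field as a literal quote and does not handle fields that span more than one line.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsv {
    // Hypothetical helper: splits one line into fields, honoring double quotes.
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote means a literal quote
                        i++;               // skip the second quote of the pair
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas inside quotes are kept
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);        // start the next field
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line with a comma and an escaped quote inside quoted fields.
        String sample = "\"Smith, John\",\"He said \"\"hi\"\"\",42";
        for (String value : splitQuoted(sample)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

In a script like the one above, each value returned by the helper would be passed to session.setVariable before calling session.scrapeFile.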