Tips for how to approach scraping this site?
I'm trying to scrape this page: http://ipldata.msdlouky.org/IPLCISEntry.aspx
I have a list of property addresses in column "A" of an Excel spreadsheet. Ideally, I'd like to use this spreadsheet to populate the fields on this site. However, the website requires the street number and street name to be entered separately, so if needed I could export the data into columns "A" and "B" of an XLS spreadsheet (street number and street name, respectively) to make it easier.
There are posts about importing from files, but they only cover reading the first column. Is it easy to grab the street number and street name separately to populate the form data?
To give you an idea of the manual version of what I'm trying to do, I:
1) visit this site: http://ipldata.msdlouky.org/IPLCISEntry.aspx
2) Input the property NUMBER into the "From" and "Number" fields. (Example: for 2200 Main St., you'd put "2200" in both of these fields).
3) Then in the "Street Name" field you would put something like "Main". The other fields aren't required.
4) After pressing "Search", it comes up with a result. If there isn't a result, I need that address sent to an output file.
5) If there is a result, you have to click on the number in the "application key" column to bring up the details, and I want to scrape this data and output it to a spreadsheet.
It seems fairly simple, but I don't know where to start. Can anyone offer some suggestions of templates I might be able to use?
What would be the best way to approach scraping this site?
It's much easier to use a CSV than an XLS file, with the street number in the first column and the street name in the second.
So you would start the scrape with a script like this, and for each row in the CSV it would request the page:
import au.com.bytecode.opencsv.*;

// Indicate the location and name of the input file
fileName = "input/addresses.csv";
File inputFile = new File(fileName);

// Read the file
if (inputFile.exists())
{
    // Start the CSV reader
    CSVReader reader = new CSVReader(new FileReader(fileName));

    // Read all rows of the CSV, then release the file
    rows = reader.readAll();
    reader.close();

    // Iterate rows, starting at 1 to skip the header row
    for (i=1; i<rows.size(); i++)
    {
        // Parse line: first column is the street number, second is the street name
        line = rows.get(i);
        num = line[0];
        street = line[1];
        session.log("Requesting " + num + " " + street);

        // Save variables
        session.setv("NUM", num);
        session.setv("STREET", street);

        // Scrape file
        session.scrapeFile("Search results");
    }
}
else
{
    session.logError("===================================");
    session.logError("Cannot find input file: " + fileName);
    session.logError("===================================");
    session.stopScraping();
}
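If you'd rather keep the full address in a single column, here is a rough sketch of how it could be split in plain Java. The class name `AddressSplitter` and the regular expression are my own assumptions, not part of any template: it treats the leading digits as the street number and everything after them as the street name. It doesn't strip suffixes such as "St.", so you'd still need to trim those if the site only wants "Main".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits "2200 Main St." into a number and a street name.
class AddressSplitter {
    // Assumption: the address starts with digits, followed by whitespace and the street.
    private static final Pattern ADDRESS = Pattern.compile("^\\s*(\\d+)\\s+(.+?)\\s*$");

    /** Returns {number, street}, or null when the text does not start with a number. */
    static String[] split(String address) {
        Matcher m = ADDRESS.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("2200 Main St.");
        System.out.println(parts[0] + " | " + parts[1]); // prints "2200 | Main St."
    }
}
```

The two array elements could then be passed to `session.setv("NUM", ...)` and `session.setv("STREET", ...)` in place of `line[0]` and `line[1]`.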
Does that get you started?
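For step 4 (sending addresses with no result to an output file), one possible approach is a small helper that appends a line to a text file. This is only a sketch in plain Java: the class name `NoResultLog` and the file name `no_results.csv` are made up for illustration, and how you detect "no result" depends on how your extractor patterns are set up. It also assumes the values contain no commas.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical helper: records addresses for which the search returned nothing.
class NoResultLog {
    /** Appends one "number,street" line to the given file, creating it if needed. */
    static void append(String path, String num, String street) throws IOException {
        // The second FileWriter argument (true) opens the file in append mode.
        PrintWriter out = new PrintWriter(new FileWriter(path, true));
        try {
            out.println(num + "," + street);
        } finally {
            // Always close the writer so the line is flushed to disk.
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed output location; adjust to wherever you want the file.
        append("no_results.csv", "2200", "Main");
    }
}
```

You would call `append` with the saved NUM and STREET values whenever the search page comes back without a match.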