Can't get the search data to appear
Hi
I am trying to scrape www.euroauctions.com. When I inspect the site in Chrome I can find the URL that provides the data for the search results, but I cannot get it to run in screen-scraper (SS).
I keep getting 500 server errors.
This is the URL; it has 3 parameters: filters, page and node ID (although there appear to be two values inside the filters parameter: auctionId and masterCategoryId).
I am hoping there is a straightforward solution to this, as the rest looks (relatively) straightforward.
Thanks in advance for any help you can give.
Jason
I see the site is using reCAPTCHA to block bots. It might be something you can work with, but it's probably going to be pretty complex and require another service for CAPTCHA solving.
I think I'll try harder to get an XML feed
Hopefully we won't come across too many of these!
Thanks
Jason
I have now got a JSON feed, but I'm stumped
The feed is one long list of 1500+ records with 20+ fields per record. I have written an extractor pattern that works, but if I use all of the tokens it times out (too much going on, I assume).
Is there a better way to step through the records and save them as a CSV file? I need to manipulate the data before using it.
Or is there a better JSON-to-CSV converter?
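Whichever way the records are pulled out of the JSON, writing them as CSV mostly comes down to quoting fields correctly. Below is a minimal sketch in plain Java, assuming each record has already been copied into a Map of field name to value (for example by looping over the JSON array with org.json in a script). The class and method names (CsvSketch, toCsv, escape) and the sample field names are hypothetical, not part of screen-scraper. Fields containing commas, quotes or line breaks are wrapped in double quotes with inner quotes doubled, in the style RFC 4180 describes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvSketch {

    // Quote a value only when it contains a comma, double quote, CR or LF;
    // embedded double quotes are doubled. A null value becomes an empty cell.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Build CSV text: one header row, then one row per record, CRLF line endings.
    // Columns follow the order in 'columns'; a field missing from a record
    // becomes an empty cell, so every row has the same width.
    static String toCsv(List<String> columns, List<Map<String, String>> records) {
        StringBuilder sb = new StringBuilder();
        List<String> header = new ArrayList<>();
        for (String c : columns) {
            header.add(escape(c));
        }
        sb.append(String.join(",", header)).append("\r\n");
        for (Map<String, String> rec : records) {
            List<String> cells = new ArrayList<>();
            for (String c : columns) {
                cells.add(escape(rec.get(c)));
            }
            sb.append(String.join(",", cells)).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample records; the real field names come from the feed.
        Map<String, String> a = new LinkedHashMap<>();
        a.put("make", "Volvo");
        a.put("model", "EC220E, tracked");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("make", "JCB");
        List<Map<String, String>> records = List.of(a, b);
        System.out.print(toCsv(List.of("make", "model"), records));
    }
}
```

The returned string can then be written to disk in one go (for example with java.nio.file.Files.writeString on Java 11+), so the CSV step stays separate from the parsing step.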
Thanks
Jason
We ship the org.json library with screen-scraper, and I use it when I need to read JSON.
Quick tutorial here: https://www.tutorialspoint.com/org_json/org_json_quick_guide.htm
Here is a sample where I create a new DataRecord and parse each JSON node into it.
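import java.time.LocalDate;
import org.apache.commons.lang3.StringUtils; // assumes Commons Lang 3 is on the classpath
import org.json.JSONArray;
import org.json.JSONObject;

// d is the case-level record saved earlier in the session as "_DATARECORD";
// dataRecord is the current extractor match, whose JSON token holds the raw JSON text.
// datePattern (a DateTimeFormatter), caseUid, courtRecordsUtility and dm
// are defined elsewhere in the full script.
DataRecord d = session.getv("_DATARECORD");
JSONObject obj = new JSONObject((String) dataRecord.get("JSON"));
JSONArray res = obj.getJSONArray("result");
// In this feed, result[0] is the hit count and result[1] is the party list as a JSON string.
int hits = res.getInt(0);
String con = res.getString(1);
if (hits > 0)
{
    JSONArray sobj = new JSONArray(con);
    for (int i = 0; i < sobj.length(); i++)
    {
        DataRecord dr = new DataRecord();
        JSONObject party = sobj.getJSONObject(i);
        dr.put("ROLE", party.getString("PARTYTYPE"));
        // Keep only the date portion of FILING (the text before the first space).
        String filing = StringUtils.substringBefore(party.getString("FILING"), " ");
        d.put("FILING_DATE", filing);
        // Normalise letter case so datePattern can parse it.
        filing = StringUtils.lowerCase(filing);
        filing = StringUtils.capitalize(filing);
        LocalDate filingDate = LocalDate.parse(filing, datePattern);
        d.put("FILING_DATE_CLEAN", filingDate);
        String name = party.getString("PARTY");
        name = sutil.stripHTML(name);
        name = StringUtils.normalizeSpace(name);
        if (!StringUtils.startsWithIgnoreCase(name, "Does "))
        {
            log.log(">>>Parsing name: " + name);
            String address = "";
            // Split at the first whitespace followed by a digit; the lookahead keeps
            // that digit as the start of the address instead of consuming it.
            String[] parts = name.split("\\s(?=\\d)", 2);
            log.logObjectByType(parts);
            name = parts[0];
            if (parts.length > 1)
            {
                address = parts[1];
            }
            dr.put("NAME", name);
            log.log("address: " + address);
            if (!sutil.isNullOrEmptyString(address))
            {
                dr.put("ADDRESS_RAW", address);
                courtRecordsUtility.addDataForTable("locations", dr, true);
            }
            courtRecordsUtility.addDataForTable("people", dr, true);
            String attr = party.getString("ATTORNEY");
            attr = sutil.stripHTML(attr);
            attr = StringUtils.normalizeSpace(attr);
            // Record the attorney as a separate person unless blank or "Pro Per".
            if (!sutil.isNullOrEmptyString(attr) && !StringUtils.equalsIgnoreCase(attr, "Pro Per"))
            {
                DataRecord dr2 = new DataRecord();
                dr2.put("ROLE", "Attorney");
                dr2.put("NAME", attr);
                dr2.put("PERSON_UID", caseUid + attr);
                courtRecordsUtility.addDataForTable("people", dr2, true);
            }
        }
    }
    courtRecordsUtility.addDataForTable("cases", d, true);
    dm.flush();
}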
Thank you for the info; I might need a couple more pointers...
So I created a scrapeable file using the following URL (emailed for privacy).
The resulting response is very long, so what do I put in my '_DATARECORD' field: the whole thing, or just the first node as an example?
What input is needed for 'JSON' in this line of your example: JSONObject obj = new JSONObject(dataRecord.get("JSON"));
I think your example may be a little too high-tech for me. All I am looking to do is take some of the fields from each node (images, make, model year, sale date, etc.). Here is an example of one of the (1500+) nodes
The start of the API response shows: {"APIKey":"The key is in here","function":"GetItems","result":"OK","message":null,"data":[{"auctionId":590,"r.......
So I assume I am looking to use:
JSONArray array = obj.getJSONArray("data");
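A minimal sketch of the per-record step that could follow that line: once each element of the data array is available, keep only the wanted fields, in a fixed column order. It is written in plain Java against a Map so it runs without the org.json jar; the class and method names (FieldPicker, pick) and the key names (make, model, images) are hypothetical and must be checked against the real feed. Inside a screen-scraper script, the per-field equivalent would be org.json's optString(key, ""), which returns the default when a key is missing, whereas getString(key) throws.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class FieldPicker {

    // Copy only the wanted keys, in the wanted order. Missing or null values
    // become "" so every record yields the same set of columns.
    // A List value (e.g. several image URLs, if the feed stores them that way)
    // is joined with "|" so it fits in a single CSV cell.
    static Map<String, String> pick(Map<String, Object> record, List<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : wanted) {
            Object v = record.get(key);
            if (v == null) {
                out.put(key, "");
            } else if (v instanceof List) {
                StringJoiner j = new StringJoiner("|");
                for (Object item : (List<?>) v) {
                    j.add(String.valueOf(item));
                }
                out.put(key, j.toString());
            } else {
                out.put(key, String.valueOf(v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical record; only auctionId appears in the feed excerpt above.
        Map<String, Object> rec = new LinkedHashMap<>();
        rec.put("auctionId", 590);
        rec.put("make", "Caterpillar");
        rec.put("images", List.of("a.jpg", "b.jpg"));
        // prints {make=Caterpillar, model=, images=a.jpg|b.jpg}
        System.out.println(pick(rec, List.of("make", "model", "images")));
    }
}
```

Calling this on every element of the data array with the same column list gives rows of equal width, which is what a CSV file needs.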
But I could just do with a couple more tips...
Many thanks
Jason
I don't know what more to point you to than that tutorial. You can sometimes use extractor patterns to pull values out of JSON like this, but if that can't get you what you want, you need to parse it, and the tutorial I linked is the easiest one I've found.