Formatting of Output
Comrades, I am facing a screen-scraper problem that seemeth insurmountable. I hope somebody in this vast, void-like ether can help me resolve it.
I am trying to harvest data from an events list that looks like this:
Sunday Jan. 20, 2008
- some band, some venue
- some band, some venue
- some band, some venue
- some band, some venue
... repeated a random number of times, then restarting with the next date
Monday Jan. 21, 2008
Is it possible to harvest this in three-column form (e.g. [b]DATE, BAND, VENUE[/b])?
Currently, I am doing a two-column harvest whose output looks like this (which makes a lot of extra reformatting work for me):
[b]var1; var2[/b]
Date; NULL
BAND; Venue
BAND; Venue
If this can't be done, I'd appreciate any advice on how to reformat the data more easily. Thanks for any help... Good Harvesting!!!
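If the two-column harvest has to stay, the reformatting can be automated. Below is a minimal plain-Java sketch, assuming the output is one "var1; var2" pair per line exactly as shown above, with NULL in the second column marking a date row. The class name TwoToThreeColumns is hypothetical, not part of screen-scraper. It remembers the most recent date and prefixes it to every band/venue row that follows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: converts the two-column "var1; var2" output into
// three-column "DATE; BAND; VENUE" rows. Avoids newer Java features so it
// also compiles on the Java 5/6 JREs mentioned in this thread.
class TwoToThreeColumns {

    static List<String> convert(List<String> lines) {
        List<String> out = new ArrayList<String>();
        String currentDate = null;
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and the "var1; var2" header row.
            if (trimmed.length() == 0 || trimmed.equals("var1; var2")) {
                continue;
            }
            // Split on the first ';' only.
            String[] parts = trimmed.split(";", 2);
            String first = parts[0].trim();
            String second = parts.length > 1 ? parts[1].trim() : "";
            if (second.equals("NULL")) {
                // A "Date; NULL" row starts a new date block.
                currentDate = first;
            } else if (currentDate != null) {
                // A "BAND; Venue" row inherits the most recent date.
                out.add(currentDate + "; " + first + "; " + second);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "var1; var2",
            "Sunday Jan. 20, 2008; NULL",
            "Band A; Venue X",
            "Band B; Venue Y");
        for (String row : convert(sample)) {
            System.out.println(row);
        }
    }
}
```

Band/venue rows that appear before any date row are skipped rather than written with an empty date.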
rubing,
We do appreciate your feedback. We have a broad audience to address, so it's good to hear from all realms of experience. screen-scraper was designed to adapt to whatever programming language you're comfortable with. We tend to be biased toward Java, our general preference, and perhaps we should diversify our samples a bit more to include additional languages.
Please post more questions or comments as you have them.
-Scott
Yes, it works great! It must have been a dumb mistake on my part. I must've missed something (although I really did inspect it carefully for typos!!). SORRY!
As a novice to programming (my only experience is with PHP), I was confused about which language is ideal, because one of the first tutorials presents a link to a JavaScript tutorial along with the HTTP links, but then seems to suggest Interpreted Java instead. Little things like that can be frustrating to a novice user. I'm really just telling you so that you can improve it!! I think this is a great program, and I look forward to getting the Pro version when I start generating some ad revenue. Thank you!
In the meantime, I have been processing my screen-scraper results with PHP; it seems easier that way.
rubing,
I can't say why Tutorial 3 didn't work the first time you tried it. screen-scraper should generate the output file on the fly. Have you tried deleting the output file and trying it again?
Because screen-scraper is written in Java you have access to the entire API for the JRE in use. In the case of the latest version of screen-scraper, 4.0, the JRE for Windows is 1.6 and for Linux & Mac it's 1.5.
http://java.sun.com/javase/reference/api.jsp
Only occasionally will you need to know where BeanShell interprets things differently from regular compiled Java. So, for the most part, what you'll need to know is basic Java programming.
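As an illustration of using the standard JRE API, here is a minimal plain-Java sketch of appending a line to an output file with java.io.FileWriter and java.io.PrintWriter. Opening a FileWriter in append mode creates the file if it does not exist yet (provided its parent directory exists), so the file does not have to be created by hand first. The class name OutputAppender and the file name events.csv are hypothetical, and this is ordinary Java rather than screen-scraper's own scripting API; dropping it into a BeanShell script may need small adjustments.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical helper: appends one line to a text file using only JRE classes.
// FileWriter(File, true) opens the file in append mode and creates it if it
// does not exist yet.
class OutputAppender {

    static void appendLine(File file, String line) throws IOException {
        PrintWriter writer = new PrintWriter(new FileWriter(file, true));
        try {
            writer.println(line);
        } finally {
            // Always close so the line is flushed to disk.
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name, chosen for illustration only.
        File out = new File("events.csv");
        appendLine(out, "DATE,BAND,VENUE");
    }
}
```

Because each call reopens the file in append mode, repeated calls (for example, one per extracted record) add lines instead of overwriting earlier ones.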
-Scott
I tried the Interpreted Java script from Tutorial 3 for saving data, the one I said did not work. This time it worked! I think this is because it requires the text file it writes to to already exist?
So, I guess you are recommending that I read up on programming in BeanShell? I think your documentation should be a bit more explicit about what Interpreted Java is and where to learn about it.
rubing,
Depending on how the pages flow between your data points, you may need to write an additional script to do the looping. We recommend Interpreted Java, as that is the language we use exclusively in-house, so it should be the best documented and best supported.
Could you please point out specifically which of the Interpreted Java scripts in the tutorial was not working for you?
Thanks,
Scott
I just tried this on the new screen-scraper 4 and can see that it will not work. I'm guessing I need to run some type of script that will loop as long as there is data of this format:
until it can pick up a DATE variable
What scripting language should I learn and use to do this: JavaScript or Interpreted Java?
Scott,
So you're saying I should configure it as follows?
First extractor pattern, with DATE saved as a session variable:
~@DATE@~
Second extractor pattern:
[repeated a random number of times]
Followed by a script, run after each pattern application, that writes DATE, BAND, VENUE
I tried this setup. However, the Interpreted Java script for writing data that you provide in the tutorial does not work. The VBScript version does work, but it will not accept the multiple instances of the scraping engine this setup requires, so I have not tested it. Any advice?
rubing,
I'm assuming you're saving the output as a flat CSV file. If so, you'll want to mimic a relational construct by repeating the value of the date column which would otherwise be a foreign key to the table containing the band and venue data.
So, when you come to a new date with new bands and venues beneath it, store the value of the date in a session variable, and write it out along with each band and venue until you get to a new date. At that point, overwrite the old date value and start again.
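A minimal sketch of the carry-forward approach described above, written as plain Java under stated assumptions: the HashMap only stands in for screen-scraper's session variables, and onDate/onBandVenue are hypothetical hooks representing the scripts you would run after each extractor pattern matches. Neither is screen-scraper's actual API. Every field is quoted because a date such as "Jan. 20, 2008" contains a comma, which would otherwise split the CSV column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of "store the date, repeat it on every row" in plain Java.
// The Map below merely stands in for screen-scraper's session variables.
class DateCarryForward {

    private final Map<String, String> session = new HashMap<String, String>();
    private final List<String> csvRows = new ArrayList<String>();

    // Hypothetical hook for the DATE pattern: overwrite the stored date.
    void onDate(String date) {
        session.put("DATE", date);
    }

    // Hypothetical hook for the BAND/VENUE pattern: write a denormalized row
    // that repeats the current date alongside the band and venue.
    void onBandVenue(String band, String venue) {
        String date = session.get("DATE");
        if (date == null) {
            return; // No date seen yet; skip rather than write an incomplete row.
        }
        csvRows.add(csv(date) + "," + csv(band) + "," + csv(venue));
    }

    List<String> rows() {
        return csvRows;
    }

    // Wrap a field in double quotes and double any embedded quotes.
    static String csv(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        DateCarryForward s = new DateCarryForward();
        s.onDate("Sunday Jan. 20, 2008");
        s.onBandVenue("Band A", "Venue X");
        s.onBandVenue("Band B", "Venue Y");
        s.onDate("Monday Jan. 21, 2008");
        s.onBandVenue("Band C", "Venue Z");
        for (String row : s.rows()) {
            System.out.println(row);
        }
    }
}
```

Each band/venue row carries the date it appeared under, so the flat file can later be loaded into a date table and an events table if a relational layout is wanted.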
Hope this helps,
-Scott