Environment to execute shell script to scrape?

I have a pair of essentially identical SS-Pro installations, one running on XP and one on CentOS 4.3.

On XP a traditional DOS batch file runs a scraping session without fail.

So I installed SS-Pro on the CentOS 4.3 box using the X-windows installer.

Workbench and the modified batch file from DOS work OK so long as you revise the paths to explicitly call the SS/jre/java instead of relying on the system's java (rel 1.5.x).

SS is installed in /data/APPS/ss on the CentOS box.

If I run a terminal session, cd /data/APPS/ss and then execute my shell script, all is well.

What I would LIKE to be able to do is use CRON or AT to trigger the script, but it's totally unclear what environment stuff must be included in the shell script.

As a test, it seems logical that if my /data/APPS/ss/myScraper.sh is good, it should run correctly from any directory when I am logged in at a "#" (root) terminal prompt and submit

/data/APPS/ss/myScraper.sh

What I SEE is the following
--- OUTPUT FROM RUNNING OUTSIDE /data/APPS/ss --
log4j:ERROR Could not read configuration file [/root/resource/conf/log4j.properties].
java.io.FileNotFoundException: /root/resource/conf/log4j.properties (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at java.io.FileInputStream.<init>(FileInputStream.java:66)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:297)
    at org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:315)
    at com.screenscraper.controller.ControllerMain.main(ControllerMain.java:444)
log4j:ERROR Ignoring configuration file [/root/resource/conf/log4j.properties].
Exception in thread "main" java.lang.NullPointerException
    at com.screenscraper.data.DScrapingSession.setPV(DScrapingSession.java:875)
    at com.screenscraper.business.BScrapingSession.getPV(BScrapingSession.java:621)
    at com.screenscraper.util.General.validate(General.java:1873)
    at com.screenscraper.CommandLineScriptHandler.<init>(CommandLineScriptHandler.java:41)
    at com.screenscraper.controller.ControllerMain.main(ControllerMain.java:526)
Exception in thread "main" java.lang.NullPointerException
    at com.screenscraper.data.DScrapingSession.setPV(DScrapingSession.java:875)
    at com.screenscraper.business.BScrapingSession.getPV(BScrapingSession.java:621)
    at com.screenscraper.util.General.validate(General.java:1873)
    at com.screenscraper.CommandLineScriptHandler.<init>(CommandLineScriptHandler.java:41)
    at com.screenscraper.controller.ControllerMain.main(ControllerMain.java:526)
-- END OF THE NASTY OUTPUT --

I'm guessing it's an easy fix, or at least I hope it is. Manual execution is fine for testing. Scheduled execution is preferred for "real life" usage!

TIA for any pointers/suggestions.

Dave Nuttall
San Antonio, TX

Environment to execute shell script to scrape?

FWIW, I have added a work-around in addition to what Todd helped confirm.

Initially, I wanted to have a PHP script be able to create a small input file to a scraper-session and then execute a scraping session without any further knowledge required by the person running the PHP script.

The part about creating the data for the scraper-session to use is trivial.
Adding a few lines that sets up the scraper-shell script is also very short/easy.

The problem comes in when using a standard Linux server (i.e. CentOS 4.3) because the web-server's default setup is hard-coded so that neither "root" nor the user/group "apache" are able to execute commands such as "at" (do this at some point in time on a one-time basis).
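
A minimal sketch of that kind of one-time submission, for readers who have not used "at" before. It assumes the scraper script lives at /data/APPS/ss/myScraper.sh as in this thread; the names SCRAPER, AT_BIN and queue_scrape are made up for illustration, and AT_BIN exists only so the sketch can be dry-run on a box without atd. It does not address the permission problem described above.

```shell
#!/bin/sh
# Hypothetical helper: queue the scraper script as a one-time job via at(1).
# SCRAPER and AT_BIN are illustrative names, not part of screen-scraper.
SCRAPER="${SCRAPER:-/data/APPS/ss/myScraper.sh}"
AT_BIN="${AT_BIN:-at}"

queue_scrape() {
    # at reads the commands to run from standard input;
    # the time spec "now" asks for the job to run as soon as possible.
    printf '%s\n' "$SCRAPER" | "$AT_BIN" now
}

# Example call (commented out so the sketch can be sourced safely):
# queue_scrape
```

Whichever account calls queue_scrape must be allowed to use "at" on that system, which is exactly where the web-server user ran into trouble.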

I'm not proud of myself yet because I'm quite sure my work-around is NOT as secure as it could and SHOULD be for real-life usage.

What I did was:
1) Created a new semi-ordinary user account, capable of logging into the system.
2) Added the new user to the apache GROUP
3) Changed the web-server's start-up user to the new user
4) Restarted the web-server
5) Tried my PHP script.

As the classic Xerox commercial once proclaimed to the monk who discovered photocopying: "It's a miracle Brother Todd!" (in the commercial it was Brother Juniper!).

I will pursue the CORRECT way, but I don't know if it's possible without recompiling both Apache and PHP.

There is supposed to be a way to run an Apache program called "suexec" (for "switch user execute"), and some weirdness about FastCGI for PHP, but I want to run as much as possible in modular mode, so I reject almost all CGI stuff whenever it suits my purposes.

In consideration for Todd's assistance on this and other issues, I will gladly try to advise on Linux integration issues...but no Xerox miracles are guaranteed! Your best bet is to send me a private msg here, which will tickle my normal inbox, and we'll go from there.

Public thanks to Todd for hanging in with me on this.

Best to all.
Dave Nuttall
San Antonio, TX

Environment to execute shell script to scrape?

For the benefit of those who view this posting after the fact, we were able to resolve the issue by changing the shell script like so:

#!/bin/sh
cd /data/APPS/ss
/data/APPS/ss/jre/bin/java -jar /data/APPS/ss/screen-scraper.jar -s "Fetch_hearings"
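
The key change is the "cd": screen-scraper appears to look up resource/conf/log4j.properties relative to the current directory, which is why the earlier error mentioned /root/resource/conf. A slightly more defensive variant is sketched below. It is only a sketch: SS_HOME and run_session are names invented here, and the install path is the one from this thread.

```shell
#!/bin/sh
# Hypothetical wrapper; SS_HOME and run_session are illustrative names.
# SS_HOME defaults to the install directory mentioned in this thread.
SS_HOME="${SS_HOME:-/data/APPS/ss}"

# Run one scraping session by name from inside the install directory,
# so relative paths such as resource/conf/log4j.properties resolve there.
run_session() {
    (
        # The subshell keeps the caller's working directory untouched.
        cd "$SS_HOME" || exit 1
        "$SS_HOME/jre/bin/java" -jar "$SS_HOME/screen-scraper.jar" -s "$1"
    )
}

# Example call (commented out so the sketch can be sourced safely):
# run_session "Fetch_hearings"
```

With the call uncommented and the file saved as /data/APPS/ss/myScraper.sh, a crontab entry along these lines (the schedule and log path are made up) should then behave the same as a manual run:

    0 2 * * * /data/APPS/ss/myScraper.sh >> /tmp/myScraper.log 2>&1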

Todd

Environment to execute shell script to scrape?

"Your script looks good to me. Here are a few other possibilities to check
- Ensure that the user that's running the script has read/write permissions to everything in the directory where screen-scraper is installed."

cd /
chown -R root /data/APPS/ss
chgrp -R root /data/APPS/ss
chmod -R 777 /data/APPS/ss

"- Check the "InstallDirectory" property in "resource/conf/screen-scraper.properties" to ensure that it reflects where screen-scraper is installed."

screen-scraper.properties shows
InstallDirectory=/data/APPS/ss

I reopen a new terminal session.
From /data/APPS/ss, myScrape.sh runs fine.

cd $HOME
Execute /data/APPS/ss/myScrape.sh and you'd think it was the NY Times with another episode in the "Impeach King George" campaign! It's quite verbose! (Basically the same output as in my first posting in this thread.)

d.

Environment to execute shell script to scrape?

Hi,

Your script looks good to me. Here are a few other possibilities to check

- Ensure that the user that's running the script has read/write permissions to everything in the directory where screen-scraper is installed.
- Check the "InstallDirectory" property in "resource/conf/screen-scraper.properties" to ensure that it reflects where screen-scraper is installed.

If that doesn't help, feel free to reply back.

Kind regards,

Todd

Environment to execute shell script to scrape?

"Could you post the contents of your myScraper.sh script? My guess is that you may be using relative paths when you would need to use absolute paths for it to work correctly with cron."

Hi Todd,
The script is
- - - - - - - - - -
#!/bin/sh
/data/APPS/ss/jre/bin/java -jar /data/APPS/ss/screen-scraper.jar -s "Fetch_hearings"
- - - - - -- - - - -

The only thing that comes to mind is that it requires an absolute path to the scraping session (the "Fetch_hearings" interpreted java script), but I don't grasp WHERE to find it!

TIA.
d.

Environment to execute shell script to scrape?

Hi Dave,

Could you post the contents of your myScraper.sh script? My guess is that you may be using relative paths when you would need to use absolute paths for it to work correctly with cron.
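
To illustrate the guess: a relative path is resolved against whatever directory the process starts in, and cron typically does not start jobs in the install directory. The sketch below shows the effect; resolve_rel is a hypothetical helper written for this post, not anything that ships with screen-scraper.

```shell
#!/bin/sh
# Show how one relative path resolves differently depending on the
# starting directory. resolve_rel is an illustrative helper only.
resolve_rel() {
    # $1 = directory the process starts in, $2 = relative path
    (
        cd "$1" || exit 1
        printf '%s/%s\n' "$(pwd)" "$2"
    )
}

# Example calls (commented out; the directories must exist and be readable):
# resolve_rel /root resource/conf/log4j.properties
# resolve_rel /data/APPS/ss resource/conf/log4j.properties
```

Only the second call would point at a file that actually exists under the install directory, so a job started from anywhere else would look in the wrong place.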

Thanks much,

Todd Wilson