Monday, April 8, 2013

monitor websites with websec & cron

Websecretary is not a new piece of software but is still the best free off-the-shelf page monitoring solution that I've been able to find.

It's really as simple as this:
  • untar the archive,
  • alter the config file (url.list) - add your mail addresses and the URLs you want to watch
  • run the script (./websec) - it will grab initial copies of your pages
  • run it again later & it will indicate any changes that have occurred
This instant gratification is seductive, but this is obviously software which you will want to run on a scheduled basis most of the time - monitoring pages & alerting you to any changes automatically - and running from cron is a little more complicated. Aside from the fact that 'cron doesn't run as you' (which I always seem to have forgotten about by the time I need to set up a new job), there are a couple of things you need to do for websec in particular:

Let's set it up 
As in the man page - websec's data dir is actually ~/.websec/ - something not immediately obvious after the ease of running interactively like above. The essential components:
  • url.list - the config file
  • ignore.list - configurable stuff to ignore & not count as a change
  • archive - containing copies of the watched pages which are fetched & stored so they can be diff'ed
- need to live in the above data directory (even if you don't have any ignore data, ignore.list needs to be there). So a recipe for the setup is:

  • configure your websec
  • run it once so it can store initial state (in the archive directory)
  • copy all the files/dirs noted above into  ~/.websec/
  • make sure that the directory which contains the scripts (websec & webdiff) is in your PATH
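The copy step above can be sketched as a small shell function. This is only a sketch: install_websec_data is a name I made up (it is not part of websec), and I'm assuming that an empty ignore.list and an empty archive directory are acceptable when you have nothing to put in them.

```shell
#!/bin/sh
# Sketch: copy websec's working files into its data directory.
# install_websec_data is a hypothetical helper, not shipped with websec.
install_websec_data() {
    src=$1                      # directory where websec was unpacked and first run
    dest=${2:-"$HOME/.websec"}  # data dir, defaulting to ~/.websec/

    mkdir -p "$dest" || return 1

    # url.list is the config file, so refuse to continue without it
    [ -f "$src/url.list" ] || { echo "no url.list in $src" >&2; return 1; }
    cp "$src/url.list" "$dest/" || return 1

    # ignore.list has to exist even when there is nothing to ignore
    # (assumption: an empty file is enough)
    if [ -f "$src/ignore.list" ]; then
        cp "$src/ignore.list" "$dest/" || return 1
    else
        [ -e "$dest/ignore.list" ] || : > "$dest/ignore.list"
    fi

    # archive holds the page copies stored by the first run
    if [ -d "$src/archive" ]; then
        cp -R "$src/archive" "$dest/" || return 1
    else
        mkdir -p "$dest/archive"
    fi
}
```

Called as install_websec_data /full/path/to/unpacked/websec, it fills ~/.websec/; a second argument picks a different target, which is handy for trying it out in a scratch directory first.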
Create your scheduled job. In cron, something like:

59 * * * * /full/path/to/websec > /dev/null 2>&1

or, if you want the full output for testing etc:

09 * * * * /full/path/to/websec 2>&1 | mail -s "websec run" your@email.dot.dot
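If you'd rather not open an editor with crontab -e, entries like the ones above can be added from the shell. A minimal sketch: add_cron_job is a hypothetical helper of my own, written as a plain filter so it never touches your crontab by itself, and it assumes your crontab command accepts "-" to read the new table from stdin (true of the common Linux cron packages, but check your own).

```shell
#!/bin/sh
# Sketch: merge one job line into crontab text read from stdin.
# add_cron_job is a hypothetical helper, not part of cron or websec.
add_cron_job() {
    job=$1
    # Pass existing entries through, dropping an exact duplicate of the job.
    # grep exits non-zero when nothing passes through, which is fine here.
    grep -Fxv -- "$job" || true
    # Append the job so it ends up in the table exactly once
    printf '%s\n' "$job"
}
```

Usage would be along the lines of: crontab -l 2>/dev/null | add_cron_job '59 * * * * /full/path/to/websec > /dev/null 2>&1' | crontab -  (running it twice leaves a single copy of the entry).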

Although you still need the full path to the command in the crontab entry, websec is smart enough to find webdiff (if it's in the same directory) & also to find the data dir (in this sense, cron IS 'running as you' :) )
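Since the 'cron doesn't run as you' problem is mostly about the environment, it can help to try the command in a cron-like environment before scheduling it. A rough sketch: run_like_cron is a made-up helper name, and the stripped-down values below (PATH=/usr/bin:/bin, SHELL=/bin/sh) are typical cron defaults that I'm assuming, not something every cron implementation guarantees.

```shell
#!/bin/sh
# Sketch: run a command line with a minimal, cron-like environment.
# run_like_cron is a hypothetical helper; the values are assumed defaults.
run_like_cron() {
    # env -i starts from an empty environment, so only what is listed
    # here is visible to the command, roughly as it would be under cron
    env -i \
        HOME="$HOME" \
        LOGNAME="${LOGNAME:-$(id -un)}" \
        SHELL=/bin/sh \
        PATH=/usr/bin:/bin \
        /bin/sh -c "$1"
}
```

For example, run_like_cron '/full/path/to/websec' should behave much like the scheduled job will: HOME is still yours, so ~/.websec/ is found, but anything that relied on your interactive PATH or other shell settings will fail here rather than silently at 59 minutes past the hour.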