monit-general

Re: timestamp monitoring + code simplification patch


From: Martin Pala
Subject: Re: timestamp monitoring + code simplification patch
Date: Fri, 29 Nov 2002 16:22:57 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020913 Debian/1.1-1

Christian Hopp wrote:

On Thu, 28 Nov 2002, Jan-Henrik Haukeland wrote:

Martin Pala <address@hidden> writes:

Hi,

I recently wrote a feature for monit that allows monitoring the
timestamp of a file or directory.


I need this feature to watch the health of the iPlanet Messaging Server
stored process (it is critical). This process periodically updates the
timestamps of 3 independent state files. As soon as the timestamp of any
of these files is older than expected, it signals that one of the tasks
the stored daemon performs has failed and real hell is starting. The new
statement has the following syntax:


 TIMESTAMP object [operator] value [unit] [action]

And you can reuse code from the "resource" checks with this syntax.
Btw... what is your default unit?  And what about "absolute"
timestamps?
Yeah, I used some functions from the resource module, which made the work easier since a lot of the code was already there :) I rewrote some of them to be more general and changed the resource code a little bit accordingly.

The default timestamp unit is a second. Absolute timestamps (e.g. 20021130120000) are not supported. Maybe we can add them, but I don't see any use for them (except to extend the smart cron replacement you describe below :)
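
A minimal sketch of how such a test could be evaluated, assuming that "timestamp" means the file's modification time and that the value is compared against the file's age. The function name timestamp_test, the unit table and the operator spellings are illustrative assumptions and are not taken from the patch:

```python
import operator
import os
import time

# Hypothetical unit table; the mail only states that the default unit is a second.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

# Hypothetical operator spellings.
OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq, "!=": operator.ne}


def timestamp_test(path, op, value, unit="second", now=None):
    """Return True when the age of `path` satisfies `op value unit`.

    Assumes the timestamp of interest is the modification time (st_mtime).
    """
    if now is None:
        now = time.time()
    age = now - os.stat(path).st_mtime  # age of the file in seconds
    return OPERATORS[op](age, value * UNIT_SECONDS[unit])
```

For example, timestamp_test("/usr/iplanet/msg-ims1/config/stored.ckp", ">", 5, "minute") would become true once the file has gone more than five minutes without an update.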

Such functionality is not a big issue for me, but nevertheless it's
an interesting feature and here's my +1 vote for including it in monit.

It's +1 with me, too... even if I have no use for it now... but wait...
you can misuse it for some cron stuff.  If you want to update a file
every 5 minutes... disable the alert, set stop to /bin/true and start
to the program you want to run.  Thus, you can update gfx on your www
server and that stuff... cool. (-:

(...)

Yet another thing: we should discuss the syntax. Up until now we have
only had one major statement, which is the 'check name with pidfile'
and its options. This is going to be a stand-alone statement with the
same scope as the check-statement.

The syntax is right now,

CHECK name PIDFILE path ...options... (1)

I'm wondering if we should keep with the check idiom and we could also
combine this with the new filesystem test and use something like
(instead of IF TIMESTAMP and such):

CHECK DEVICE NAME ..options..
CHECK DIRECTORY NAME ..options..
CHECK FILE NAME ..options..

If we want to be "compatible" with (1) we need the following,

CHECK name [PIDFILE|DEVICE|DIRECTORY|FILE] path ...options...


and here are some examples combining it with your timestamp
functionality:

check file "/usr/iplanet/msg-ims1/config/stored.ckp"

CHECK iplanet_stored FILE "/usr/iplanet/msg-ims1/config/stored.ckp"

As described in my previous mail, it is better to have it as a property of the process object (e.g. like a protocol or resource test). If the timestamp test fails, it means that the "stored" process failed, so there is no need to test it on its own. This has the benefit of sharing the process's other properties (mailing list, etc.). In general, the syntax:

CHECK filename FILE "/foo/bar/file" options

could be useful when you just want to check a standalone file (for example some data file), where a failure does not imply any process failure.


   if timestamp  > 5 minutes  then alert martin
   if size > 10Mb then alert hauk #New option
   if deleted then alert address@hidden

check directory "/foo/directory"

CHECK foo_dir DIRECTORY "/foo/directory"

   if timestamp < 10 minutes then alert martin
   if size > 100Mb then alert #New option
   if deleted then alert address@hidden
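
To make the proposed CHECK form concrete, here is a rough sketch of splitting such a line into its parts. It is only an illustration of the syntax discussed above and not monit's actual parser; the function name parse_check and the returned dictionary layout are assumptions:

```python
import shlex

# Object types named in the proposal above.
OBJECT_TYPES = {"PIDFILE", "DEVICE", "DIRECTORY", "FILE"}


def parse_check(line):
    """Split 'CHECK name TYPE path ...options...' into its parts."""
    tokens = shlex.split(line)  # honours the double quotes around the path
    if len(tokens) < 4 or tokens[0].upper() != "CHECK":
        raise ValueError("expected: CHECK name TYPE path [options]")
    name, kind, path = tokens[1], tokens[2].upper(), tokens[3]
    if kind not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {tokens[2]}")
    return {"name": name, "type": kind, "path": path, "options": tokens[4:]}
```

Called on the line CHECK foo_dir DIRECTORY "/foo/directory", this yields the name foo_dir, the type DIRECTORY, the path /foo/directory and an empty option list.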


This one probably does not need a timestamp check?

check device /dev/hda1

CHECK mylittlebigharddisk DEVICE /dev/hda1

   if used > 100Gb then alert martin or
   if available < 100Mb then alert martin

And don't forget (I have heard of problems like this)..

     IF AVAILABLE < 1000INODES THEN ALERT MARTIN


It is a very important test - I had a problem with it in the past (good suggestion)


esp. mail and news servers need that option!
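
A small sketch of where the numbers for the space and inode tests could come from on a POSIX system, using statvfs. As a simplification it takes a mount point (or any path on the filesystem) rather than the device node from the examples; the function names and the 1000-inode default are illustrative assumptions:

```python
import os


def filesystem_usage(path):
    """Return used/available bytes and available inodes for the filesystem holding `path`."""
    st = os.statvfs(path)
    block = st.f_frsize  # fragment size, the unit of the block counts
    return {
        "used_bytes": (st.f_blocks - st.f_bfree) * block,
        "available_bytes": st.f_bavail * block,  # space available to unprivileged users
        "available_inodes": st.f_favail,         # inodes available to unprivileged users
    }


def low_on_inodes(path, minimum=1000):
    """True when fewer than `minimum` inodes are available on the filesystem."""
    return filesystem_usage(path)["available_inodes"] < minimum
```

A filesystem can run out of inodes while still reporting free space, which is why the inode count is read separately from the byte counts.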


And what about this?

    IF ERROR THEN ALERT martin

In case the device is not there any more (devfs -> plugging out the
device), not mounted anymore, access denied...?
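
The error cases listed here could be told apart roughly as follows. This is only a sketch under the assumption that a plain stat call and a mount-point test are enough; the function names and the returned strings are made up for illustration:

```python
import errno
import os


def probe(path):
    """Return None if `path` can be stat'ed, else a short error description."""
    try:
        os.stat(path)
    except FileNotFoundError:
        return "missing"          # e.g. the device node disappeared
    except PermissionError:
        return "access denied"
    except OSError as exc:
        # Any other failure: report the symbolic errno name if known.
        return errno.errorcode.get(exc.errno, "error")
    return None


def is_mounted(mount_point):
    """True when `mount_point` is currently a mount point."""
    return os.path.ismount(mount_point)
```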


Christian







