Re: Monit 4 enhancement requests
From: Jan-Henrik Haukeland
Subject: Re: Monit 4 enhancement requests
Date: Thu, 25 Sep 2003 14:16:18 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) XEmacs/21.4 (Reasonable Discussion, linux)
Martin Pala <address@hidden> writes:
> Similar case is the device - in the case that you have something
> like this:
>
> if size > 80% then alert
> if size > 99% then stop
>
> In the period between 80-90% you will receive an alert for each
> monitoring cycle. You need to be alerted at the 80% watermark to allow
> time to solve the problem before a critical error occurs (extend or
> clear the filesystem). You can't timeout the service after a few alert
> cycles, because you need to stop the filesystem and all its dependent
> services gracefully in the emergency case.
I see your point. The same problem applies to
  if cpu > 40% then alert
  if cpu > 99% then stop
  if mem > 80Mb then alert
  if mem > 150Mb then stop
And so on. Actually, I think there are two problems here. First,
timeout needs to support events other than process restarts, as
suggested:
  IF x {event[, event]...} WITHIN y CYCLES THEN {timeout|timeout and exec}
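To make the WITHIN y CYCLES idea concrete, here is a minimal sketch in Python of counting events over a sliding window of cycles. The class name EventWindow, its parameters, and the "x or more events" comparison are assumptions made for illustration only; this is not monit code, and the real statement may count differently.

```python
from collections import deque


class EventWindow:
    """Hypothetical helper: counts events seen within the last `cycles` cycles."""

    def __init__(self, max_events: int, cycles: int) -> None:
        self.max_events = max_events
        # One entry per cycle; entries older than `cycles` drop off automatically.
        self.history = deque(maxlen=cycles)

    def record_cycle(self, events: int) -> bool:
        """Record one cycle's event count; return True if the timeout should fire."""
        self.history.append(events)
        # Assumption: the timeout fires when x or more events fall inside the window.
        return sum(self.history) >= self.max_events


# Roughly "IF 3 events WITHIN 5 CYCLES THEN timeout"
window = EventWindow(max_events=3, cycles=5)
results = [window.record_cycle(n) for n in [1, 0, 1, 0, 0, 0, 1, 1, 1]]
# Only the last cycle has three events inside its five-cycle window.
```

Note that events spread thinly over many cycles never trigger the timeout here, which is the point of tying the count to a window.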
*BUT* secondly, and more importantly, changes to the timeout statement
do not directly solve the problem that alerts will be sent en masse
between e.g.:
  if size > 80% then alert
and
  if size > 99% then stop
Especially if timeout is _not_ used. We *need* to handle this within
monit and not in the configuration file. We need to implement an
algorithm in monit for each IF-TEST so that only one alert is sent per
test. Here is the algorithm for the above size tests:
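  boolean seen_80 = false;
  boolean seen_99 = false;
  while validate
      /* Test 99% on its own: chained after the 80% test in an
         else-if, it could never be reached. */
      if (size > 99%) {
          if (not seen_99) {
              send alert; seen_99 = true;
          }
      }
      if (size > 80%) {
          if (not seen_80) {
              send alert; seen_80 = true;
          }
      } else {
          /* Back at or below 80%: re-arm both tests. */
          seen_80 = false;
          seen_99 = false;
      }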
This way, as long as disk usage keeps growing, only one alert is sent
per test. When usage drops back below 80%, the flags are reset so we
can start over again. The same test should be used for cpu and mem.
Checksum and timestamp are already okay, since if there was a change
the old value is set to the new.
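The same flag idea could be generalized to one small object per IF-TEST, so that size, cpu and mem share the logic. The following is only a sketch under my own assumptions: the name ThresholdLatch, the reset_below parameter, and the sample values are made up for illustration and are not part of monit.

```python
class ThresholdLatch:
    """Hypothetical per-test flag: report a threshold crossing only once."""

    def __init__(self, limit: int, reset_below: int) -> None:
        self.limit = limit              # alert when the value exceeds this
        self.reset_below = reset_below  # re-arm when the value is at or below this
        self.seen = False

    def check(self, value: int) -> bool:
        """Return True only on the first cycle the value exceeds the limit."""
        if value <= self.reset_below:
            self.seen = False           # back to normal, start over
            return False
        if value > self.limit and not self.seen:
            self.seen = True
            return True
        return False


# Both size tests re-arm only once usage is back at or below 80%.
size_tests = [ThresholdLatch(80, 80), ThresholdLatch(99, 80)]
alerts = []
for size in [70, 85, 90, 100, 100, 95, 79, 85]:
    for test in size_tests:
        if test.check(size):
            alerts.append((size, test.limit))
# alerts == [(85, 80), (100, 99), (85, 80)]
```

Eight cycles produce three alerts instead of one per cycle above the watermark, and the dip to 79 re-arms both tests.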
What do you think?
--
Jan-Henrik Haukeland