Re: Request for a new "script" service type
From: Martin Pala
Subject: Re: Request for a new "script" service type
Date: Wed, 22 Dec 2004 11:30:39 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910
Michel Marti wrote:
Martin Pala wrote:
1.) The example you showed can already be integrated with monit, using
the existing file timestamp test as a mediator: your script can be run
from cron at regular intervals (for example, every 5 minutes) and, if
everything is OK, it can touch some file (for example
"/tmp/check_myservice.ok"). This updates the file's timestamp, which
monit can test this way:
There are several problems with this:
1. I don't (yet) have cron on this box (it's an ARM-based embedded device
with a limited amount of storage and RAM). I could, however, install cron
to "fix" this.
2. My monit interval is set to 30 seconds, but the smallest interval in
cron is one minute.
3. My embedded device has no battery-buffered clock. This means that on
bootup the clock is set to the start of the epoch (1970) and only later
synchronized using NTP. This might trigger an unnecessary restart of
the service because monit thinks that the file has not been touched
within the specified time.
4. Monitoring will be split across two systems (cron/monit). This might
not be obvious to users looking at only the crontab or only the monit
configuration. Of course, this can be fixed by adding documentation to
monitrc/crontab.
The timestamp trick was meant as a workaround; it seems that in your case
it really is not practical ...
Btw., as you noted, you run monit on an ARM-based device - how is it
working? Were any modifications needed to run it?
> On the monit side it should be possible to set at least a timeout for
> the method (there could be some default value, such as 5 seconds).
Agreed. And monit might also pass some information to the script using
environment variables (e.g. MONIT_SERVICE=<service name>, etc.).
Good point. Monit already sets several environment variables for the
'execute' action:
MONIT_EVENT
MONIT_SERVICE
MONIT_DATE
MONIT_HOST
MONIT_PROCESS_PID
MONIT_PROCESS_MEMORY
MONIT_PROCESS_CHILDREN
MONIT_PROCESS_CPU_PERCENT
Some variables (such as MONIT_SERVICE) can be reused for the testing
method interface too.
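As a rough illustration of passing such a variable to a test method, here
is a minimal Python sketch. It is not monit code: the helper name
run_with_monit_env and the default 5 second timeout are assumptions made
for this example, and only MONIT_SERVICE from the list above is set.

```python
import os
import subprocess
import sys

def run_with_monit_env(argv, service, timeout=5.0):
    """Run argv with MONIT_SERVICE set; return (exit_code, stdout).

    Hypothetical helper, not part of monit.
    """
    env = dict(os.environ)          # start from the parent's environment
    env["MONIT_SERVICE"] = service  # variable name taken from the list above
    proc = subprocess.run(argv, env=env, capture_output=True,
                          text=True, timeout=timeout)
    return proc.returncode, proc.stdout.strip()

# Stand-in for a real test script: it just echoes the variable it received.
ECHO_SERVICE = [sys.executable, "-c",
                "import os; print(os.environ['MONIT_SERVICE'])"]
```

Calling run_with_monit_env(ECHO_SERVICE, "rootfs") runs the stand-in child
and hands back its exit code together with the service name it saw.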
I'm not sure whether it is a good idea to define a new 'script' object. I
think it could be sufficient to support a generic testing method
interface in all existing objects (i.e. 'process', 'device', 'host',
'file', 'directory'). Example syntax:
check device rootfs with path /
if failed script "/sbin/check_lvm rootvol" with timeout 7s then alert
if space usage > 90% then alert
...
---
I think this would be enough for most cases, but it introduces some
overhead when trying to monitor aspects of the system that are not
covered by monit at all. E.g. if I want to send an alert when the number
of established TCP connections exceeds a certain limit, I would have to
do something like this:
check file tcp-connections with path /dev/null
if failed script "/sbin/check_connections --max=1000" with timeout 5s
then alert
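What a script like check_connections might do can be sketched as follows.
This is only an illustration in Python under stated assumptions: it parses
text in the Linux /proc/net/tcp format (IPv4 only), where the fourth
column holds the socket state and "01" means ESTABLISHED. The function
names and the SAMPLE data are made up for the example.

```python
TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp on Linux

def count_established(proc_net_tcp_text):
    """Count ESTABLISHED sockets in text formatted like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count

def check(proc_net_tcp_text, max_connections):
    """Return (exit_code, description); exit code 0 means the test passed."""
    n = count_established(proc_net_tcp_text)
    if n > max_connections:
        return 1, "%d established TCP connections (limit %d)" % (n, max_connections)
    return 0, "%d established TCP connections" % n

# Made-up sample: one listening socket (0A) and one established socket (01).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1234
   1: 0100007F:0016 0100007F:C350 01 00000000:00000000 00:00000000 00000000     0        0 1235
"""

print(check(SAMPLE, 1000))
```

A real script would read /proc/net/tcp itself, take the limit from its
command line, print the description and exit with the returned code.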
You are right; however, I think this will be addressed by the planned
extension of the monitored service types:
1.) A network interface service type is planned; see the description:
http://www.tildeslash.com/monit/doc/next.php#07
This should allow testing throughput, connection states, types, counts,
etc. It may make sense to add the above check_connections example to
this container.
2.) I was working on adding a 'system' service type. Monit can already
display system load (CPU and memory usage), but this currently sits
outside the 'check' statements (it is just informational). I think it
should be added as a regular service type, so that it will be possible to
define limit/action rules and to use dependency relationships between
services (which would allow, for example, stopping non-important
services under high load and starting them again as the load decreases).
This can add several other tests, such as a total process count limit, a
system interrupts limit, etc., and bind them to any other service
characteristic.
The method will return the appropriate event type for the failed/passed
state, plus an event description, and monit will handle the defined
action. The timeout serves as a safeguard in case the method hangs.
OK, but I suggest that returning the event type and description should
be optional.
I think this should be required.
If the script does not return this information, monit
should assume the (new) event type "script failed".
I think this should happen only when the script hangs (i.e. a method
timeout occurred). Then monit can generate the "method/script failed"
event.
To determine the
general failure/success of the script, monit should IMO look at the
script's exit code.
Yes, this looks like the best way to do it :)
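The behaviour discussed so far (exit code decides passed/failed, the
script's output serves as the event description, a timeout produces a
"script failed" event) could look roughly like this. It is a sketch in
Python, not monit's implementation: run_check and the event strings
"passed" and "failed" are assumptions for the example.

```python
import subprocess
import sys

def run_check(argv, timeout=5.0):
    """Run a test script and map its result to (event_type, description).

    Hypothetical helper, not monit's real interface.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        # The script hung: the child is killed and a separate event is raised.
        return ("script failed", "timed out after %.1fs" % timeout)
    description = proc.stdout.strip()
    if proc.returncode == 0:
        return ("passed", description)
    # Non-zero exit code: failure; fall back to the code if nothing was printed.
    return ("failed", description or "exit code %d" % proc.returncode)

# Stand-ins for real test scripts.
OK_SCRIPT = [sys.executable, "-c", "print('ok')"]
FAIL_SCRIPT = [sys.executable, "-c",
               "import sys; print('too many'); sys.exit(2)"]
HUNG_SCRIPT = [sys.executable, "-c", "import time; time.sleep(5)"]
```

With these stand-ins, run_check(OK_SCRIPT) reports a passed state,
run_check(FAIL_SCRIPT) a failure with the printed description, and
run_check(HUNG_SCRIPT, timeout=0.5) the timeout case.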
What do other developers think about this? Should we implement
something like this or not (this reopens a topic which we already
rejected in the past)?
I'm +1 on adding it in some standard interface form (so that such
modules can be written in any language).
(However, I probably won't have time to implement it - paid work
comes first ;)
Martin