monit-general

Re: Request for a new "script" service type


From: Martin Pala
Subject: Re: Request for a new "script" service type
Date: Wed, 22 Dec 2004 11:30:39 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910

Michel Marti wrote:
Martin Pala wrote:

1.) The example you showed can already be integrated with monit by using the existing file timestamp test as a mediator: your script can be run from cron at regular intervals (for example every 5 minutes) and, if everything is OK, touch some file (for example "/tmp/check_myservice.ok"). This updates the file's timestamp, which monit can test this way:


There are several problems with this:

1. I don't (yet) have cron on this box (it's an ARM-based embedded device with a limited amount of storage and RAM). I could however install cron to "fix" this.
2. My monit interval is set to 30 seconds, but the smallest interval in cron is one minute.
3. My embedded device has no battery-buffered clock. This means that on bootup the clock will be set to the start of the epoch (1970) and only later synchronized using NTP. This might trigger an unnecessary restart of the service, because monit thinks that the file has not been touched within the specified time.
4. Monitoring will be split across two systems (cron/monit). This might not be obvious for users looking at only the crontab or only the monit configuration. Of course, this can be fixed by adding documentation to monitrc/crontab.

The timestamp trick was meant as a workaround; it seems that in your case it really is not practical ...

Btw., you noted that you run monit on an ARM-based device. How is it working? Were any modifications needed to run it?


> On monit side it should be possible to set at least timeout for method (there
> could be some default value, such as 5 seconds).

Agreed. And monit might also pass some information to the script using environment variables (e.g. MONIT_SERVICE=<service name>, etc.).

Good point. Monit already sets several environment variables for the 'execute' action:

MONIT_EVENT
MONIT_SERVICE
MONIT_DATE
MONIT_HOST
MONIT_PROCESS_PID
MONIT_PROCESS_MEMORY
MONIT_PROCESS_CHILDREN
MONIT_PROCESS_CPU_PERCENT

Some variables (such as MONIT_SERVICE) can be reused for the testing method interface too.
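As a rough illustration of the script side, here is a minimal Python sketch of how a test script might pick up such variables. The variable names come from the list above; the assumption (not confirmed anywhere in this thread) is that monit would export them to a test script and not only to the 'execute' action. The helper name monit_context is hypothetical.

```python
import os

# Names taken from the list above. Whether monit exports them to a test
# script (and not only to the 'execute' action) is an assumption.
MONIT_VARS = ("MONIT_EVENT", "MONIT_SERVICE", "MONIT_DATE", "MONIT_HOST")


def monit_context(environ=None):
    """Return the MONIT_* variables that are set, skipping the ones that are not."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in MONIT_VARS if name in environ}


if __name__ == "__main__":
    ctx = monit_context()
    # Fall back to a placeholder when the script is run by hand, outside monit.
    print("checking service:", ctx.get("MONIT_SERVICE", "<unknown>"))
```

Treating every variable as optional keeps the script usable from a plain shell as well.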


I'm not sure whether it is good to define a new 'script' object. I think it could be sufficient to support the generic testing method interface in all existing objects (i.e. 'process', 'device', 'host', 'file', 'directory'). Example syntax:

check device rootfs with path /
  if failed script "/sbin/check_lvm rootvol" with timeout 7s then alert
  if space usage > 90% then alert
  ...
---

I think this would be enough for most cases, but it introduces some overhead when trying to monitor aspects of the system that are not covered by monit at all. E.g. if I want to send an alert when the number of established TCP connections exceeds a certain limit, I would have to do something like this:

check file tcp-connections with path /dev/null
  if failed script "/sbin/check_connections --max=1000" with timeout 5s then alert
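The check_connections script itself is not shown in the thread. A minimal Python sketch of what such a check could look like follows, assuming a Linux system where /proc/net/tcp is available (state code 01 means ESTABLISHED there) and assuming, as discussed in this thread, that a non-zero exit code signals failure. The function names are hypothetical, and IPv6 sockets (/proc/net/tcp6) are ignored for brevity.

```python
import argparse

TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp on Linux


def count_established(lines):
    """Count ESTABLISHED sockets in /proc/net/tcp-style lines (header included)."""
    count = 0
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def exit_code(established, limit):
    """Return 0 while the count is within the limit, 1 once it exceeds it."""
    return 0 if established <= limit else 1


def main(argv=None, path="/proc/net/tcp"):
    parser = argparse.ArgumentParser(description="Check established TCP connections")
    parser.add_argument("--max", type=int, default=1000, dest="limit")
    args = parser.parse_args(argv)
    with open(path) as f:
        return exit_code(count_established(f.readlines()), args.limit)
```

Ending the file with `raise SystemExit(main())` would turn it into a standalone script that can be called with --max=1000.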

You are right; however, I think this will be addressed by the planned extension of the monitored service types:

1.) A network interface service type is planned; see the description: http://www.tildeslash.com/monit/doc/next.php#07 This should make it possible to test throughput, connection states, types, counts, etc. It may make sense to add the above check_connections example to this container.

2.) I was working on adding a 'system' service type. Monit can already display system load (CPU and memory usage), but this currently sits outside of the 'check' statements (it is just informational). I think it should be added as a regular service type, so that it will be possible to define limits and action rules and to use dependency relationships between services (which would allow, for example, stopping non-important services under high load and starting them again as the load decreases). This could add several other tests, such as a total process count limit, a system interrupts limit, etc., and bind them to any other service characteristic.


The method will return the appropriate event type for the failed/passed state, plus an event description, and monit will handle the defined action. The timeout serves as a safety net in case the method hangs.

OK, but I suggest that returning the event type and description should be optional.

This should be required, I think.

If the script does not return this information, monit should assume the (new) event type "script failed".

I think this should happen only in the case that the script hung (i.e. the method timeout occurred). Then monit can generate the "method/script failed" event.

To determine the general failure/success of the script, monit should IMO look at the script's exit code.

Yes, this looks like the best way to do it :)
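To make the semantics discussed here concrete, below is a rough Python sketch (not monit code; monit is written in C and nothing like this is implemented yet) of a runner that uses the exit code for passed/failed and reports "script failed" only when the timeout fires. The function name run_check and the use of the script's standard output as the optional event description are assumptions for illustration.

```python
import subprocess
import sys


def run_check(argv, timeout=5.0):
    """Run a check command and return (status, detail).

    status is "passed" for exit code 0, "failed" for any other exit code,
    and "script failed" when the command does not finish within the timeout.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return "script failed", "timed out after %.1fs" % timeout
    status = "passed" if proc.returncode == 0 else "failed"
    # Assumption: whatever the script printed serves as the event description.
    return status, proc.stdout.strip()


if __name__ == "__main__":
    print(run_check([sys.executable, "-c", "print('all good')"]))
```

The 5 second default mirrors the default value suggested earlier in the thread.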



What do other developers think about this stuff? Should we implement something like this or not (this reopens a topic which we already rejected in the past)?

I'm +1 to add it in some standard interface form (so that such modules can be written in any language). (However, I probably won't have time to implement it - paid work comes first ;)


Martin



