Re: Request for a new "script" service type

monit-general


From:	Martin Pala
Subject:	Re: Request for a new "script" service type
Date:	Wed, 22 Dec 2004 11:30:39 +0100
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910

Michel Marti wrote:

Martin Pala wrote:
1.) The example you showed can already be integrated with monit by using the existing file timestamp test as a mediator: your script can be run from cron at regular intervals (for example every 5 minutes) and, if everything is ok, it can touch some file (for example "/tmp/check_myservice.ok"). This updates the file's timestamp, which monit can test this way:
There are several problems with this:
1. I don't (yet) have cron on this box (it's an ARM-based embedded device with a limited amount of storage and RAM). I could however install cron to "fix" this.
2. My monit interval is set to 30 seconds, but the smallest interval in cron is one minute.
3. My embedded device has no battery-buffered clock. This means that on bootup the clock will be set to the start of the epoch (1970), but later it will be synchronized using NTP. This might trigger an unnecessary restart of the service because monit thinks that the file has not been touched within the specified time.
4. Monitoring will be split across two systems (cron/monit). This might not be obvious for users looking at the crontab or the monit configuration only. Of course, this can be fixed by adding documentation to monitrc/crontab.
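Problem 3 can be illustrated with a few lines of Python. This is only a sketch of a generic "file age" test, not monit's actual code; the function name and the numbers are made up for illustration.

```python
def file_is_stale(mtime: float, now: float, max_age: float) -> bool:
    """Return True if the file was last touched more than max_age seconds before now."""
    return (now - mtime) > max_age


# Hypothetical boot sequence on a device without a battery-buffered clock.
touched_at = 120.0              # file touched 2 minutes after boot, clock still near 1970
before_ntp = 150.0              # 30 seconds later, clock not yet synchronized
after_ntp = 1_100_000_000.0     # clock after the NTP sync (a value in late 2004)

fresh_before_sync = file_is_stale(touched_at, before_ntp, max_age=600.0)
stale_after_sync = file_is_stale(touched_at, after_ntp, max_age=600.0)
```

Before the sync the file looks fresh; right after the sync the same file appears to be decades old, although it was touched only seconds ago.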

The timestamp trick was meant as a workaround; it seems that in your case it really is not practical ...

Btw., as you noted, you run monit on an ARM-based device - how is it working? Was any modification needed to run it?

> On the monit side it should be possible to set at least a timeout for the method (there
> could be some default value, such as 5 seconds).
Agreed. And monit might also pass some information to the script using environment variables (e.g. MONIT_SERVICE=<service name>, etc.).

Good point. Monit already sets several environment variables for the 'execute' action:


MONIT_EVENT
MONIT_SERVICE
MONIT_DATE
MONIT_HOST
MONIT_PROCESS_PID
MONIT_PROCESS_MEMORY
MONIT_PROCESS_CHILDREN
MONIT_PROCESS_CPU_PERCENT

Some variables (such as MONIT_SERVICE) can be reused for the testing method interface too.
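A test script could pick these variables up from its environment. The following Python sketch assumes that monit would export the same names to a test script as it does for the 'execute' action, which is only a proposal at this point; the function name is made up.

```python
import os

# Variable names taken from the list above.
MONIT_VARIABLES = [
    "MONIT_EVENT",
    "MONIT_SERVICE",
    "MONIT_DATE",
    "MONIT_HOST",
    "MONIT_PROCESS_PID",
    "MONIT_PROCESS_MEMORY",
    "MONIT_PROCESS_CHILDREN",
    "MONIT_PROCESS_CPU_PERCENT",
]


def monit_context(env=os.environ):
    """Return the MONIT_* variables from the list above that are set in env."""
    return {name: env[name] for name in MONIT_VARIABLES if name in env}
```

A script would then call monit_context() and, for instance, use the "MONIT_SERVICE" entry in its log messages.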

I'm not sure whether it is good to define a new 'script' object. I think it could be sufficient to support the generic testing method interface in all existing objects (i.e. 'process', 'device', 'host', 'file', 'directory'). Example syntax:
check device rootfs with path /
  if failed script "/sbin/check_lvm rootvol" with timeout 7s then alert
  if space usage > 90% then alert
  ...
---
I think this would be enough for most cases, but it introduces some overhead when trying to monitor aspects of the system that are not covered by monit at all. E.g. if I want to send an alert when the number of established TCP connections exceeds a certain limit, I would have to do something like this:
check file tcp-connections with path /dev/null
  if failed script "/sbin/check_connections --max=1000" with timeout 5s then alert
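What such a check_connections script might do internally can be sketched in Python. This is a guess at its logic, not an existing tool: it assumes a Linux system where /proc/net/tcp lists IPv4 sockets with the state in the fourth column ("01" meaning ESTABLISHED), and all function names are made up.

```python
ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp on Linux


def count_established(proc_net_tcp: str) -> int:
    """Count ESTABLISHED sockets in the text of /proc/net/tcp (IPv4 only)."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) > 3 and fields[3] == ESTABLISHED:
            count += 1
    return count


def exit_code(established: int, max_connections: int) -> int:
    """Return 0 (test passed) unless the limit is exceeded, then 1 (test failed)."""
    return 0 if established <= max_connections else 1


def check(max_connections: int = 1000, path: str = "/proc/net/tcp") -> int:
    """Read the socket table and return the exit code the script would use."""
    with open(path) as handle:
        return exit_code(count_established(handle.read()), max_connections)
```

The script's entry point would parse --max=1000 and pass the result of check() to sys.exit(), so that monit only has to look at the exit code.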

You are right; however, I think this will be addressed by the planned extension of monitored service types:

1.) There is a planned network interface service type, described at:
http://www.tildeslash.com/monit/doc/next.php#07
This should allow testing throughput, connection states, types, counts, etc. It may make sense to add the above check_connections example to this container.

2.) I was working on the addition of a 'system' service type. Monit can already display the system load (CPU and memory usage), but this currently lives outside of the 'check' statements (it is just informational). I think it should be added as a regular service type, so that it will be possible to define limits/action rules and to use dependency relationships between services (which could, for example, stop non-important services under high load and start them again as the load decreases). This can add several other tests, such as a total process count limit, a system interrupts limit, etc., and bind them to any other service characteristic.

The method will return the appropriate event type for the failed/passed state together with an event description, and monit will handle the defined action. The timeout serves as a safety net for the case that the method gets stuck.
OK, but I suggest that returning the event type and description should be optional.


I think this should be required.

If the script does not return this information, monit should assume the (new) event type "script failed".

I think this should happen only in the case that the script hung (i.e. the method timeout occurred). Then monit can generate the "method/script failed" event.

To determine the general failure/success of the script, monit should IMO look at the script's exit code.


Yes, this looks like the best way to do it :)
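On the caller's side, the behaviour discussed here (exit code decides passed/failed, a timeout yields the "script failed" event) might look roughly like the following Python sketch. It is not monit code; run_check and the returned strings are made-up names, and the 5 second default is the value suggested earlier in this thread.

```python
import subprocess


def run_check(command, timeout_seconds=5.0):
    """Run a test script and classify the outcome.

    Returns "passed" for exit code 0, "failed" for any other exit code,
    and "script failed" if the script did not finish within the timeout.
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the hung child at this point.
        return "script failed"
    return "passed" if result.returncode == 0 else "failed"
```

For the device example above, the call would be run_check(["/sbin/check_lvm", "rootvol"], timeout_seconds=7.0). A command that cannot be started at all raises an OSError in this sketch; how monit should report that case is left open here.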

What do other developers think about this? Should we implement something like this or not (this reopens a topic which we already rejected in the past)?

I'm +1 to add it in some standard interface form (such modules can be written in any language).
(However, I probably won't have time to implement it - paid work comes first ;)



Martin
