monit-general

Re: intermittent user process tracking with monit


From: sven falempin
Subject: Re: intermittent user process tracking with monit
Date: Tue, 17 Sep 2013 09:11:16 -0400

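check program FOO with path BAR

Problem solved: the program at BAR decides what counts as a failure, so a process that is simply not running does not have to be reported as one.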
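
What follows is a minimal sketch of what such a program could look like, not a tested recipe. The script name proc_watch.sh, its install path, the service name process1_watch, and the function report_over_limit are all made up for illustration. It assumes a procps-style ps that accepts -eo pid=,ppid=,pcpu=,pmem=,args=, and a Monit release that accepts arguments inside the path string; if yours does not, a small per-process wrapper script would do the same job. The script exits 0 when nothing matching is running or everything is under the limit, and exits 1 (printing the full command line, arguments included) when a matching process is over it. Whether that output ends up in the alert mail depends on the Monit version.

```shell
#!/bin/sh
# Hypothetical helper for a Monit stanza along these lines:
#
#   check program process1_watch with path "/usr/local/bin/proc_watch.sh process1 90"
#       if status != 0 then alert
#
# Exit 0: no matching process, or all matches are under the limit.
# Exit 1: at least one matching process is over the limit.

# Reads "pid ppid pcpu pmem args..." lines on stdin.
# $1 = substring to look for in the command line
# $2 = percent limit for CPU and memory
# $3 = PID to ignore, together with its direct children (may be empty)
report_over_limit() {
    awk -v pat="$1" -v lim="$2" -v self="$3" '
        $1 == self || $2 == self { next }   # skip the helper itself
        {
            args = $0
            # drop the four leading columns, keep the command line
            sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", args)
        }
        index(args, pat) && ($3 + 0 > lim + 0 || $4 + 0 > lim + 0) {
            printf "pid %s cpu %s%% mem %s%%: %s\n", $1, $3, $4, args
            bad = 1
        }
        END { exit bad }
    '
}

# Run the check only when called with a pattern and a limit.
if [ "$#" -ge 2 ]; then
    # Take the snapshot first so the awk process is not part of it.
    snapshot=$(ps -eo pid=,ppid=,pcpu=,pmem=,args=)
    printf '%s\n' "$snapshot" | report_over_limit "$1" "$2" "$$"
    exit $?
fi
```

Note that on Linux the pcpu column of ps is averaged over the lifetime of the process, so this is coarser than Monit's own per-cycle cpu test, and the "for 5 cycles" logic from the original stanzas would have to be expressed on the Monit side of the status test.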


On Tue, Sep 17, 2013 at 5:22 AM, Sean Penticoff <address@hidden> wrote:
Hi,
Let me describe what I'm trying to do, in case my approach is all wrong.
We have several systems that process data for users. The programs the users run all live in a shared space and run in user space at the users' discretion. I would like to use monit to alert when one of these processes is started and to track its memory and CPU usage, alerting again when the CPU or memory of that process exceeds a certain threshold (and possibly renicing it via some script).
I've currently set up alerts like this:
check process process1
    matching "process1"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert
check process process2
    matching "process2"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert
check process process3
    matching "process3"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert


...and it goes on for another dozen or so processes

This "works" but is not ideal.
What would be ideal is something more along the lines of:
check process process1
    matching "process1"
    alert on statechange  (basically ignore the fact that this process is not running, but let me know when it starts and ends [i.e. alert on a state change], and monitor it while it is running)
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert

Also, we are using M/Monit, and every process on every machine that is NOT running shows up as a hit against overall health.
For example, under the host status:
Status  10 out of 27 services are available

and on the main dashboard:

[Sep 16 2013 15:59:47] Host myhost.example.com reported a problem with process1: process is not running
[Sep 16 2013 15:59:44] Host myhost.example.com reported a problem with process2: process is not running
[Sep 16 2013 15:59:40] Host myhost.example.com reported a problem with process3: process is not running
[Sep 16 2013 15:59:35] Host myhost.example.com reported a problem with process4: process is not running
Multiply that by 20+ hosts and you get the idea.

The fact that a process isn't running is never a problem. I would like to reflect that somehow, and also to have some insight into what's running where.

Another thing I would really like to be able to do is pass the process arguments into the alert emails.

For example, when the command process1 -t foo -o bar -cfg process1.cfg -v -X -s
is run, I'd be tickled if I could get "-t foo -o bar -cfg process1.cfg -v -X -s" (or even the entire output of monit procmatch) into the alert somehow.

I've only had this up and running for about a month, and monit has already saved my bacon on filesystem checks and dead services several times. I'd just like to use it for a bit more than the system side of things.





