Re: [Patch] Revised resource-support [aka. "proc"-support]


From: Christian Hopp
Subject: Re: [Patch] Revised resource-support [aka. "proc"-support]
Date: Mon, 12 Aug 2002 23:48:16 +0200 (CEST)

On 9 Aug 2002, Jan-Henrik Haukeland wrote:

Moin!

> First, I'm impressed, this looks good. Although I have to admit that I
> don't understand much of the sys. dependent code. Even if I appreciate
> it, this is your code so add yourself as the only author (or at least
> as the first author in the @author tag.)

I don't know much about software engineering, and I'm really
impressed by how you got the project started and how you keep it
together.  That's why I want to keep you as the first author.  And if
you see something like this on publications... the first one is "just"
important but the last one did the work. (-:   I think you don't want
to be blamed in case it does not work. (-:

> Here are some initial comments in no particular order:

> - I think it's safe to assume that you will have to be root to use
>   the proc system. So adding something like the following in p.y in the
>   semantic action is necessary:

(...code...)

My implemented solution: I try to initialize the engine (e.g. on
Solaris there is a getuid check).  If that fails, you get a non-fatal
error message saying that the process checking engine is disabled.  It
becomes fatal as soon as you try to use any "CPUUSAGE..." statement.

The thing is, Linux can still use this feature as a normal user.
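
To make the idea above concrete, here is a minimal sketch in C, not
the code from the patch.  The names init_process_engine,
require_process_engine and the plain "doprocess" flag (standing in for
"Run.doprocess") are assumptions for illustration only.  The uid and
the "needs root" decision are passed in as parameters so the logic can
be tried without being root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for the "Run.doprocess" flag. */
static bool doprocess = false;

/* Try to initialise the engine.  A failure is NOT fatal: it only
 * prints a message and leaves the engine disabled.
 * needs_root: true on platforms such as Solaris, false on Linux. */
static bool init_process_engine(bool needs_root, uid_t uid)
{
    doprocess = !(needs_root && uid != 0);
    if (!doprocess)
        fprintf(stderr, "monit: process checking engine is disabled\n");
    return doprocess;
}

/* Called when the parser meets a resource statement.
 * Returns 0 if the statement may be used, -1 if the configuration
 * has to be rejected (the fatal case). */
static int require_process_engine(const char *statement)
{
    assert(statement != NULL);
    if (doprocess)
        return 0;
    fprintf(stderr, "monit: '%s' needs the process checking engine\n",
            statement);
    return -1;
}
```

In real code the caller would pass getuid() and a per-platform
constant; a configuration without resource statements keeps working
either way.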

> - Structure suggestion. Create a directory called proc (or something)
>   and copy the sysdep files to this directory. Keep proc_interface.* in
>   the top-level dir. but maybe rename the files to monit_process.* like we
>   have done with monit_http for the http sub-system?

There is now a directory monit/process containing sysdep.c, common.c
and process.h.  Outside it there is, as you advised, monit_process.c.
It contains only the interface to the rest of the code, nothing else.

> > check foo-inet with pidfile /var/run/foo_daemon.pid
> >     # long version
> >     if memusage greater 10.0 for 2 cycles
> >     if memkbyte greater 2000 for 2 cycles
> >     # short version
> >     cpuusage 10.0 3
>
> - We should add actions to the statements. Like,
>
>       if cpuusage is greater than 10.0 for 3 cycles then restart
>       if memusage is greater than 10.0 for 2 cycles then stop
>       if memkbyte is greater than 90M  for 2 cycles then alert

I have defined so-called "actions".  Right now there are "restart",
"stop", "alert" and "ignore".  All of them reset the cycle counter to
0.  Of course you can mention e.g. cpuusage more than once, so that
you get an alert if it's over 80% for 5 cycles and the process is
stopped if it's over 90% for 10 cycles.

For "ignore" you just get a log entry.
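
A rough sketch of how such a rule could be evaluated once per cycle,
in C.  This is only an illustration: the type and function names are
made up, and it assumes that "for N cycles" means N consecutive cycles
above the limit, which the description above does not spell out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ACTION_NONE = 0,   /* rule did not fire in this cycle */
    ACTION_IGNORE,     /* only a log entry */
    ACTION_ALERT,
    ACTION_RESTART,
    ACTION_STOP
} action_t;

/* One "if <resource> greater <limit> for <n> cycles then <action>". */
typedef struct {
    double   limit;       /* threshold, e.g. percent CPU */
    int      max_cycles;  /* cycles above the limit before acting */
    action_t action;      /* what to do when the rule fires */
    int      cycles;      /* current cycle counter */
} resource_rule_t;

/* Evaluate one rule against the value measured in this cycle. */
static action_t evaluate_rule(resource_rule_t *rule, double value)
{
    assert(rule != NULL && rule->max_cycles > 0);
    if (value <= rule->limit) {
        rule->cycles = 0;          /* assumption: streak is broken */
        return ACTION_NONE;
    }
    if (++rule->cycles < rule->max_cycles)
        return ACTION_NONE;
    rule->cycles = 0;              /* every action resets the counter */
    return rule->action;
}
```

With one rule {80.0, 5, ACTION_ALERT} and one rule {90.0, 10,
ACTION_STOP} on the same resource, a process sitting at 95% would
raise an alert after 5 cycles and be stopped after 10.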

Unfortunately the action is mandatory; otherwise I get shift/reduce
conflicts.  Hey, it's the first time that I have had to use yacc and
bison.  I am usually a heavy user of regexes in high-level languages
to handle configs. (-:

I had to define two new alerts... "stop" and "resource".  The
"resource" alert is sent when e.g. cpuusage raises an alert via the
"alert" action.  In any other case "stop" or "restart" is used.

We have a new "do_stop" function in validate.c.  There I first send
the alert, wait 3 seconds and then stop the process.  Just in case we
are stopping the mail server. (-:
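
The ordering described above, as a small C sketch.  It is not the
actual do_stop from validate.c: the hook type, the function name
do_stop_sketch and the two example hooks are hypothetical, and the
delay is a parameter so the order can be checked without waiting.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical hook type: something that acts on a named service. */
typedef void (*hook_fn)(const char *service);

/* Alert first, wait, then stop, so the mail can still leave the
 * machine if the service being stopped is the mail server itself. */
static void do_stop_sketch(const char *service, hook_fn send_alert,
                           hook_fn stop_service, unsigned delay_seconds)
{
    assert(service != NULL && send_alert != NULL && stop_service != NULL);
    send_alert(service);
    sleep(delay_seconds);
    stop_service(service);
}

/* Example hooks that only record the order in which they ran. */
static char   event_log[8];
static size_t event_count = 0;

static void record(char tag)
{
    if (event_count < sizeof event_log)
        event_log[event_count++] = tag;
}

static void example_alert(const char *service) { (void)service; record('A'); }
static void example_stop(const char *service)  { (void)service; record('S'); }
```

Calling do_stop_sketch("mailserver", example_alert, example_stop, 3)
would mirror the 3 second wait mentioned above.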


>   We could even do (I'm stretching it here, I know. Better for a later
>   version)
>
>        if cpuusage is greater than 80.0 for 4 cycles then
>           stop until load < 1.5 and then restart

Put it somewhere in the todo... "loadavg" support.  It's going to be
quite system independent.  The question is which "loadavg" to use: the
1, 5 or 15 min one?
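
One possible way to read the load average, sketched in C.  It assumes
the getloadavg() call, which is a BSD/glibc extension declared in
<stdlib.h> and not part of POSIX (Solaris declares it in
<sys/loadavg.h>).  The enum and the function name read_loadavg are
made up for this example, and the choice of window is simply left to
the caller.

```c
#define _DEFAULT_SOURCE   /* expose getloadavg() on glibc */
#include <assert.h>
#include <stdlib.h>

/* Index into the array filled by getloadavg(). */
typedef enum {
    LOAD_1MIN  = 0,
    LOAD_5MIN  = 1,
    LOAD_15MIN = 2
} load_window_t;

/* Store the requested load average in *out.
 * Returns 0 on success, -1 if the system could not supply it. */
static int read_loadavg(load_window_t window, double *out)
{
    double samples[3];
    int n;

    assert(out != NULL);
    n = getloadavg(samples, 3);   /* number of samples, or -1 */
    if (n <= (int)window)
        return -1;
    *out = samples[window];
    return 0;
}
```

Making the window part of the statement would avoid having to pick
one of the three for everybody.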


> > 3) Zombie check environment changed... the zombie check is separated
> >    as check_process_state(p).  Thus, we can add other states and it is
> >    actually no resource.
>
> Since any user can run monit (it's one of its features) you will need
> to mask out calls to update_proc_data, check_resources and
> check_process_state, add something along the line
>  if(getuid()) then do not test.

I have masked it out with "Run.doprocess".  See above.

> > 1) Alarm integration... clear, or?
>
> ?

! I meant that the process code has to send alerts... that is already
working.  Don't worry.

> > 3) Web interface support (maybe with all the nice performance statistics)
>
> Yeah, that would be nice, after everything runs okay.

Too nice to wait for it... it's running. (-:

Other changes in no particular order...

1) Mark you have your "than". (-:

2) On Solaris, process data is only checked if the process is not a
zombie.  /proc is strongly reduced for zombies.

Btw., I haven't attached any code today, because I haven't tested it
yet on Linux, not even compiled it.  I am starting to get more and
more comfortable with Solaris.

I think the code will soon be ready to be synced... but before that I
need help with the FreeBSD stuff.  And it needs more testing.

I have a little request for something...

Can we extend the "alerts" with "reasons"?  A "restart" alert is nice,
but I would like to know whether it happened because the process
became a zombie, because of resources or because of user
interaction....  This reason, e.g. a string delivered to the alert
engine, would be added to the body of the alert message.  I hope the
idea is reasonably clear.
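
Roughly what I have in mind, as a C sketch.  Everything here is
hypothetical (the enum, the reason strings, the function names and the
body layout); it is only meant to show a reason travelling with the
alert into the message body.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reasons an alert could carry. */
typedef enum {
    REASON_ZOMBIE,
    REASON_RESOURCE,
    REASON_USER
} alert_reason_t;

static const char *reason_text(alert_reason_t reason)
{
    switch (reason) {
    case REASON_ZOMBIE:   return "process became a zombie";
    case REASON_RESOURCE: return "resource limit exceeded";
    case REASON_USER:     return "user interaction";
    default:              return "unknown";
    }
}

/* Write the alert body, including the reason, into buf.
 * Returns the snprintf() result: the length the full body needs. */
static int format_alert_body(char *buf, size_t size, const char *service,
                             const char *event, alert_reason_t reason)
{
    assert(buf != NULL && service != NULL && event != NULL);
    return snprintf(buf, size, "Service: %s\nEvent: %s\nReason: %s\n",
                    service, event, reason_text(reason));
}
```

A free-form string instead of an enum would work just as well; the
enum only keeps the wording in one place.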

Bye.... and good night,

C.Hopp

-- 
Christian Hopp                                email: address@hidden
Institut für Elektrische Informationstechnik             fon: +49-5323-72-2113
Technische Universität Clausthal                         fax: +49-5323-72-3197
  pgpkey: https://www.iei.tu-clausthal.de/pgp-keys/chopp.key.asc  (2001-11-22)




