monit-general

Re: [Patch] Revised resource-support [aka. "proc"-support]


From: Christian Hopp
Subject: Re: [Patch] Revised resource-support [aka. "proc"-support]
Date: Tue, 13 Aug 2002 11:02:03 +0200 (CEST)

On 13 Aug 2002, Jan-Henrik Haukeland wrote:

Hi!

> > I have defined so called "actions".  Right now there are "restart",
> > "stop", "alert" and "ignore".  All of them reset the cycle counter to
> > 0.  Of course you can mention e.g. cpuusage more than once, so that
> > you get an alert if it's over 80% for 5 cycles and it stops if it's
> > over 90% for 10 cycles.
> >
> > For "ignore" you just get a log entry.
> >
> > Unfortunately that option is mandatory; otherwise I get shift/reduce
> > conflicts.
>
> That's why you see languages with constructs like IF expr THEN expr
> ENDIF. But in this case what you want the parser to do is to reduce to
> the production that has the no action ending:
>
> resource_no_action -> if memkbyte is greater than 90M for 2 cycles
> resource -> if memkbyte is greater than 90M for 2 cycles then ACTION
>
> Adding
> %left resource_no_action
>
> will fix the shift/reduce conflict, probably. If not, I should be able
> to fix this when I get a copy of the new grammar.

The %left declaration only works with tokens (what they call
operators); "resource_no_action" is a rule.

I made a not-so-nice workaround based on your advice:

The new syntax is:

<resource> <maxvalue> [<maxcycles>] [ACTION <action>]

resource:  one of CPUUSAGE, MEMUSAGE, MEMKBYTE
maxvalue:  a float > 0 for CPUUSAGE and MEMUSAGE, an integer > 0 for
           MEMKBYTE
maxcycles: an integer > 0, the number of cycles the limit may be
           exceeded before the action is taken
action:    one of IGNORE, ALERT, RESTART, STOP.  If ACTION is omitted,
           the action for this event is ALERT.
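To make the semantics of this syntax concrete, here is a minimal hand-written C sketch of how one such line could be checked. It is not monit's parser (that lives in the yacc/bison grammar); the type and function names (`resource_rule`, `parse_resource_rule`, `lookup`) are hypothetical, and the default of 1 for an omitted `<maxcycles>` is an assumption, since the syntax does not state one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; names are illustrative, not monit's actual code. */
typedef enum { RES_CPUUSAGE, RES_MEMUSAGE, RES_MEMKBYTE } resource_t;
typedef enum { ACT_IGNORE, ACT_ALERT, ACT_RESTART, ACT_STOP } action_t;

typedef struct {
    resource_t resource;
    double     maxvalue;   /* float for *USAGE, whole number for MEMKBYTE */
    int        maxcycles;  /* ASSUMED default when omitted: 1 */
    action_t   action;     /* default when ACTION is omitted: ALERT */
} resource_rule;

/* Returns the index of word in names[0..n-1], or -1 if it is not there. */
static int lookup(const char *word, const char *const names[], int n) {
    for (int i = 0; i < n; i++)
        if (strcmp(word, names[i]) == 0)
            return i;
    return -1;
}

/* Parses "<resource> <maxvalue> [<maxcycles>] [ACTION <action>]".
 * Returns 0 on success, -1 on a syntax error. */
int parse_resource_rule(const char *line, resource_rule *out) {
    static const char *const resources[] = { "CPUUSAGE", "MEMUSAGE", "MEMKBYTE" };
    static const char *const actions[] = { "IGNORE", "ALERT", "RESTART", "STOP" };
    const char *delim = " \t\n";
    char buf[256], *tok, *end;
    int idx;

    assert(line != NULL && out != NULL);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);               /* strtok modifies its input */

    /* <resource> */
    if ((tok = strtok(buf, delim)) == NULL || (idx = lookup(tok, resources, 3)) < 0)
        return -1;
    out->resource = (resource_t)idx;

    /* <maxvalue>: float > 0 for *USAGE, integer > 0 for MEMKBYTE */
    if ((tok = strtok(NULL, delim)) == NULL)
        return -1;
    if (out->resource == RES_MEMKBYTE) {
        long v = strtol(tok, &end, 10);
        if (*end != '\0' || v <= 0)
            return -1;
        out->maxvalue = (double)v;
    } else {
        double v = strtod(tok, &end);
        if (*end != '\0' || v <= 0.0)
            return -1;
        out->maxvalue = v;
    }

    out->maxcycles = 1;
    out->action = ACT_ALERT;
    tok = strtok(NULL, delim);

    /* optional <maxcycles> */
    if (tok != NULL && strcmp(tok, "ACTION") != 0) {
        long c = strtol(tok, &end, 10);
        if (*end != '\0' || c <= 0)
            return -1;
        out->maxcycles = (int)c;
        tok = strtok(NULL, delim);
    }

    /* optional ACTION <action> */
    if (tok != NULL) {
        if (strcmp(tok, "ACTION") != 0 || (tok = strtok(NULL, delim)) == NULL
            || (idx = lookup(tok, actions, 4)) < 0)
            return -1;
        out->action = (action_t)idx;
        tok = strtok(NULL, delim);
    }
    return tok == NULL ? 0 : -1;     /* anything left over is an error */
}
```

For example, `CPUUSAGE 80.0 5 ACTION STOP` yields all four fields, while `MEMKBYTE 90000` falls back to one cycle and ALERT.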

ACTION is declared as a %left-associative operator.  We can most
probably reuse it for other things as well, to combine other events
with ACTIONs.
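The cycle counting described earlier in this thread (an action fires once a limit has been exceeded for the given number of cycles, every action resets the counter to 0, and IGNORE only produces a log entry) could be sketched in C as follows. The names (`rule_state`, `check_rule`) are hypothetical, and resetting the counter when the value drops back below the limit, i.e. counting only consecutive cycles, is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical action codes, matching the four actions in the syntax. */
typedef enum { ACT_IGNORE, ACT_ALERT, ACT_RESTART, ACT_STOP } action_t;

/* Hypothetical per-rule state; one entry per rule, so a resource such as
 * CPUUSAGE can appear in several rules with different limits. */
typedef struct {
    double   limit;      /* e.g. 80.0 (percent) */
    int      maxcycles;  /* cycles the limit may be exceeded */
    action_t action;     /* what to do once maxcycles is reached */
    int      cycle;      /* consecutive cycles over the limit so far */
} rule_state;

/* Called once per monitoring cycle with the current measurement.
 * Returns the action to take now, or -1 if nothing should happen. */
int check_rule(rule_state *rule, double value) {
    assert(rule != NULL && rule->maxcycles > 0);
    if (value <= rule->limit) {
        rule->cycle = 0;        /* ASSUMPTION: only consecutive cycles count */
        return -1;
    }
    if (++rule->cycle < rule->maxcycles)
        return -1;              /* over the limit, but not for long enough */
    rule->cycle = 0;            /* every action resets the counter to 0 */
    if (rule->action == ACT_IGNORE)
        fprintf(stderr, "limit %.1f exceeded for %d cycles, ignored\n",
                rule->limit, rule->maxcycles);  /* a log entry only */
    return (int)rule->action;
}
```

With two rules for the same resource, say `{80.0, 5, ACT_ALERT, 0}` and `{90.0, 10, ACT_STOP, 0}`, a process staying at 95% triggers the alert after 5 and again after 10 cycles, and the stop after 10 cycles.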

> > Hey, it's the first time that I have had to use yacc and bison.  I am
> > usually a heavy user of regexes in high-level languages for configs. (-:
>
> Yeah, but you have to admit that working with a real lexer and parser
> is very cool stuff. For instance, it would be impossible to parse all
> the different variations of a monitrc with standard Perl regex.

But can you tell in advance when a program will grow so large that you
can't work without a parser and a lexical analyzer?

> > Too nice to wait for it... it's running. (-:
>
> Now I'm really excited to see the new code!

Wait, boy, wait! (-:

> > 2) On Solaris, process data is only checked if the process is not a
> > zombie.  /proc is strongly reduced for zombies.
>
> We should probably do: if zombie, then do restart right away without
> bothering to test, or what do you think?

It doesn't do the other tests; it just gathers the data in one step.
After gathering the data and finding out that we have a zombie, we
restart it.

Wait... should we really restart???? We have to get rid of the zombie!
We can't kill zombies.  We have to get rid of it or notify its parent.
If the init script or the program of the zombie's service is clever,
it notices that the service is still running and won't restart
it... because it doesn't get stopped (or killed, or however you want
to call it).  Even monit won't start it; it also recognizes that it
is still running.

The question is "How to kill a dead man... aehm... process?".  We
can't!  The parent(!) has to wait for it.  Or we leave it lying in the
system, cruelly removing any links to its ancestors... deleting its
pid file. (-:
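The point that only the parent can get rid of a zombie can be illustrated with a small POSIX C sketch. This is not monit code; `reap_zombies` and `spawn_zombie` are hypothetical helper names. `waitpid()` only operates on the caller's own children, so a monitor that is not the parent has no way to reap the process, and sending it a signal does not remove it either.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reaps every child of the *calling* process that has already terminated,
 * without blocking. waitpid() cannot touch anyone else's children. */
int reap_zombies(void) {
    int reaped = 0;
    int status;
    pid_t pid;
    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0)
            reaped++;                 /* one zombie removed from the process table */
        else if (pid == -1 && errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        else
            break;
    }
    /* 0: children exist but none has exited; -1/ECHILD: no children left */
    assert(pid == 0 || errno == ECHILD);
    return reaped;
}

/* Demo helper: forks a child that exits immediately, then blocks until it
 * has terminated WITHOUT reaping it (WNOWAIT), so on return it is a zombie.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_zombie(void) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                     /* child: terminate at once */
    if (pid > 0) {
        siginfo_t info;
        while (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1
               && errno == EINTR)
            ;                         /* retry if interrupted */
    }
    return pid;
}
```

In use, `pid_t z = spawn_zombie(); kill(z, SIGKILL); reap_zombies();` shows the zombie surviving the `SIGKILL` and disappearing only when its parent waits for it.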

> > The code will soon be ready to be synced... but before that I need
> > help with the FreeBSD stuff.  And it needs more testing.
>
> I can only test compiling on FreeBSD on SourceForge; I cannot test
> running it under FreeBSD since I do not have root access. Anyway, when the
> patch is ready we could release a beta version on freshmeat (there are
> other changes done since 2.5 that justify a release), and then sit
> back and wait for the bug reports.

My idea: let's do the last bug fixes (if there are any) for the 2.5
series and release 2.5.2 without the process feature.  It will be
declared the last stable release before 3.0.  Then we start 2.9.x
(alpha stage) to get the 3.0 features in.

> > Can we extend the alerts with reasons?  A "restart" alert is nice,
> > but I would like to know whether it happened because of a zombie,
> > because of resources or because of user interaction....  This
> > reason, e.g. a string delivered to the alert engine, is somehow added
> > to the body of the alert message.  I hope the idea is clear?
>
> I think I got it, and yeah sure this is doable.

It's going to take some changes in the code wherever restarts, stops
and starts happen (nearly everywhere).  Then we need sensible
"reasons" at every place.
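A minimal C sketch of the "reasons" idea, assuming the alert body is a plain string buffer: the generic alert text comes first and the caller-supplied reason is appended to it. `alert_restart` and `alert_add_reason` are hypothetical names, not existing monit functions, and the "unknown" fallback for a missing reason is likewise an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends "Reason: <reason>" to an alert body that
 * already holds the generic alert text. Returns 0, or -1 if it did not fit. */
int alert_add_reason(char *body, size_t size, const char *reason) {
    assert(body != NULL);
    size_t len = strlen(body);
    if (len >= size)
        return -1;
    int n = snprintf(body + len, size - len, "Reason: %s\n",
                     (reason != NULL && *reason != '\0') ? reason : "unknown");
    return (n >= 0 && (size_t)n < size - len) ? 0 : -1;
}

/* Hypothetical caller: each place that restarts a service passes its own
 * reason, e.g. a zombie, a resource limit or a user request. */
int alert_restart(char *body, size_t size, const char *service,
                  const char *reason) {
    assert(body != NULL && service != NULL && size > 0);
    int n = snprintf(body, size, "Service %s restarted\n", service);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return alert_add_reason(body, size, reason);
}
```

Stop and start alerts would follow the same pattern, which is why every restart, stop and start site needs its own reason string.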

Bye,

C.Hopp



-- 
Christian Hopp                                email: address@hidden
Institut für Elektrische Informationstechnik             fon: +49-5323-72-2113
Technische Universität Clausthal                         fax: +49-5323-72-3197
  pgpkey: https://www.iei.tu-clausthal.de/pgp-keys/chopp.key.asc  (2001-11-22)




