monit-general

Some feature notes for monit


From: Vlada Macek
Subject: Some feature notes for monit
Date: Wed, 07 Jul 2004 17:40:52 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040124


Hi,

first of all, I'd like to thank all the contributors for such a handy
and nice piece of software as monit!

I dare to send the list my set of ideas for improving monit, and I'd
like to know what you guys think about them. I warn you, this is long
(I tried to make it shorter, though) and I really hope to get at least
some response. :-) If I propose a feature monit already has and I
don't know about it, I apologize. The list starts with trivialities;
the more complex proposals are further down.

At some point you may think I'm not sufficiently aware that monit's
goal is not to be a security watchdog. I accept such objections;
nevertheless, I think all sane features should be discussed.

The same goes for the current thread about using monit as the primary
startup/shutdown facility, replacing the old SysV rc scripts. I like
such efforts; it's a great idea, and the original purpose of monit is
IMHO not far from such use. Once monit is ready to replace the rc
scripts, I'll try to use it that way.


### MiB, GiB, KiB... units

Although this might be controversial, I suggest using proper binary
units in the textual configuration, the manual, reports etc.


### Usability: Hashes on the monit command line

MD-class hashes seem to be deprecated these days for not being strong
enough, so I use and mention SHA1 only. Some systems lack the sha1sum
utility and do not have the openssl package installed either (for the
`openssl sha1 <file>' command). On such systems I don't know how to
get the reference checksum string of my data (file) to put into
monitrc. Monit already includes hash computation routines, so I miss a
command line argument that makes monit print the checksums of the
data on stdin and then exit.

Not having well-known hash utilities (such as sha1sum) on the system
while still using hashes in the monitrc may bring a
security-by-obscurity advantage against a cracker who is not
experienced enough. This is not the main goal of this proposal! :-)

An example:

    $ monit -H < /etc/passwd
    MD5 (stdin) = 5676cffe1b85f738ad53bc8fcf4075f5
    SHA1(stdin) = a9e98b7aa950d5cca495f9c83c357360005516b2
    $
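
To make the request concrete, here is a rough Python sketch of what
such an option could do. The `-H' flag is only the hypothetical option
proposed above, not something monit provides, and the function name
`digest_lines' is made up for illustration. The output format simply
mimics the example.

```python
import hashlib
import sys


def digest_lines(data: bytes, label: str = "stdin") -> list[str]:
    """Return MD5 and SHA1 lines formatted like the example above."""
    md5 = hashlib.md5(data).hexdigest()
    sha1 = hashlib.sha1(data).hexdigest()
    return [f"MD5 ({label}) = {md5}", f"SHA1({label}) = {sha1}"]


# Hypothetical entry point: only reads stdin when called with "-H",
# mirroring the proposed (not existing) monit option.
if __name__ == "__main__" and sys.argv[1:2] == ["-H"]:
    for line in digest_lines(sys.stdin.buffer.read()):
        print(line)
```

Saved as a script, it would be used as `python3 script.py -H < /etc/passwd'.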


### Syntax inconsistency: CERTMD5

Why does "CERTMD5 12-34-56-78-90-AB-CD-EF-12-34-56-78-90-AB-CD-EF" use
a different checksum syntax when everywhere else in the monitrc the
checksums are written without dashes? This is a bit confusing. Doesn't
this require some additional, unnecessary code in monit?
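
For illustration, accepting both spellings would only take a small
normalization step. This is a Python sketch, not monit's actual
parser; the function name is hypothetical.

```python
def normalize_checksum(text: str) -> str:
    """Strip dash separators and lowercase a hex checksum string."""
    cleaned = text.replace("-", "").lower()
    is_hex = all(c in "0123456789abcdef" for c in cleaned)
    # 32 hex digits = MD5, 40 hex digits = SHA1
    if len(cleaned) not in (32, 40) or not is_hex:
        raise ValueError(f"not an MD5 or SHA1 hex digest: {text!r}")
    return cleaned
```

With this, the dashed CERTMD5 form and the plain form used elsewhere
compare equal after normalization.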


### Syntax inconsistency: start/stop methods

Wouldn't it be cleaner to write `set start "/etc/init.d/foo"' instead
of `start="/etc/init.d/foo"'?
I know it's conventional now... :-)


### Static binary linking

I'd like to have the option of building monit without so many shared
libraries. On my RH9 it's 14 lines of ldd output. I'd like to tell
./configure, for each lib (or as many as possible), whether to use it
in .so or .a form. I believe it's a small security improvement, and it
also helps with administering multiple boxes. Some of the servers
where I wish to use a single monit build do not have all the libraries
and the devel environment installed.


### Mailserver must listen on port 25

There is no way to specify an SMTP port other than the default 25 in
the SET MAILSERVER statement. I missed this under some testing
conditions. I imagine something like this:

    set mailserver first.mail.srv port 8025
    set mailserver second.mail.srv            # usual port 25


### Access time changed by checks

Does monit leave the atime of checked fs objects intact? A quick grep
of the sources suggests it does not. Gathering atime info can
sometimes help with usage analysis of the machine, so I miss a monit
option that restores a file's atime to its state prior to the check.
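
The idea can be sketched in Python as follows (monit itself is C,
where utime(2) or a relative would play the same role). The function
name is hypothetical. Note one side effect I'm aware of: setting the
times back updates the file's ctime.

```python
import os


def read_preserving_atime(path: str) -> bytes:
    """Read a file, then put its atime (and mtime) back as they were."""
    before = os.stat(path)
    with open(path, "rb") as fh:
        data = fh.read()
    # Restore the timestamps recorded before the read.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data
```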


### Missing check: TIMESTAMP checks mtime only

In standard UN*X we have three timestamps recorded for each fs
entry. Wouldn't it be handy to have ACCESS TIMESTAMP and CHANGE
TIMESTAMP options too? More advanced filesystems may add other
timestamp values, and we would be ready for them...


### Missing check: general lstat(2) variable check

One cheap and IMHO nevertheless interesting check would be to watch
for a change in the entire `struct stat' (except the atime item, which
could be zeroed before each comparison) returned by lstat(2) for
almost any filesystem object. Something like this:

    if changed stat then alert    # under check file, directory, device...
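
A minimal Python sketch of the comparison, assuming the interesting
fields are the portable ones below; atime is simply left out instead
of zeroed. Both function names are made up for illustration.

```python
import os


def stat_fingerprint(path: str) -> tuple:
    """Collect the lstat(2) fields to compare, leaving out atime."""
    st = os.lstat(path)
    return (st.st_mode, st.st_ino, st.st_dev, st.st_nlink,
            st.st_uid, st.st_gid, st.st_size,
            st.st_mtime_ns, st.st_ctime_ns)


def stat_changed(previous: tuple, path: str) -> bool:
    """True if any recorded field differs from the earlier snapshot."""
    return stat_fingerprint(path) != previous
```

A check cycle would keep the previous fingerprint per object and alert
when `stat_changed' returns True.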


### Loose permission check

Currently monit offers the `IF FAILED PERM(ISSION) octalnumber THEN'
check. I think sometimes we do not require an exact perm value and
would allow some range, i.e. check against an AND-masked or
UMASK-masked value somehow. Just a thought...
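
One possible reading of "masked" is: fail only if the object has
permission bits set outside an allowed set. A Python sketch under that
assumption (the function name is hypothetical):

```python
import stat


def perm_within(mode: int, allowed: int) -> bool:
    """True if no permission bit is set outside the `allowed` mask."""
    return (stat.S_IMODE(mode) & ~allowed) == 0
```

With an allowed mask of 0644, modes 0600, 0640 and 0644 pass, while
0666 fails because of the extra group/other write bits.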


### Ability to run an arbitrary command and check:
    - its return code in the expression (if retcode != 1 then alert)
    - constant and variable checksum of its stdout
    - send to stdin/expect from the stdout (like in the host check)
    - send to stdin/checksum of the stdout

There should be a timeout option, with a default, that kills the
process when it expires. Using this, virtually everyone could write
their own checks unsupported by monit as an "external procedure". This
would be handy for e.g. checking peripheral sensors, UPS status, etc...
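
Roughly what I have in mind, sketched in Python. The name `run_check'
and the 30-second default are assumptions for illustration, not a
proposal for the real interface.

```python
import hashlib
import subprocess
import sys


def run_check(argv, stdin_data=b"", timeout=30.0):
    """Run a command; return (return code, SHA1 of stdout).

    Returns (None, None) if the timeout expires; subprocess.run kills
    the child in that case.
    """
    try:
        proc = subprocess.run(argv, input=stdin_data,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.DEVNULL,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, None
    return proc.returncode, hashlib.sha1(proc.stdout).hexdigest()


if __name__ == "__main__":
    # Example: run a trivial command and show its code and stdout digest.
    code, digest = run_check([sys.executable, "-c", "print('ok')"], timeout=5.0)
    print(code, digest)
```

The return code covers the `if retcode != 1' case, and the digest
covers the constant-checksum case; a changing-checksum test would
compare the digest with the one from the previous cycle.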


### Missing check: ext2/ext3 attributes (e.g. whether the file is still immutable)

This is a filesystem-dependent check. Other filesystems certainly
offer other file attributes that could be checked too. There are ACLs
as well. I don't know how to handle this... The immutable attr could
also be checked by my "external procedures".


### Global resource (load avg, memory usage) check

As I understand it, monit can check system load or memory usage (+
swap space) only on a per-process basis. When the process is
intentionally unmonitored, the global resources are not checked! The
following example of systemwide checks should make my wish clear:

    check systemwide
        if uptime < 5 minutes then
            alert with mail-format {
                subject: "$HOST: Server booted recently!"
            }
        if loadavg(5min) > 10.0 for 8 cycles then
            alert with mail-format {
                subject: "$HOST: Server overloaded!"
            }
        if freememory < 10% for 3 cycles then
            alert with mail-format {
                subject: "$HOST: Server runs out of swap space!"
            }

On Linux, some more interesting global resource data may be found in
/proc (/proc/stat for instance, see man 5 proc). But these additional
tests could be accomplished by the "external procedure" check proposed
previously. The loadavg and freememory global tests should still be
hardwired in monit, because starting an external process has resource
requirements of its own.

Maybe reporting the saturation level of the VM and IO subsystems could
also be interesting.
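
The pieces needed for such tests are small. Here is a Python sketch,
assuming a Unix system for os.getloadavg() and the Linux /proc/uptime
format (first field is seconds since boot). All names are hypothetical.

```python
import os


class CycleCounter:
    """Report True once a condition has held for `needed` cycles in a row."""

    def __init__(self, needed: int) -> None:
        self.needed = needed
        self.streak = 0

    def update(self, condition: bool) -> bool:
        # Any cycle where the condition is false resets the streak.
        self.streak = self.streak + 1 if condition else 0
        return self.streak >= self.needed


def load_average(window_minutes: int = 5) -> float:
    """Return the 1, 5 or 15 minute system load average."""
    index = {1: 0, 5: 1, 15: 2}[window_minutes]
    return os.getloadavg()[index]


def parse_proc_uptime(text: str) -> float:
    """Parse the contents of /proc/uptime into seconds since boot."""
    return float(text.split()[0])
```

An "if loadavg(5min) > 10.0 for 8 cycles" rule would then call
`CycleCounter(8).update(load_average(5) > 10.0)' once per cycle and
alert when it returns True.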


### Join CHECK FILE and CHECK DIRECTORY together

Looking at the source and the manual, I didn't find much difference
between these two. Even philosophically there is no need to
differentiate them much, I think. For example, the CHECKSUM and SIZE
tests are not defined for directories. But I think it would be
interesting to add such recursive tests (a checksum for the
substructure _and_ for the data in all contained files). So why not
join these checks and use CHECK FILE for directories too?
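
A recursive checksum could look roughly like this Python sketch. It is
one possible definition only: it hashes relative file names and file
contents in sorted order, ignores empty directories and metadata, and
assumes every entry is readable. The function name is made up.

```python
import hashlib
import os


def tree_sha1(root: str) -> str:
    """SHA1 over the relative names and contents of all files under root."""
    h = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            h.update(rel.encode("utf-8", "surrogateescape"))
            h.update(b"\0")  # separate the name from the data
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    h.update(chunk)
    return h.hexdigest()
```

Renaming a file or changing a single byte anywhere below the directory
changes the resulting digest.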


### XML output

The "_status?format=xml" feature (used e.g. by m/monit) is great! This
way monit can easily be used remotely by virtually anything. I'm
considering writing a Zope/Plone dashboard for several monit
instances, similar to m/monit. I just have a few hints:

- As far as I noticed, currently only one kind of monit status page is
available in XML. More would be great!

- Minor bug: _status?format=xml returns text/plain as its Content-Type;
it should be text/xml.

- I propose printing all XML values in the most technical form for
possible later processing. <uptime>43m</uptime> is directly
displayable, but <uptime>2585</uptime> would be much better!

- It may be nice to return the instantaneous systemwide resource values
mentioned above (uptime since boot, load averages, memory usage,
IO, VM...) in XML and perhaps in HTML too.
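
To illustrate the uptime point: with raw seconds, a client can both
compute with the value and format it for display however it likes. A
Python sketch, where the `<service>' wrapper element in the usage note
is an assumption and not necessarily monit's real XML layout:

```python
import xml.etree.ElementTree as ET


def uptime_seconds(xml_text: str) -> int:
    """Read a raw <uptime> value (seconds) from a status fragment."""
    return int(ET.fromstring(xml_text).findtext("uptime"))


def humanize(seconds: int) -> str:
    """Format seconds as days/hours/minutes, e.g. for a dashboard."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    parts.append(f"{rem // 60}m")
    return " ".join(parts)
```

For example, `humanize(uptime_seconds("<service><uptime>2585</uptime></service>"))'
gives back the displayable "43m", while the reverse direction would
lose the seconds.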


### Multiple group membership

I don't know whether such a thing would be very useful, but monit
restricts the membership of a service to one group at most.


-------------------

That's it. I have some more notes prepared, but this is enough for now.
Please have mercy. :-))

See ya,

Vlada Macek




