monit-general
Some feature notes for monit, volume III


From: Vlada Macek
Subject: Some feature notes for monit, volume III
Date: Wed, 14 Jul 2004 17:50:38 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040124

Hi!

There were only a few responses to my previous Volume II, and I'm here
with a new batch. Happy again? :-)
So here are my ideas:

### Monit monitoring

Is there some way to detect that monit is successfully running? I would
like to ensure (from the cron) that the monit process does not merely
exist, but actually does the checks it should.
That way the cron-monit pair would be protected in both directions
(with monit checking the timestamp of cron's log).

One solution that comes to my mind is for monit to touch the file
/var/run/monit.touch every cycle... monit.touch could then be checked by
`find ... -mmin +` on Linux. Another solution would be to write a
"--MONITMARK--" log entry every, say, 10 cycles (or at a configurable
interval) to the log output (whether that's syslog or monit's own logfile).
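
As a rough illustration of the touch-file idea, here is a minimal Python sketch of the check a cron job could run. The function name `heartbeat_is_fresh` and the age threshold are assumptions for illustration only; monit does not currently write such a file.

```python
import os
import time


def heartbeat_is_fresh(path, max_age_seconds, now=None):
    """Return True if `path` exists and was modified within max_age_seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # A missing or unreadable heartbeat file is treated as "monit is not checking".
        return False
    return (now - mtime) <= max_age_seconds
```

A cron job could call this on /var/run/monit.touch with a threshold of a few poll cycles and raise an alarm when it returns False.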


### Cross-pidfile mistakes

It would be nice if monit checked, while syntax-checking the control
file, that the pidfile given for each process is unique.
This would prevent configuration mistakes. That is, if it's not already
being done. :-)
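
The uniqueness check could look roughly like the Python sketch below. It assumes the parser can hand over (service name, pidfile path) pairs; `find_shared_pidfiles` is a hypothetical helper, not an existing monit function.

```python
import posixpath
from collections import defaultdict


def find_shared_pidfiles(services):
    """Map each pidfile used by more than one service to the service names.

    `services` is an iterable of (service_name, pidfile_path) pairs.
    """
    by_pidfile = defaultdict(list)
    for name, pidfile in services:
        # Normalize so that "/var/run//x.pid" and "/var/run/x.pid" compare equal.
        by_pidfile[posixpath.normpath(pidfile)].append(name)
    return {path: names for path, names in by_pidfile.items() if len(names) > 1}
```

An empty result means every pidfile is unique; anything else could be reported as a syntax-check error.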


### LDFLAGS="-static" doc

This trick might fit into the documentation. Monit is the kind of
software that one might need to build statically, and this is not
commonly known.


### Service textual comments

The monit web or e-mail output may sometimes be (temporarily) read by
nontechnical people. Attaching a comment to each service that explains what
the item is, or exactly what to do in case of failure, may be handy.
What about something like this:

check file ... "/proc/mdstat"
    desc "Disk array check. In case of any failure, please contact any admin ASAP! Telephone..."

I hope that the closing "-character will be found correctly by the
tokenizer in UTF-8 text. Introducing arbitrary strings into control files
may cause problems with text encodings. The encoding should
be defined somewhere and passed to the output HTML, XML and e-mail headers...


### Statistics

There are special utilities that gather statistical data from various
parts of UNIX systems; MRTG and snmpd are examples. But when I already run
monit on my server and it periodically collects most of the data I would
like to turn into graphs, isn't it unnecessary to run
another daemon just to do a duplicate get-this-valueset job?

I imagine something like this directive in the CHECK block:

    EVERY m CYCLES        # usual monit statement
    REGISTER VMUSAGE|LOADAVG(1 min)|TIMESTAMP ... EVERY n CYCLES

When m == n, the check and the registering may both be woken up at
the same time, and the compared value may be used directly for
registering (provided that property is actually being checked). When m !=
n, the value would need to be retrieved and registered at another moment.

The registered values are best stored on disk (the data may grow
large). A specially tagged log entry may be written to the usual monit log
output, or the values may be written to a separate output file as lines
similar to this:

    secs-since-1970   service-name   property-name   value

Monit should be able to return the selected portion of this file when
asked (there already is something similar -- _viewlog action) in TEXT or
XML form. Data then could be gathered by any software, even MRTG.
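
To make the proposed line format concrete, here is a small Python sketch that parses such lines and returns the samples inside a time window. The `Sample` type, the function names, and whitespace as the field separator are all assumptions; nothing like this exists in monit today.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    timestamp: int   # secs-since-1970
    service: str     # service-name
    prop: str        # property-name
    value: str       # kept as text; the consumer decides how to interpret it


def parse_samples(lines):
    """Yield a Sample for each well-formed line, skipping blank or malformed ones."""
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        ts, service, prop, value = fields
        yield Sample(int(ts), service, prop, value.strip())


def select_range(samples, start, end):
    """Return the samples whose timestamp lies in [start, end]."""
    return [s for s in samples if start <= s.timestamp <= end]
```

A collector such as MRTG would only need the equivalent of `select_range` applied to the period since its last fetch.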

First, we don't have to run any special data agent such as snmpd; we just
use the existing gathering facility in monit. Second, in case of a network
failure the usual remote gathering methods AFAIK introduce a time gap in
the statistics, because they do not include any caching. This way the data
are collected and written by monit at exact, undisturbed intervals and
returned in one batch on request (no unnecessary network transfers each period).

Someday in the future there may be a CHECK INTERFACE statement in monit.
After that, the registering would be even more interesting. :-)


### Refactoring of the HTTP output, unify

There are separate sets of output routines for HTML and TEXT/XML in
http/cervlet.c. I believe the simple XML output was originally added for
communication with m/monit, to pass true/false information about
each service. The goal was not to provide exhaustive information.

Monit currently serves the most complete status data in HTML format, which
is poor for subsequent data processing. I think it would be much better to
unify the functions that read the RAM structures into one skeleton and call
the tag-printing routines according to the requested format. This way
all formats would provide the same set of information and there would be
only one piece of code to maintain. Additionally, much of the unified
printing code could be shared by multiple services.

--- HTML would stay almost in the current form (one service per page):

    <tr>
        <td>Name</td>
        <td>apache</td>
    </tr><tr>
        <td>Status</td>
        <td><font color="#00ff00">OK</font></td>
    </tr><tr>
        <td>Process status</td>
        <td><font color="#00ff00">Running</font></td>
    </tr><tr>
        <td>Process id</td>
        <td>977</td>
    </tr><tr>
        <td>Pid file</td>    
        <td>/var/run/httpd.pid</td>
    </tr><tr>
        <td>Group</td>
        <td><font color="#0000ff">server</font></td>
    </tr>

On another page:

    <tr>
        <td>Name</td>
        <td>passwd</td>
    </tr><tr>
        <td>Status</td>
        <td><font color="#ff0000">Failure!</font></td>
    </tr><tr>
        <td>Path</td>
        <td>/etc/passwd</td>
    </tr><tr>
        <td>Monitoring mode</td>
        <td>active</td>
    </tr><tr>
        <td>Check service</td>
        <td>every 5 cycle</td>
    </tr><tr>
        <td>Associated checksum</td>
        <td>if failed db691baffdc13c2045b43e9234e6357fa8c37598(SHA1) then alert else if recovered then alert</td>
    </tr><tr>
        <td>Associated UID</td>
        <td>if failed 0 then alert else if recovered then alert</td>
    </tr><tr>
        <td>Size</td>
        <td>46 B</td>
    </tr><tr>
        <td>Permission</td>
        <td><font>644</font></td>
    </tr><tr>
        <td>UID</td>
        <td><font color="#ff0000">500</font></td>
    </tr><tr>
        <td>GID</td>
        <td><font>0</font></td>
    </tr><tr>
        <td>Checksum</td>
        <td><font color="#ff0000">9f55b3a33ef21ab86b7a92720e8647966217c9f3</font></td>
    </tr>

--- For human-readable text output format:

    Service: process apache
        Status: OK
        Process id: 977
        Pid file: /var/run/httpd.pid
        Process status: Running
        Group: server

    Service: file passwd
!!!     Status: Failure!
        Path: /etc/passwd
        Monitoring mode: active
        Check service: every 5 cycle
        Associated checksum: if failed db691baffdc13c2045b43e9234e6357fa8c37598(SHA1) then alert else if recovered then alert
        Associated UID: if failed 0 then alert else if recovered then alert
        Size: 46 B
        Permission: 644
!!!     UID: 500
        GID: 0
!!!     Checksum: 9f55b3a33ef21ab86b7a92720e8647966217c9f3

--- For structured text (tabbed) output format:

service.process.apache.status    1
service.process.apache.pid    977
service.process.apache.pidfile    /var/run/httpd.pid
service.process.apache.running    1
service.process.apache.group    server

service.file.passwd.status    0
service.file.passwd.pollcycle    5
service.file.passwd.monitored    1
service.file.passwd.path    /etc/passwd
service.file.passwd.checksumtest    if failed db691baffdc13c2045b43e9234e6357fa8c37598(SHA1) then alert else if recovered then alert
service.file.passwd.uidtest    if failed 0 then alert else if recovered then alert
service.file.passwd.size    46
service.file.passwd.size.status    1
service.file.passwd.perm    644
service.file.passwd.perm.status    1
service.file.passwd.uid    500
service.file.passwd.uid.status    0
service.file.passwd.gid        0
service.file.passwd.checksum    9f55b3a33ef21ab86b7a92720e8647966217c9f3
service.file.passwd.checksum.type    SHA1
service.file.passwd.checksum.status    0

--- For XML output format:

<service type="process" name="apache" status="1" monitored="1">
    <pid>977</pid>
    <pidfile>/var/run/httpd.pid</pidfile>
    <group>server</group>
</service>
<service type="file" name="passwd" status="0" monitored="1">
    <path>/etc/passwd</path>
    <pollcycle>5</pollcycle>
    <checksumtest>if failed db691baffdc13c2045b43e9234e6357fa8c37598(SHA1) then alert else if recovered then alert</checksumtest>
    <uidtest>if failed 0 then alert else if recovered then alert</uidtest>
    <size status="1">46</size>
    <perm status="1">644</perm>
    <uid status="0">500</uid>
    <gid status="1">0</gid>
    <checksum type="SHA1" status="0">9f55b3a33ef21ab86b7a92720e8647966217c9f3</checksum>
</service>

---

Ooof, what do you all think about this? See the idea? The information
base is always the same; only the presentation differs. Keywords are shared
between structured text and XML, and keyword nicenames are shared between
human-readable text and HTML. Both keywords and their long forms can be
selected from arrays.
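
To show the "one information base, several presentations" idea in miniature, here is a Python sketch with one service record and three small renderers. The dictionary layout, the `NICE_NAMES` table and the function names are illustrative assumptions; the real code would live in C in http/cervlet.c and read monit's own structures.

```python
from xml.sax.saxutils import escape, quoteattr

# Keyword -> long form ("nicename"); the keywords follow the examples above.
NICE_NAMES = {"pid": "Process id", "pidfile": "Pid file", "group": "Group"}

# One in-memory record that every renderer reads (hypothetical layout).
APACHE = {
    "type": "process", "name": "apache", "status": 1,
    "fields": [("pid", 977), ("pidfile", "/var/run/httpd.pid"), ("group", "server")],
}


def render_tabbed(service):
    """Structured text: dotted keyword, tab, value."""
    prefix = "service.%s.%s" % (service["type"], service["name"])
    lines = ["%s.status\t%s" % (prefix, service["status"])]
    lines += ["%s.%s\t%s" % (prefix, key, value) for key, value in service["fields"]]
    return "\n".join(lines)


def render_xml(service):
    """XML: same keywords, with type/name/status as attributes."""
    attrs = " ".join("%s=%s" % (k, quoteattr(str(service[k])))
                     for k in ("type", "name", "status"))
    lines = ["<service %s>" % attrs]
    lines += ["    <%s>%s</%s>" % (key, escape(str(value)), key)
              for key, value in service["fields"]]
    lines.append("</service>")
    return "\n".join(lines)


def render_text(service):
    """Human-readable text: nicenames instead of keywords."""
    lines = ["Service: %s %s" % (service["type"], service["name"])]
    lines.append("    Status: %s" % ("OK" if service["status"] else "Failure!"))
    lines += ["    %s: %s" % (NICE_NAMES.get(key, key), value)
              for key, value in service["fields"]]
    return "\n".join(lines)
```

Each renderer walks the same record, so adding a property means adding one entry to the record and, at most, one nicename to the table.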

The test description strings (e.g. `if failed 0 then alert else if
recovered then alert') are currently constructed from the data
structures in RAM when printing the HTML. I simply dumped them into all
the formats that I propose. It would be possible to copy the test
structure into, for example, the XML, so that the parsing program could
reconstruct the meaning of the test. But I don't believe it is necessary
to go quite that far. ;-]

I introduce attributes like "type", "name" and "status" on the
<service> element because this way they are easier to select with
XPath expressions. All processes with /monit/service[@type="process"],
all failures with /monit/service[@status="0"], count failures easily, etc...
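
As a quick check that attribute-based selection is this simple, here is a Python sketch using the standard library's ElementTree, which supports this subset of XPath. The sample document assumes the <service> elements are wrapped in a <monit> root element.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the proposed format (child elements omitted).
DOC = """<monit>
  <service type="process" name="apache" status="1" monitored="1"/>
  <service type="file" name="passwd" status="0" monitored="1"/>
</monit>"""

root = ET.fromstring(DOC)  # root is the <monit> element

# Paths are relative to the root, so these select /monit/service[...].
processes = root.findall("./service[@type='process']")
failures = root.findall("./service[@status='0']")

print("processes:", [s.get("name") for s in processes])
print("failure count:", len(failures))
```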

I hope I'm not the only one who sees the advantages of a unified
output scheme. If the core developers tell me that they like it
and that such a change would make it into monit, I'm willing to code
it. :-) m/monit would also need to change a bit, but that's trivial once it
can handle XML.

Vlada
