Some feature notes for monit, volume III
From: Vlada Macek
Subject: Some feature notes for monit, volume III
Date: Wed, 14 Jul 2004 17:50:38 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040124
Hi!
There were only a few responses to my previous Volume II, and I'm here
with a new batch. Happy again? :-)
So here are my ideas:
### Monit monitoring
Is there some way to detect that monit is successfully running? I would
like to ensure (from the cron) that the monit process does not merely
exist, but actually does the checks it should.
That way the cron-monit pair would be protected in both directions
(monit, in turn, would check the timestamp of cron's log).
One solution that comes to mind is for monit to touch the file
/var/run/monit.touch every cycle... monit.touch could then be checked by
`find ... -mmin +` on Linux. Another solution would be to write a
"--MONITMARK--" log entry every, say, 10 cycles (or at a configurable
interval) to the log output (whether that is syslog or monit's own logfile).
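A cron-side check of the touch file could look roughly like the sketch below. It assumes monit really does update /var/run/monit.touch every cycle (which is only proposed here, not an existing feature); the function name and the two-minute threshold are made up for illustration.

```python
import os
import time

def heartbeat_is_fresh(path, max_age_seconds, now=None):
    """Return True if `path` exists and was modified recently enough.

    `path` would be the file monit touches each cycle (hypothetical
    feature); `max_age_seconds` should be a bit longer than one cycle.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # A missing or unreadable file is treated the same as a stale one.
        return False
    return (now - mtime) <= max_age_seconds

if __name__ == "__main__":
    # Run from cron; a non-zero exit status signals a stale heartbeat.
    ok = heartbeat_is_fresh("/var/run/monit.touch", max_age_seconds=120)
    raise SystemExit(0 if ok else 1)
```

The cron job can then alert or restart monit on a non-zero exit status.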
### Cross-pidfile mistakes
It would be nice if monit checked, while syntax-checking the control
file, that the pidfile given for each process is unique. This would
catch one kind of configuration mistake. That is, if it's not already
being done. :-)
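The check itself is simple once the control file has been parsed. Here is a minimal sketch, assuming the parser can hand over (service name, pidfile path) pairs; that input shape and the function name are my own invention, not monit internals.

```python
import os
from collections import defaultdict

def find_shared_pidfiles(services):
    """Return {pidfile: [service names]} for pidfiles used more than once.

    `services` is an iterable of (service_name, pidfile_path) pairs,
    a hypothetical stand-in for the parsed control file.
    """
    by_pidfile = defaultdict(list)
    for name, pidfile in services:
        # Normalise so "/var/run//x.pid" and "/var/run/x.pid" compare equal.
        by_pidfile[os.path.normpath(pidfile)].append(name)
    return {path: names for path, names in by_pidfile.items() if len(names) > 1}
```

A non-empty result at syntax-check time would be reported as a configuration error (or at least a warning).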
### LDFLAGS="-static" doc
This trick might fit into the documentation. Monit is the kind of
software that one might need to build statically, and this is not
commonly known.
### Service textual comments
The monit web or e-mail output may sometimes be (temporarily) read by
nontechnical people. Attaching a comment to each service, explaining what
the item is or exactly what to do in case of failure, may be handy.
What about something like this:
check file ... "/proc/mdstat"
desc "Disk array check. In case of any failure, please contact any
admin ASAP! Telephone..."
I hope that the closing "-character will be found correctly by the
tokenizer in UTF-8 text. Introducing arbitrary strings into control
files may cause problems with text encodings. The encoding should be
defined somewhere and passed on to the HTML and XML output and to the
e-mail headers...
### Statistics
There are special utilities that gather statistical data from various
parts of UNIX systems; MRTG and snmpd are examples. But when I already
run monit on my server and it periodically collects most of the data I
would like to graph, isn't it unnecessary to run another daemon just to
do a duplicate get-this-valueset job?
I imagine something like this directive in the CHECK block:
EVERY m CYCLES # usual monit statement
REGISTER VMUSAGE|LOADAVG(1 min)|TIMESTAMP ... EVERY n CYCLES
When m == n, the check and the registration can both be woken up at
the same time and the value being compared can be used directly for
registration (provided that property is actually being checked). When
m != n, the value would need to be retrieved and registered at a
separate moment.
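The interplay of m and n can be sketched as a small simulation. Everything here is hypothetical (monit has no REGISTER statement); `read_value` stands in for whatever retrieves the property, and the point is only that one read per cycle can serve both the check and the registration.

```python
def run_cycles(total_cycles, m, n, read_value):
    """Simulate checking every m cycles and registering every n cycles.

    Returns (registered, reads): the (cycle, value) pairs that would be
    registered and the number of times the value had to be retrieved.
    """
    registered = []
    reads = 0
    for cycle in range(1, total_cycles + 1):
        do_check = cycle % m == 0
        do_register = cycle % n == 0
        if not (do_check or do_register):
            continue
        # A single retrieval serves both purposes when they coincide.
        value = read_value(cycle)
        reads += 1
        if do_register:
            registered.append((cycle, value))
    return registered, reads
```

With m == n every registration reuses a value already read for the check; with m != n some cycles read the value only to register it.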
The registered values are best stored on disk (the data may grow
large). A specially tagged log entry may be written to the usual monit
log output, or the values may be written to a special output file as
lines similar to this:
secs-since-1970 service-name property-name value
Monit should be able to return the selected portion of this file when
asked (there already is something similar -- _viewlog action) in TEXT or
XML form. Data then could be gathered by any software, even MRTG.
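To show how little a consumer would need, here is a rough sketch that parses lines in the four-column format above and returns the samples within a time window. The function and field names are my own; values are kept as strings because the format does not say they are always numeric.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    timestamp: int   # seconds since 1970
    service: str
    prop: str
    value: str       # left as text; callers convert as needed

def parse_line(line):
    """Parse one 'secs-since-1970 service-name property-name value' line."""
    ts, service, prop, value = line.strip().split(None, 3)
    return Sample(int(ts), service, prop, value)

def select(lines, start, end, service=None):
    """Return samples with start <= timestamp <= end, optionally for one service."""
    selected = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        sample = parse_line(line)
        if start <= sample.timestamp <= end and service in (None, sample.service):
            selected.append(sample)
    return selected
```

A gathering program such as an MRTG helper script would call `select` on the returned portion of the file and feed the values onward.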
First, we would not have to run any special data agent such as snmpd;
we would just use the existing gathering facility in monit. Second, in
case of network failure the usual remote gathering methods AFAIK
introduce a time gap in the statistics, since they do not include any
caching. This way the data are collected and written by monit at exact,
undisturbed intervals and returned in one batch on request (no
unnecessary network transfers each period).
Someday there may be a CHECK INTERFACE statement in monit. After that,
registering would be even more interesting. :-)
### Refactoring and unifying the HTTP output
There is a separate set of output routines for HTML and TEXT/XML in
http/cervlet.c. I believe the simple XML output was originally added for
communication with m/monit, to pass the true/false information about
each service. The goal was not to provide exhaustive information.
Monit currently serves the most complete status data in HTML format,
which is poorly suited to further processing. I think it would be much
better to unify the functions that read the RAM structures into one
skeleton and call the tag printing routines according to the requested
format. This way all formats would provide the same set of information
and there would be only one body of code to maintain. Additionally, much
of the unified printing code could be shared by multiple service types.
--- HTML would stay almost in the current form (one service per page):
<tr>
<td>Name</td>
<td>apache</td>
</tr><tr>
<td>Status</td>
<td><font color="#00ff00">OK</font></td>
</tr><tr>
<td>Process status</td>
<td><font color="#00ff00">Running</font></td>
</tr><tr>
<td>Process id</td>
<td>977</td>
</tr><tr>
<td>Pid file</td>
<td>/var/run/httpd.pid</td>
</tr><tr>
<td>Group</td>
<td><font color="#0000ff">server</font></td>
</tr>
On another page:
<tr>
<td>Name</td>
<td>passwd</td>
</tr><tr>
<td>Status</td>
<td><font color="#ff0000">Failure!</font></td>
</tr><tr>
<td>Path</td>
<td>/etc/passwd</td>
</tr><tr>
<td>Monitoring mode</td>
<td>active</td>
</tr><tr>
<td>Check service</td>
<td>every 5 cycle</td>
</tr><tr>
<td>Associated checksum</td>
<td>if failed db691baffdc13c2045b43e9234e6357fa8c37598(SHA1)
then alert else if recovered then alert</td>
</tr><tr>
<td>Associated UID</td>
<td>if failed 0 then alert else if recovered then alert</td>
</tr><tr>
<td>Size</td>
<td>46 B</td>
</tr><tr>
<td>Permission</td>
<td><font>644</font></td>
</tr><tr>
<td>UID</td>
<td><font color="#ff0000">500</font></td>
</tr><tr>
<td>GID</td>
<td><font>0</font></td>
</tr><tr>
<td>Checksum</td>
<td><font
color="#ff0000">9f55b3a33ef21ab86b7a92720e8647966217c9f3</font></td>
</tr>
--- For humanistic text output format:
Service: process apache
Status: OK
Process id: 977
Pid file: /var/run/httpd.pid
Process status: Running
Group: server
Service: file passwd
!!! Status: Failure!
Path: /etc/passwd
Monitoring mode: active
Check service: every 5 cycle
Associated checksum: if failed
db691baffdc13c2045b43e9234e6357fa8c37598(SHA1) then alert else if
recovered then alert
Associated UID: if failed 0 then alert else if recovered then alert
Size: 46 B
Permission: 644
!!! UID: 500
GID: 0
!!! Checksum: 9f55b3a33ef21ab86b7a92720e8647966217c9f3
--- For structured text (tabbed) output format:
service.process.apache.status 1
service.process.apache.pid 977
service.process.apache.pidfile /var/run/httpd.pid
service.process.apache.running 1
service.process.apache.group server
service.file.passwd.status 0
service.file.passwd.pollcycle 5
service.file.passwd.monitored 1
service.file.passwd.path /etc/passwd
service.file.passwd.checksumtest if failed
db691baffdc13c2045b43e9234e6357fa8c37598(SHA1) then alert else if
recovered then alert
service.file.passwd.uidtest if failed 0 then alert else if recovered
then alert
service.file.passwd.size 46
service.file.passwd.size.status 1
service.file.passwd.perm 644
service.file.passwd.perm.status 1
service.file.passwd.uid 500
service.file.passwd.uid.status 0
service.file.passwd.gid 0
service.file.passwd.checksum 9f55b3a33ef21ab86b7a92720e8647966217c9f3
service.file.passwd.checksum.type SHA1
service.file.passwd.checksum.status 0
--- For XML output format:
<service type="process" name="apache" status="1" monitored="1">
<pid>977</pid>
<pidfile>/var/run/httpd.pid</pidfile>
<group>server</group>
</service>
<service type="file" name="passwd" status="0" monitored="1">
<path>/etc/passwd</path>
<pollcycle>5</pollcycle>
<checksumtest>if failed
db691baffdc13c2045b43e9234e6357fa8c37598(SHA1) then alert else if
recovered then alert</checksumtest>
<uidtest>if failed 0 then alert else if recovered then alert</uidtest>
<size status="1">46</size>
<perm status="1">644</perm>
<uid status="0">500</uid>
<gid status="1">0</gid>
<checksum type="SHA1"
status="0">9f55b3a33ef21ab86b7a92720e8647966217c9f3</checksum>
</service>
---
Ooof, what do you all think about this? See the idea? The information
base is always the same, presentation differs. Keywords are shared
between structured text and XML and keyword nicenames are shared between
humanistic text and HTML. Both keywords and their long forms can be
selected from arrays.
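To make the idea concrete, here is a deliberately tiny sketch (in Python rather than monit's C, and with a made-up data structure) of one information base driving three renderers. The keyword-to-nicename table plays the role of the arrays mentioned above; none of these names exist in monit.

```python
from xml.sax.saxutils import escape, quoteattr

# Keyword -> long form, shared by the humanistic text (and HTML) renderers.
NICENAMES = {"pid": "Process id", "pidfile": "Pid file", "group": "Group"}

# Hypothetical in-memory record; real monit keeps this in C structures.
SERVICE = {
    "type": "process", "name": "apache", "status": 1,
    "fields": [("pid", "977"), ("pidfile", "/var/run/httpd.pid"),
               ("group", "server")],
}

def render_tabbed(svc):
    """Structured text: one 'keyword<TAB>value' line per item."""
    prefix = "service.%s.%s" % (svc["type"], svc["name"])
    lines = ["%s.status\t%s" % (prefix, svc["status"])]
    lines += ["%s.%s\t%s" % (prefix, key, value) for key, value in svc["fields"]]
    return "\n".join(lines)

def render_xml(svc):
    """XML: same keywords, used as element names."""
    lines = ["<service type=%s name=%s status=%s>" % (
        quoteattr(svc["type"]), quoteattr(svc["name"]),
        quoteattr(str(svc["status"])))]
    lines += ["<%s>%s</%s>" % (key, escape(value), key)
              for key, value in svc["fields"]]
    lines.append("</service>")
    return "\n".join(lines)

def render_text(svc):
    """Humanistic text: long forms instead of keywords."""
    lines = ["Service: %s %s" % (svc["type"], svc["name"]),
             "Status: %s" % ("OK" if svc["status"] else "Failure!")]
    lines += ["%s: %s" % (NICENAMES[key], value) for key, value in svc["fields"]]
    return "\n".join(lines)
```

Adding a new property then means adding one entry to the record and one nicename, and every format picks it up.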
The test description strings (e.g. `if failed 0 then alert else if
recovered then alert') are currently constructed from the data
structures in RAM when printing the HTML. I just dumped them into all
the formats that I propose. It would be possible to copy the test
structure itself into, for example, the XML, so that the parsing program
could reconstruct the meaning of the test. But I don't believe it is
necessary to go that far. ;-]
I introduce attributes like "type", "name" and "status" on the
<service> element because this way they are easier to select with
XPath expressions. All processes with /monit/service[@type="process"],
all failures with /monit/service[@status="0"], count failures easily, etc...
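As an illustration of selecting services by attribute, here is a short sketch using Python's standard xml.etree.ElementTree, which supports a limited XPath subset including attribute predicates. The <monit> root element and the trimmed document are assumptions based on the proposed XML above; paths are written relative to the root because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the proposed XML output, wrapped in an assumed <monit> root.
DOC = """
<monit>
  <service type="process" name="apache" status="1" monitored="1">
    <pid>977</pid>
  </service>
  <service type="file" name="passwd" status="0" monitored="1">
    <path>/etc/passwd</path>
  </service>
</monit>
"""

root = ET.fromstring(DOC)

# All services of type "process".
processes = root.findall("service[@type='process']")

# All services whose status attribute is "0", i.e. failures.
failures = root.findall("service[@status='0']")

process_names = [s.get("name") for s in processes]
failure_names = [s.get("name") for s in failures]
failure_count = len(failures)
```

With the status carried as an attribute, counting failures is a one-liner for any XPath-capable client.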
I hope I'm not the only one who sees the advantages of a unified
output scheme. If the core developers tell me that they like it and
that such a change would make it into their monit, I'm willing to code
it. :-) m/monit would also need to change a bit, but that's trivial
once it can handle XML.
Vlada