monit-general

Re: [monit] Resetting Checksums


From: Art Age Software
Subject: Re: [monit] Resetting Checksums
Date: Mon, 1 Sep 2008 00:33:24 -0700

On Sat, Aug 30, 2008 at 3:28 PM, Martin Pala <address@hidden> wrote:
>
> On Aug 28, 2008, at 4:43 AM, Art Age Software wrote:
>
>> Hi,
>>
>> A couple of monit questions:
>>
>> 1. Let's say I have monit monitoring the checksum of a file. I then
>> make a change to the file which invalidates the checksum. What is the
>> recommended way to tell monit to regenerate the checksum, with the least
>> impact, so that it does not alert and unmonitor the file?
>> So far, the only thing that has worked for me has been to kill and
>> restart monit itself.
>
> This is simple - just use the "if changed checksum" statement:
>
> --8<--
> check file myfile with path /tmp/aaa
>    if changed checksum then alert
> --8<--
>
> The "if changed checksum" statement resets the checksum, so monit checks
> against the new value starting with the next cycle.

Just to be sure I understand correctly, you are suggesting I change
from "if failed checksum..." to "if changed checksum..." Correct?
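
For illustration, a minimal sketch of the two forms, reusing the file from
the example above. The "if failed checksum ... then unmonitor" line is an
assumption about the original setup, not something quoted in the thread,
and the two checks are alternatives: they are not meant to coexist in one
control file.

--8<--
# Before (assumed setup): with no "expect" value, monit records the
# file's checksum when it starts, so any later edit keeps failing
# the test until monit is restarted
check file myfile with path /tmp/aaa
   if failed checksum then unmonitor

# After (use instead of the block above): a change triggers an alert,
# then the new checksum becomes the reference for the next cycle
check file myfile with path /tmp/aaa
   if changed checksum then alert
--8<--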

>>
>> 2. When I restart monit, any "mode manual" services that were
>> monitored become unmonitored after restart. Is there any way to
>> restart monit and have it resume monitoring all the services it had
>> been monitoring prior to restarting, including "mode manual" services?
>
>
> The manual mode was designed for clusters: if a node is stopped, its
> services will be started on the other node (by heartbeat, for example).
> Then, if the original node is booted again, it is not good to start the
> same services on it, since they would be running twice (for example, both
> trying to acquire the cluster's active/passive shared filesystem).
>
> However, monit does store the services' state, for the unlikely event that
> monit crashes. If monit is started after such an accident, it recovers the
> state from before the crash (including the monitoring state of manual mode
> services). As a workaround, if you are sure that you want to restart monit
> and keep the services' state, you can kill monit using SIGKILL (pkill -9
> monit). This way monit is terminated uncleanly and, on the next start,
> uses its state self-healing to recover the original state.
>
> We could also change the manual mode behavior to be persistent across
> restarts, which may make sense (i.e. if a service was monitored before monit
> stopped, monitoring would be enabled again after monit starts). The cluster
> framework would then need to unmonitor the manual mode services if it is
> going to stop monit (or the whole node) due to service failover.

Thank you for the work-around.

In my case, the manual mode services are cluster services controlled
by heartbeat. When a failover occurs, I have heartbeat set up to
explicitly unmonitor the services on the failed node and start
monitoring them on the takeover node. I think that if monit crashes,
it makes sense for it to recover its state from the stored state
file. I think it also makes sense for monit **not** to change the
state of monitored services if it is stopped and started cleanly (that
is, changing the manual mode behavior to be persistent across restarts,
as you suggested). If the entire machine goes down and monit cannot
distinguish that from a monit crash, then yes, monit will try to
recover its state. However, I store the state file on /dev/shm, so
it will not persist across reboots. I believe that covers all the
cases, yes?
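
A sketch of the setup described above, under stated assumptions: the state
file name, the service name "clusterapp", its pidfile and its init script
paths are hypothetical placeholders, not taken from the thread.

--8<--
# Keep monit's state file on tmpfs so it does not survive a reboot
# (file name is a hypothetical placeholder)
set statefile /dev/shm/monit.state

# A cluster service that heartbeat, not monit, decides when to monitor
# (service name, pidfile and init script paths are hypothetical)
check process clusterapp with pidfile /var/run/clusterapp.pid
   start program = "/etc/init.d/clusterapp start"
   stop program = "/etc/init.d/clusterapp stop"
   mode manual
--8<--

On failover, heartbeat would then run "monit unmonitor clusterapp" on the
failed node and "monit monitor clusterapp" on the takeover node.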

Perhaps there should be a new config file option
"ManualModePersistentAcrossRestarts" so that setups relying on current
behavior do not break.

Sam



