Re: automatic resume of monitoring, is it possible?
From: Martin Pala
Date: Mon, 28 Feb 2011 13:47:16 +0100
Having multiple checks for the same resource (like the "file1" and
"file1recover" checks you mentioned) isn't a problem.
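
For the host cascade itself, the stop/start pattern quoted below could be
adapted roughly as follows. This is an untested sketch: the addresses are
placeholder IPs from the documentation range, only one parent/child pair
(fv2su -> fv2ap) is shown, "monit unmonitor"/"monit monitor" are used
instead of stop/start because a host check has no program to stop, and the
ICMP test syntax should be checked against the manual for your monit version.

--8<--
# Hypothetical addresses (192.0.2.0/24 is reserved for documentation).
# When fv2su stops answering pings, stop checking fv2ap so it does not
# raise its own alert; when fv2su answers again, re-enable fv2ap.
check host fv2su with address 192.0.2.21
  if failed icmp type echo with timeout 3 seconds
     then exec "/usr/bin/monit unmonitor fv2ap"
     else if succeeded then exec "/usr/bin/monit monitor fv2ap"

check host fv2ap with address 192.0.2.22
  if failed icmp type echo with timeout 3 seconds then alert
--8<--

A parent with several children (fv1ap has three subscriber units) would need
one exec per child or a group action, and whether monitoring resumes correctly
further down a multi-level chain would need testing.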
Martin
On Feb 28, 2011, at 2:39 AM, John (yt) Hogenmiller wrote:
> Thanks for the clarification.
>
> So in this case, instead of telling the dependent (file2) about the
> parent (file1), I would have the parent (file1) start and stop
> monitoring on its dependents (file2).
>
> I could see that working. In my case though, I would probably have to
> define groups and start and stop monitoring on a group. I have one
> device that connects almost everything together. I actually have a
> sort of cascade going on. One access point has three subscriber units
> connected to it, and then each subscriber unit has its own access
> point attached. One of the subscriber units has a router and a server
> behind it.
>
> Here is my network layout, if the formatting holds up:
>
>                                  /-> fv3su -> fv3ap
> monit server -> fvinside -> fv1ap -> fv4su -> fv4ap
>                                  \-> fv2su -> fv2ap
>                                           \-> fvoffice -> fvofficeserv
>
> Again, these are all physically discrete devices with no way to
> restart them automatically. The worst case is when FV1 or FVINSIDE
> goes down: we get 8-9 other devices also showing as down. If FV2SU
> goes down, only 3 other devices show as down.
>
> I could perhaps create some monitoring like so (going back to the file
> example):
>
>
> check file file1 with path /tmp/file1
> if failed permission 555 then exec "/usr/bin/monit start file1recover"
> if failed permission 555 then stop
>
> check file file1recover with path /tmp/file1
> if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit start file1"
> if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit -g subfiles start"
> if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit stop file1recover"
>
> check file file2 with path /tmp/file2
> if failed permission 555 then alert
> group "subfiles"
> depends on file1
>
> check file file3 with path /tmp/file3
> if failed permission 555 then alert
> group "subfiles"
> depends on file1
>
> I haven't had a chance to test this yet. Does monit have any issues
> with multiple checks on the same resource? Any other suggestions would
> be appreciated. I've been working with Nagios and MRTG on this network
> already, and Nagios even has a really nice network map built in.
> However, I like monit's straightforward configuration, and I also like
> the list of up/down statuses monit provides on the web interface.
> With Nagios, the network map might show all services as up/green, and
> it's not until you click on a specific service that you see that one
> service (like ssh) is timing out.
> Also, I'm running the monitoring on a system with 128MB of memory,
> so lean and fast is good.
>
>
> -John
>
>
>
> On Sun, Feb 27, 2011 at 12:38 PM, Martin Pala <address@hidden> wrote:
>> Hello,
>>
>> The action "monitor" really doesn't exist; I have fixed the documentation.
>> The "monitor" action wouldn't make sense, as the service is monitored
>> already.
>>
>> The "stop" action stops the service and disables monitoring => monit doesn't
>> check the service anymore until monitoring is enabled again (using
>> "monit monitor ..." or "monit start ...").
>>
>> The setup which should work in your case:
>>
>> --8<--
>> check file file1 with path "/tmp/file1"
>>   if failed permission 555 then exec "/usr/bin/monit stop file2"
>>      else if succeeded then exec "/usr/bin/monit start file2"
>>
>> check file file2 path "/tmp/file2"
>> if failed permission 555 then alert
>> --8<--
>>
>> => if the permission check fails, the "file2" service is stopped, but
>> monitoring of the "file1" service continues. If "file1" recovers, "file2"
>> is started again.
>>
>> Regards,
>> Martin
>>
>>
>> On Feb 27, 2011, at 1:58 AM, John (yt) Hogenmiller wrote:
>>
>>> Hello list,
>>>
>>> I've been playing with monit in hopes of using it to monitor a
>>> wireless installation. At first it looked like it was doing OK, but
>>> then I noticed that "depends on" wasn't working as I had hoped. If
>>> deviceA is unreachable, deviceB and deviceC will also be unreachable,
>>> so I set up my "depends on" accordingly, but I still got alerts for
>>> all three services.
>>>
>>> After looking further into the documentation, it seems "depends on"
>>> only stops monitoring a service once monitoring of the service it
>>> depends on has been stopped. That's fine, but I'm looking for a way
>>> to restart monitoring automatically. In our scenario, if a device
>>> becomes unpingable, someone has to physically power cycle it to bring
>>> it back online (or potentially replace the device).
>>>
>>> The documentation wasn't too clear (at least to me) on how to
>>> configure monit this way, so I set up an instance that polled every
>>> 10 seconds and monitored two files. All the steps I took are below.
>>> If anyone can look at my testing and offer advice, I'd appreciate it.
>>> Perhaps I'm reading the documentation wrong, or perhaps there's just
>>> no way to do what I'm trying (perhaps M/Monit has such capabilities).
>>>
>>> I originally tested under 5.0.3 (latest with Ubuntu/apt-get), but then
>>> upgraded to 5.2.4 hoping for different results.
>>>
>>> First, my checks:
>>>
>>>
>>> check file file1 with path "/tmp/file1"
>>> if failed permission 555 then unmonitor
>>> # manual implies that I can do "else if succeeded then monitor", but this fails the syntax check
>>> else if succeeded then alert
>>>
>>> check file file2 path "/tmp/file2"
>>> if failed permission 555 then alert
>>> depends on file1
>>>
>>>
>>> Changing the permissions of /tmp/file1 to 500 does indeed stop monitoring of file1 and file2.
>>>
>>> [EST Feb 26 13:30:47] debug : monitor service 'file1' on user request
>>> [EST Feb 26 13:30:47] info : Awakened by User defined signal 1
>>> [EST Feb 26 13:30:47] info : monit daemon at 31932 awakened
>>> [EST Feb 26 13:30:47] info : 'file1' monitor action done
>>>
>>>
>>> On a lark, I updated my config like so:
>>>
>>> check file file1 with path "/tmp/file1"
>>> if failed permission 555 then stop
>>> else if succeeded then start
>>>
>>> check file file2 path "/tmp/file2"
>>> if failed permission 555 then alert
>>> depends on file1
>>>
>>>
>>> Upon changing file1 to 500, both services went into the "not monitored" state.
>>>
>>> Upon changing file1 back to 555, the services did not resume. If I
>>> manually tell it to start monitoring file1, file2 does not
>>> automatically begin monitoring again.
>>>
>>>
>>>
>>> Other notes:
>>> I had a whole bug report showing that you can't restart monitoring of a
>>> service from the command line, but I realised that this was a bug in
>>> 5.0.3 (the latest version Ubuntu provides), and it is fixed in 5.2.4,
>>> which I downloaded. I only mention this for anyone else using monit
>>> from the Ubuntu repositories.
>>>
>>> --
>>> To unsubscribe:
>>> http://lists.nongnu.org/mailman/listinfo/monit-general
>>
>>
>>
>
>
>
> --
> John Hogenmiller - address@hidden
> Used for mailing lists - sporadic response