Re: automatic resume of monitoring, is it possible?

monit-general

From:	Martin Pala
Subject:	Re: automatic resume of monitoring, is it possible?
Date:	Mon, 28 Feb 2011 13:47:16 +0100

Having multiple checks for the same resource (like the "file1" and
"file1recover" checks you mentioned) isn't a problem.

Martin


On Feb 28, 2011, at 2:39 AM, John (yt) Hogenmiller wrote:

> Thanks for the clarification.
> 
> So in this case, instead of telling the dependent (file2) about the
> parent (file1), I would have the parent (file1) start and stop
> monitoring on its dependents (file2).
> 
> I could see that working.  In my case though, I would probably have to
> define groups and start and stop monitoring on a group.  I have one
> device that connects almost everything together.    I actually have a
> sort of cascade going on.  One access point has three subscriber units
> connected to it, and then each subscriber unit has its own access
> point attached.  One of the subscriber units has a router and a server
> behind it.
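
The group-based variant described above could look roughly like the
following, reusing the file checks from this thread as stand-ins for
devices. This is an untested sketch: the group name "subfiles" and the use
of the "monitor"/"unmonitor" command-line actions inside exec are
assumptions layered on the exec "/usr/bin/monit ..." pattern, and the path
to the monit binary may differ on your system.

```
# Parent check: suspend monitoring of the whole group while file1 is
# failing, and resume it once file1 passes again.
check file file1 with path "/tmp/file1"
    if failed permission 555
        then exec "/usr/bin/monit -g subfiles unmonitor"
        else if succeeded then exec "/usr/bin/monit -g subfiles monitor"

# Dependents: group membership alone ties them to the parent check.
check file file2 with path "/tmp/file2"
    if failed permission 555 then alert
    group subfiles

check file file3 with path "/tmp/file3"
    if failed permission 555 then alert
    group subfiles
```

Because file1 itself stays monitored, it can notice its own recovery and
re-enable the group without anyone having to intervene.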
> 
> Here is my network layout, if the formatting holds up:
> 
>                                   /-> fv3su -> fv3ap
> monit server -> fvinside -> fv1ap --> fv4su -> fv4ap
>                                   \-> fv2su -> fv2ap
>                                         \-> fvoffice -> fvofficeserv
> 
> Again, these are all physically discrete devices with no way to
> automatically restart them.  The worst case is FV1 or FVINSIDE going
> down: 8-9 other devices will then show as down too.  If FV2SU
> goes down, only 3 other devices show down.
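
Applied to part of this layout, the same idea might be written with host
checks. A minimal, untested sketch for the fv2su branch only: the IP
addresses (taken from the 192.0.2.0/24 documentation range), the group name
"behind_fv2su", and the ping count and timeout are all placeholders, not
values from the actual network.

```
# Parent: while fv2su does not answer pings, suspend monitoring of
# everything behind it; resume once it answers again.
check host fv2su with address 192.0.2.20          # placeholder address
    if failed icmp type echo count 3 with timeout 5 seconds
        then exec "/usr/bin/monit -g behind_fv2su unmonitor"
        else if succeeded then exec "/usr/bin/monit -g behind_fv2su monitor"

# Devices that are only reachable through fv2su.
check host fv2ap with address 192.0.2.21          # placeholder address
    if failed icmp type echo count 3 with timeout 5 seconds then alert
    group behind_fv2su

check host fvoffice with address 192.0.2.22       # placeholder address
    if failed icmp type echo count 3 with timeout 5 seconds then alert
    group behind_fv2su

check host fvofficeserv with address 192.0.2.23   # placeholder address
    if failed icmp type echo count 3 with timeout 5 seconds then alert
    group behind_fv2su
```

The devices in the group may still raise an alert in the cycle in which
fv2su itself fails, since the exec only takes effect afterwards.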
> 
> I could perhaps create some monitoring like so (going back to the file 
> example):
> 
> 
> check file file1 with path /tmp/file1
>   if failed permission 555 then exec "/usr/sbin/monit start file1recover"
>   if failed permission 555 then stop
> 
> check file file1recover with path /tmp/file1
>   if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit start file1"
>   if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit -g subfiles start"
>   if succeeded permission 555 for 2 cycles then exec "/usr/bin/monit stop file1recover"
> 
> check file file2 with path /tmp/file2
>   if failed permission 555 then alert
>   group "subfiles"
>   depends "file1"
> 
> check file file3 with path /tmp/file3
>   if failed permission 555 then alert
>   group "subfiles"
>   depends "file1"
> 
> I haven't had a chance to test this yet.  Does monit have any issues
> with multiple checks on the same resource?  Any other suggestions would be
> appreciated.  I've been working with nagios and mrtg on this network
> already.  Nagios even has a really nice network map built in.
> However, I like the straightforward configuration presented with
> monit, and I even like the list of status up/downs monit provides on
> the web interface.  With nagios, it might show all services as
> up/green on the network map, but it's not until you click on a
> specific service that you see that 1 service (like ssh) is timing out.
> Also, I'm running the monitoring on a system with 128MB of memory,
> so lean and fast is good.
> 
> 
> -John
> 
> 
> 
> On Sun, Feb 27, 2011 at 12:38 PM, Martin Pala <address@hidden> wrote:
>> Hello,
>> 
>> The action "monitor" really doesn't exist - I have fixed the documentation.
>> The "monitor" action wouldn't make sense, as the service is monitored
>> already.
>> 
>> The "stop" action stops the service and disables monitoring => monit doesn't
>> check the service anymore until the monitoring is enabled again (using
>> "monit monitor ..." or "monit start ...").
>> 
>> The setup which should work in your case:
>> 
>> --8<--
>> check file file1 with path "/tmp/file1"
>>    if failed permission 555 then exec "/usr/bin/monit stop file2"
>>       else if succeeded then exec "/usr/bin/monit start file2"
>> 
>> check file file2 path "/tmp/file2"
>>    if failed permission 555 then alert
>> --8<--
>> 
>> => if the permissions fail, the "file2" service is stopped, but the
>> monitoring of the "file1" service continues. If "file1" recovers, the
>> "file2" service is started again.
>> 
>> Regards,
>> Martin
>> 
>> 
>> On Feb 27, 2011, at 1:58 AM, John (yt) Hogenmiller wrote:
>> 
>>> Hello list,
>>> 
>>> I've been playing with monit in hopes of using it to monitor a
>>> wireless installation.  At first, it looked like
>>> it was doing ok, but then I noticed the "depends on" wasn't working as
>>> I had hoped.  If deviceA is unreachable, deviceB
>>> and deviceC will also be unreachable, so I set up my "depends on"
>>> accordingly, but I still got alerts for all three services.
>>> 
>>> After looking further into the documentation, it seems "depends on"
>>> only stops monitoring of the dependent service once monitoring has
>>> been stopped on the service it depends on.  That's fine, but I'm
>>> looking for a way to restart monitoring automatically.  In our
>>> scenario, if a device becomes unpingable, someone would have to
>>> physically power cycle it to bring it back online (or potentially
>>> replace the device).
>>> 
>>> The documentation wasn't too clear (at least to me) on a way to
>>> configure monit this way, so I set up an instance that
>>> polled every 10 seconds and monitored two files.  All the steps I took
>>> are below.  If anyone can look at my testing and offer advice,
>>> I'd appreciate it.  Perhaps I'm reading the documentation wrong, or
>>> perhaps there's just no way to do what I'm trying (perhaps
>>> M/Monit has such capabilities).
>>> 
>>> I originally tested under 5.0.3 (latest with Ubuntu/apt-get), but then
>>> upgraded to 5.2.4 hoping for different results.
>>> 
>>> First, my checks:
>>> 
>>> 
>>>       check file file1 with path "/tmp/file1"
>>>              if failed permission 555 then unmonitor
>>>              # manual implies that I can do "else if succeeded then monitor",
>>>              # but this fails syntax
>>>              else if succeeded then alert
>>> 
>>>       check file file2 path "/tmp/file2"
>>>          if failed permission 555 then alert
>>>          depends on file1
>>> 
>>> 
>>> Changing /tmp/file1 to mode 500 does indeed stop monitoring on file1 and file2.
>>> 
>>> [EST Feb 26 13:30:47] debug    : monitor service 'file1' on user request
>>> [EST Feb 26 13:30:47] info     : Awakened by User defined signal 1
>>> [EST Feb 26 13:30:47] info     : monit daemon at 31932 awakened
>>> [EST Feb 26 13:30:47] info     : 'file1' monitor action done
>>> 
>>> 
>>> On a lark, I updated my config like so:
>>> 
>>>       check file file1 with path "/tmp/file1"
>>>               if failed permission 555 then stop
>>>               else if succeeded then start
>>> 
>>>       check file file2 path "/tmp/file2"
>>>               if failed permission 555 then alert
>>>               depends on file1
>>> 
>>> 
>>> Upon changing file1 to 500, both services went into "not monitored".
>>> 
>>> Upon changing file1 back to 555, the services did not resume.  If I
>>> manually tell it to start monitoring file1, file2 does not
>>> automatically begin monitoring again.
>>> 
>>> 
>>> 
>>> Other notes:
>>> I had a whole bug report showing that you can't restart monitoring a
>>> service from the command line, but I realised that was a bug
>>> in 5.0.3, which is the latest version Ubuntu provides; the problem went
>>> away once I downloaded 5.2.4.  I only mention this for anyone else using
>>> monit from the Ubuntu repositories.
>>> 
>>> --
>>> To unsubscribe:
>>> http://lists.nongnu.org/mailman/listinfo/monit-general
>> 
>> 
>> 
> 
> 
> 
> --
> John Hogenmiller - address@hidden
> Used for mailing lists - sporadic response
> 
