monit-general

Re: [monit] Question about multi-host testing


From: Martin Pala
Subject: Re: [monit] Question about multi-host testing
Date: Tue, 30 Oct 2007 21:03:51 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 Iceape/1.1.4 (Debian-1.1.4-1)

The problem is that the dependency mechanism was designed primarily for critical actions (start/stop/restart/monitor/unmonitor), where the correct order is needed.

An alert-only action doesn't trigger the dependency (action chain), since it may be purely informative.

For example, if you are monitoring ICMP, you can have several error levels, such as:

--8<--
check host myrouter with address ...
  if failed icmp type echo for 3 times within 5 cycles then alert
  if failed icmp type echo for 5 cycles then exec "/script/to/power-cycle/router"
--8<--

In that case monit sends an alert when the network has problems but is not completely dead (some packets are lost) and may still recover by itself, so this shouldn't disable monitoring of the remote hosts. When the error ratio is 100% for 5 cycles (the second icmp line), monit can then exec, for example, a script to power-cycle the router (via a networked power switch, connected point-to-point or on the same Ethernet switch, so that it remains reachable when the router is not).

So the final solution could be to extend the dependency mechanism with an option that makes a service dependency "hard" even for alerts (so that an alert on the parent service stops monitoring of the dependent services).

A workaround could be to define dummy start/stop methods for the monitored remote hosts and use the restart action instead of alert (restart sends an alert as well). Something like:

--8<--
check host myswitch ...
  start program = "/bin/true"
  stop program = "/bin/true"
  if failed icmp type echo for 5 cycles then restart

check host myrouter ...
  start program = "/bin/true"
  stop program = "/bin/true"
  if failed icmp type echo for 5 cycles then restart
  depends on myswitch
--8<--

... not tested, but it may work (although using the restart action here doesn't look logical, it can trigger the dependency chain in this case).
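Applied to the two hosts from Pablo's configuration quoted below, the same workaround might look roughly like the following. This is an untested sketch: the dummy "/bin/true" methods and the 5-cycle threshold are carried over from the example above, not from Pablo's setup, while the host names, addresses, "count 1" / "timeout 10 seconds" options and alert line are copied from his config. Whether this monit version accepts "for 5 cycles" combined with count/timeout on one line is an assumption.

--8<--
# Parent: the router. A restart here (5 failed cycles) is meant to
# push the event through the dependency chain; restart also alerts.
check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
  start program = "/bin/true"
  stop program = "/bin/true"
  if failed icmp type echo count 1 timeout 10 seconds for 5 cycles then restart
  alert address@hidden with reminder on 1 cycle

# Child: only reachable through the router above.
check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
  start program = "/bin/true"
  stop program = "/bin/true"
  if failed icmp type echo count 1 timeout 10 seconds for 5 cycles then restart
  alert address@hidden with reminder on 1 cycle
  depends on ro5000-siNmG20876YFyCu20879
--8<--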


Martin


Pablo Iranzo Gómez wrote:
        List, here is the output from monit running in interactive mode with -vv:

From log start:
-----------------------------------------------------------------------
Remote Host Name      = ro5000-siNmG20876YFyCu20879
 Monitoring mode      = active
 ICMP                 = if failed Echo Request count 1 with timeout 10 seconds 1 times within 1 cycle(s) then alert else if passed 1 times within 1 cycle(s) then alert
 Alert mail to        = address@hidden
   Alert on           = All events
   Alert reminder     = 1 cycles

Remote Host Name      = pos10.5000-siNmG20876YFyCu20879
 Monitoring mode      = active
 Depends on Service   = ro5000-siNmG20876YFyCu20879
 ICMP                 = if failed Echo Request count 1 with timeout 10 seconds 1 times within 1 cycle(s) then alert else if passed 1 times within 1 cycle(s) then alert
 Alert mail to        = address@hidden
   Alert on           = All events
   Alert reminder     = 1 cycles


From Log checking:
-----------------------------------------------------------------------
'ro5000-siNmG20876YFyCu20879' icmp ping failed
'ro5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
ICMP failed notification is sent to address@hidden
'ro5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port connection tests
'pos10.5000-siNmG20876YFyCu20879' icmp ping failed
'pos10.5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
ICMP failed notification is sent to address@hidden
'pos10.5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port connection tests


Config files:
-----------------------------------------------------------------------
check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
        if failed ICMP type ECHO count 1 timeout 10 seconds then alert
        alert address@hidden with reminder on 1 cycle

check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
        if failed ICMP type ECHO count 1 timeout 10 seconds then alert
        alert address@hidden with reminder on 1 cycle
        depends on ro5000-siNmG20876YFyCu20879



        Any hint?

        Thanks in advance,
        Pablo


On Mon, 29-10-2007 at 21:51 +0100, Pablo Iranzo Gómez wrote:
        Martin,

On Mon, 29 Oct 2007, Martin Pala wrote:
Can you run monit in verbose mode (-v option) and send the log? You'll
see in it what happened in more detail.
        Sure, will do it tomorrow early in the morning :)

        If I just put "if failed icmp then alert", monit complains about the
configuration (I'm using monit-4.9-1), so either I'm doing something
wrong or it's a problem with this version.
I'm sorry, this was a typo (I wrote the example from memory, so the
"type echo" part was missing).
        Don't worry, I was just checking in case I did something wrong :)

        Thanks again
        Pablo


--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general
