Re: [monit] Question about multi-host testing
From: Pablo Iranzo Gómez
Subject: Re: [monit] Question about multi-host testing
Date: Tue, 30 Oct 2007 21:47:31 +0100 (CET)
Will try tomorrow, and let you know if this works.
Again, thank you very much for your analysis and explanation.
Pablo
--
Pablo Iranzo Gómez
(http://Alufis35.uv.es/~iranzo/)
(PGPKey Available on http://www.uv.es/~iranzop/PGPKey.pgp)
--
Boling's Postulate on Murphy's Law:
If you are feeling good, don't worry. You'll get over it.
On Tue, 30 Oct 2007, Martin Pala wrote:
> The problem is that the dependency was designed primarily for critical
> actions (start/stop/restart/monitor/unmonitor), where the correct order
> is needed.
>
> An alert-only action doesn't trigger the dependency (action chain),
> since it may be purely informative.
>
> For example, if you are monitoring ICMP, you can have several error
> levels, such as:
>
> --8<--
> check host myrouter with address ...
> if failed icmp type echo for 3 times within 5 cycles then alert
> if failed icmp type echo for 5 cycles then exec "/script/to/power-cycle/router"
> --8<--
>
> In that case monit sends an alert when the network has problems but is
> not completely dead (some packets are lost) and can still recover by
> itself. That shouldn't disable the monitoring of remote hosts. When
> the error ratio is 100% for 5 cycles (the second icmp line), monit can
> exec, for example, a script to power-cycle the router (via a networked
> power switch ... point-to-point or on the same ethernet switch, so that
> it is reachable when the router is not available).
>
> So, the final solution could be to extend the dependency with an option
> that makes the service dependency hard even for an alert message (to
> stop monitoring the other services).
>
> A workaround could be to define dummy start/stop methods for the
> monitored remote hosts and use the restart action instead of alert (it
> sends an alert as well). Something like:
>
> --8<--
> check host myswitch ...
> start program = "/bin/true"
> stop program = "/bin/true"
> if failed icmp type echo for 5 cycles then restart
>
> check host myrouter ...
> start program = "/bin/true"
> stop program = "/bin/true"
> if failed icmp type echo for 5 cycles then restart
> depends on myswitch
> --8<--
>
> ... not tested, but it may work (the restart action doesn't look
> logical here, but it should trigger the dependency in this case as well).
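>
> Applied to the two hosts from the config quoted below, the workaround
> might look like this (equally untested; it assumes only the dummy
> start/stop lines and the restart action change, everything else is
> copied from the original config, and address@hidden is the archive's
> redaction, left as-is):
>
> --8<--
> check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
>   # dummy methods, so that the restart action has something to run
>   start program = "/bin/true"
>   stop program = "/bin/true"
>   if failed ICMP type ECHO count 1 timeout 10 seconds then restart
>   alert address@hidden with reminder on 1 cycle
>
> check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
>   start program = "/bin/true"
>   stop program = "/bin/true"
>   if failed ICMP type ECHO count 1 timeout 10 seconds then restart
>   alert address@hidden with reminder on 1 cycle
>   depends on ro5000-siNmG20876YFyCu20879
> --8<--
>
> The intent is that a failed ping on the router triggers a restart, which
> goes through the dependency chain, instead of an alert-only action that
> leaves the dependent host's check running.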
>
>
> Martin
>
>
> Pablo Iranzo Gómez wrote:
> > List, here is the output from monit running in interactive mode with
> > -vv:
> >
> > From log start:
> > -----------------------------------------------------------------------
> > Remote Host Name = ro5000-siNmG20876YFyCu20879
> > Monitoring mode = active
> > ICMP = if failed Echo Request count 1 with timeout 10
> > seconds 1 times within 1 cycle(s) then alert else if passed 1 times
> > within 1 cycle(s) then alert
> > Alert mail to = address@hidden
> > Alert on = All events
> > Alert reminder = 1 cycles
> >
> > Remote Host Name = pos10.5000-siNmG20876YFyCu20879
> > Monitoring mode = active
> > Depends on Service = ro5000-siNmG20876YFyCu20879
> > ICMP = if failed Echo Request count 1 with timeout 10
> > seconds 1 times within 1 cycle(s) then alert else if passed 1 times
> > within 1 cycle(s) then alert
> > Alert mail to = address@hidden
> > Alert on = All events
> > Alert reminder = 1 cycles
> >
> >
> > From Log checking:
> > -----------------------------------------------------------------------
> > 'ro5000-siNmG20876YFyCu20879' icmp ping failed
> > 'ro5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
> > ICMP failed notification is sent to address@hidden
> > 'ro5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port
> > connection tests
> > 'pos10.5000-siNmG20876YFyCu20879' icmp ping failed
> > 'pos10.5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
> > ICMP failed notification is sent to address@hidden
> > 'pos10.5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port
> > connection tests
> >
> >
> > Config files:
> > -----------------------------------------------------------------------
> > check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
> > if failed ICMP type ECHO count 1 timeout 10 seconds then alert
> > alert address@hidden with reminder on 1 cycle
> >
> > check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
> > if failed ICMP type ECHO count 1 timeout 10 seconds then alert
> > alert address@hidden with reminder on 1 cycle
> > depends on ro5000-siNmG20876YFyCu20879
> >
> >
> >
> > Any hint?
> >
> > Thanks in advance,
> > Pablo
> >
> >
> > On Mon, 29-10-2007 at 21:51 +0100, Pablo Iranzo Gómez wrote:
> >> Martin,
> >>
> >> On Mon, 29 Oct 2007, Martin Pala wrote:
> >>> Can you run monit in verbose mode (-v option) and send the log? You'll
> >>> see in it what happened in more detail.
> >> Sure, will do it tomorrow early in the morning :)
> >>
> >>>> If I just put "if failed icmp then alert" monit complains about
> >>>> configuration (I'm using monit-4.9-1), so either I'm doing something
> >>>> wrong or it's a problem with this version.
> >>> I'm sorry - this was a typo (I wrote the example just from memory, so
> >>> the "type echo" was missing).
> >> Don't worry, I was just trying it in case I did something wrong
> >> :)
> >>
> >> Thanks again
> >> Pablo
> >>
> >>
> >> --
> >> To unsubscribe:
> >> http://lists.nongnu.org/mailman/listinfo/monit-general
> >>
>
>
>