Re: [monit] Question about multi-host testing
From: Pablo Iranzo Gómez
Subject: Re: [monit] Question about multi-host testing
Date: Wed, 31 Oct 2007 10:23:55 +0100
Martin, that workaround seems to work fine: the alert action ignores
dependencies, while restart does not.
Thanks again
Pablo
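
For reference, a minimal sketch of the workaround applied to the two hosts
from the original configuration quoted further down. It assumes Martin's
dummy start/stop programs and simply swaps "alert" for "restart"; the host
names, addresses and ICMP options are copied from that configuration, and
this exact combination is an illustration rather than a tested setup:

--8<--
# Parent host: dummy start/stop programs so that "restart" is a valid action
check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
  start program = "/bin/true"
  stop program = "/bin/true"
  if failed ICMP type ECHO count 1 timeout 10 seconds then restart
  alert address@hidden with reminder on 1 cycle

# Dependent host: "restart" on the parent goes through the dependency,
# which "alert" alone does not
check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
  start program = "/bin/true"
  stop program = "/bin/true"
  if failed ICMP type ECHO count 1 timeout 10 seconds then restart
  alert address@hidden with reminder on 1 cycle
  depends on ro5000-siNmG20876YFyCu20879
--8<--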
On Tue, 30-10-2007 at 21:47 +0100, Pablo Iranzo Gómez wrote:
> Will try tomorrow, and let you know if this works.
> Again, thank you very much for your analysis and explanation.
>
> Pablo
>
>
>
> --
> Pablo Iranzo Gómez
> (http://Alufis35.uv.es/~iranzo/)
> (PGPKey Available on http://www.uv.es/~iranzop/PGPKey.pgp)
> --
> Boling's postulate on Murphy's Law:
>
> If you are feeling well, don't worry. You'll get over it.
>
> On Tue, 30 Oct 2007, Martin Pala wrote:
>
> > The problem is that the dependency was designed primarily for critical
> > actions (start/stop/restart/monitor/unmonitor), where the correct order
> > is needed.
> >
> > An alert-only action doesn't trigger the dependency (action chain),
> > since it may be purely informative.
> >
> > For example, if you are monitoring ICMP, you can have a few error
> > levels, such as:
> >
> > --8<--
> > check host myrouter with address ...
> >   if failed icmp type echo for 3 times within 5 cycles then alert
> >   if failed icmp type echo for 5 cycles then exec "/script/to/power-cycle/router"
> > --8<--
> >
> > In this case monit sends an alert when the network has problems but is
> > not completely dead (some packets are lost) and can still recover by
> > itself. That alone shouldn't disable the monitoring of the remote hosts.
> > When the error ratio is 100% for 5 cycles (the second icmp line), monit
> > can exec, for example, a script to power-cycle the router (via a
> > networked power switch, connected point-to-point or on the same ethernet
> > switch so that it stays reachable when the router is not available).
> >
> > So the final solution could be to extend the dependency with an option
> > that makes the service dependency hard even for an alert (so that
> > monitoring of the other services stops).
> >
> > A workaround could be to define dummy start/stop methods for the
> > monitored remote hosts and use the restart action instead of alert (it
> > sends an alert as well). Something like:
> >
> > --8<--
> > check host myswitch ...
> >   start program = "/bin/true"
> >   stop program = "/bin/true"
> >   if failed icmp type echo for 5 cycles then restart
> >
> > check host myrouter ...
> >   start program = "/bin/true"
> >   stop program = "/bin/true"
> >   if failed icmp type echo for 5 cycles then restart
> >   depends on myswitch
> > --8<--
> >
> > ... not tested, but it could work (although the restart action doesn't
> > look logical here, it can trigger the dependency in this case as well).
> >
> >
> > Martin
> >
> >
> > Pablo Iranzo Gómez wrote:
> > > List, here is the output from monit running in interactive mode with
> > > -vv:
> > >
> > > From log start:
> > > -----------------------------------------------------------------------
> > > Remote Host Name = ro5000-siNmG20876YFyCu20879
> > > Monitoring mode = active
> > > ICMP = if failed Echo Request count 1 with timeout 10
> > > seconds 1 times within 1 cycle(s) then alert else if passed 1 times
> > > within 1 cycle(s) then alert
> > > Alert mail to = address@hidden
> > > Alert on = All events
> > > Alert reminder = 1 cycles
> > >
> > > Remote Host Name = pos10.5000-siNmG20876YFyCu20879
> > > Monitoring mode = active
> > > Depends on Service = ro5000-siNmG20876YFyCu20879
> > > ICMP = if failed Echo Request count 1 with timeout 10
> > > seconds 1 times within 1 cycle(s) then alert else if passed 1 times
> > > within 1 cycle(s) then alert
> > > Alert mail to = address@hidden
> > > Alert on = All events
> > > Alert reminder = 1 cycles
> > >
> > >
> > > From Log checking:
> > > -----------------------------------------------------------------------
> > > 'ro5000-siNmG20876YFyCu20879' icmp ping failed
> > > 'ro5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
> > > ICMP failed notification is sent to address@hidden
> > > 'ro5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port
> > > connection tests
> > > 'pos10.5000-siNmG20876YFyCu20879' icmp ping failed
> > > 'pos10.5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
> > > ICMP failed notification is sent to address@hidden
> > > 'pos10.5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port
> > > connection tests
> > >
> > >
> > > Config files:
> > > -----------------------------------------------------------------------
> > > check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
> > > if failed ICMP type ECHO count 1 timeout 10 seconds then alert
> > > alert address@hidden with reminder on 1 cycle
> > >
> > > check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
> > > if failed ICMP type ECHO count 1 timeout 10 seconds then alert
> > > alert address@hidden with reminder on 1 cycle
> > > depends on ro5000-siNmG20876YFyCu20879
> > >
> > >
> > >
> > > Any hint?
> > >
> > > Thanks in advance,
> > > Pablo
> > >
> > >
> > > On Mon, 29-10-2007 at 21:51 +0100, Pablo Iranzo Gómez wrote:
> > >> Martin,
> > >>
> > >> On Mon, 29 Oct 2007, Martin Pala wrote:
> > >>> Can you run monit in verbose mode (-v option) and send the log? You'll
> > >>> see in it what happened in more detail.
> > >> Sure, will do it tomorrow early in the morning :)
> > >>
> > >>>> If I just put "if failed icmp then alert" monit complains about the
> > >>>> configuration (I'm using monit-4.9-1), so either I'm doing something
> > >>>> wrong or it's a problem with this version.
> > >>> I'm sorry - this was a typo (I wrote the example from memory, so the
> > >>> "type echo" was missing).
> > >> Don't worry, I was only trying it in case I had done something wrong
> > >> :)
> > >>
> > >> Thanks again
> > >> Pablo
> > >>
> > >>
> > >> --
> > >> To unsubscribe:
> > >> http://lists.nongnu.org/mailman/listinfo/monit-general
--
Pablo Iranzo Gómez (address@hidden)
RHCE/Global Profesional Services Consultant Spain
Phone: +34 645 01 01 49 (CET/CEST)
GnuPG KeyID: 0xFAD3CF0D