Re: [monit] Question about multi-host testing


From: Pablo Iranzo Gómez
Subject: Re: [monit] Question about multi-host testing
Date: Wed, 31 Oct 2007 10:23:55 +0100

        Martin, that workaround seems to work fine: the alert action ignores
dependencies, while restart does not.
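
For reference, a rough sketch of the workaround applied to the two hosts
from the config quoted further down. The dummy /bin/true start/stop
programs and the restart action come from Martin's suggestion; this is an
illustration under those assumptions, not the exact file in use:

--8<--
check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
        # dummy start/stop methods so the restart action has something to run
        start program = "/bin/true"
        stop program = "/bin/true"
        # restart instead of alert: still sends the alert, and honours "depends on"
        if failed ICMP type ECHO count 1 timeout 10 seconds then restart
        alert address@hidden with reminder on 1 cycle

check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
        start program = "/bin/true"
        stop program = "/bin/true"
        if failed ICMP type ECHO count 1 timeout 10 seconds then restart
        alert address@hidden with reminder on 1 cycle
        depends on ro5000-siNmG20876YFyCu20879
--8<--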

        Thanks again
        Pablo

On Tue, 30-10-2007 at 21:47 +0100, Pablo Iranzo Gómez wrote:
>       Will try tomorrow, and let you know if this works.
>       Again, thank you very much for your analysis and explanation.
> 
>       Pablo
> 
> 
> 
> -- 
> Pablo Iranzo Gómez
> (http://Alufis35.uv.es/~iranzo/)
> (PGPKey Available on http://www.uv.es/~iranzop/PGPKey.pgp)
>                   --
> Boling's Postulate on Murphy's Law:
> 
> If you feel fine, don't worry. It will pass.
> 
> On Tue, 30 Oct 2007, Martin Pala wrote:
> 
> > The problem is that the dependency was designed primarily for critical
> > actions (start/stop/restart/monitor/unmonitor), where the correct order
> > is needed.
> >
> > The alert-only action doesn't trigger the dependency (action chain),
> > since it could be just informative.
> >
> > For example, if you are monitoring ICMP, you can have several error
> > levels, such as:
> >
> > --8<--
> > check host myrouter with address ...
> >    if failed icmp type echo for 3 times within 5 cycles then alert
> >    if failed icmp type echo for 5 cycles then exec "/script/to/power-cycle/router"
> > --8<--
> >
> > In this case monit sends an alert when the network has problems but is
> > not completely dead (some packets are lost) and can still recover by
> > itself. That shouldn't disable the monitoring of remote hosts. When the
> > error ratio is 100% for 5 cycles (the second icmp line), monit can
> > exec, for example, a script to power-cycle the router (via a networked
> > power switch that is connected point-to-point or sits on the same
> > ethernet switch, so it stays reachable when the router is not
> > available).
> >
> > So the final solution could be to extend the dependency with an option
> > that makes the service dependency hard even on an alert message (to
> > stop monitoring the other services).
> >
> > A workaround could be to define dummy start/stop methods for the
> > monitored remote hosts and use the restart action instead of alert (it
> > sends an alert as well). Something like:
> >
> > --8<--
> > check host myswitch ...
> >    start program = "/bin/true"
> >    stop program = "/bin/true"
> >    if failed icmp type echo for 5 cycles then restart
> >
> > check host myrouter ...
> >    start program = "/bin/true"
> >    stop program = "/bin/true"
> >    if failed icmp type echo for 5 cycles then restart
> >    depends on myswitch
> > --8<--
> >
> > ... not tested, but it may work (although the restart action doesn't
> > look logical, it can trigger the dependency in this case as well).
> >
> >
> > Martin
> >
> >
> > Pablo Iranzo Gómez wrote:
> > >   List, here is the output from monit running in interactive mode with
> > > -vv:
> > >
> > > From log start:
> > > -----------------------------------------------------------------------
> > > Remote Host Name      = ro5000-siNmG20876YFyCu20879
> > >  Monitoring mode      = active
> > >  ICMP                 = if failed Echo Request count 1 with timeout 10
> > > seconds 1 times within 1 cycle(s) then alert else if passed 1 times
> > > within 1 cycle(s) then alert
> > >  Alert mail to        = address@hidden
> > >    Alert on           = All events
> > >    Alert reminder     = 1 cycles
> > >
> > > Remote Host Name      = pos10.5000-siNmG20876YFyCu20879
> > >  Monitoring mode      = active
> > >  Depends on Service   = ro5000-siNmG20876YFyCu20879
> > >  ICMP                 = if failed Echo Request count 1 with timeout 10
> > > seconds 1 times within 1 cycle(s) then alert else if passed 1 times
> > > within 1 cycle(s) then alert
> > >  Alert mail to        = address@hidden
> > >    Alert on           = All events
> > >    Alert reminder     = 1 cycles
> > >
> > >
> > > From Log checking:
> > > -----------------------------------------------------------------------
> > > 'ro5000-siNmG20876YFyCu20879' icmp ping failed
> > > 'ro5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
> > > ICMP failed notification is sent to address@hidden
> > > 'ro5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port
> > > connection tests
> > > 'pos10.5000-siNmG20876YFyCu20879' icmp ping failed
> > > 'pos10.5000-siNmG20876YFyCu20879' failed ICMP test [Echo Request]
> > > ICMP failed notification is sent to address@hidden
> > > 'pos10.5000-siNmG20876YFyCu20879' icmp ping failed, skipping any port
> > > connection tests
> > >
> > >
> > > Config files:
> > > -----------------------------------------------------------------------
> > > check host ro5000-siNmG20876YFyCu20879 with address 10.39.16.1
> > >         if failed ICMP type ECHO count 1 timeout 10 seconds then alert
> > >         alert address@hidden with reminder on 1 cycle
> > >
> > > check host pos10.5000-siNmG20876YFyCu20879 with address 10.39.16.10
> > >         if failed ICMP type ECHO count 1 timeout 10 seconds then alert
> > >         alert address@hidden with reminder on 1 cycle
> > >         depends on ro5000-siNmG20876YFyCu20879
> > >
> > >
> > >
> > >   Any hint?
> > >
> > >   Thanks in advance,
> > >   Pablo
> > >
> > >
> > > On Mon, 29-10-2007 at 21:51 +0100, Pablo Iranzo Gómez wrote:
> > >>  Martin,
> > >>
> > >> On Mon, 29 Oct 2007, Martin Pala wrote:
> > >>> Can you run monit in verbose mode (-v option) and send the log? You'll
> > >>> see in it what happened in more detail.
> > >>  Sure, will do it tomorrow early in the morning :)
> > >>
> > > >>>>        If I just put "if failed icmp then alert" monit complains about
> > > >>>> configuration (I'm using monit-4.9-1), so either I'm doing something
> > > >>>> wrong or it's a problem with this version.
> > > >>> I'm sorry - this was a typo (I wrote the example just from memory, so
> > > >>> the "type echo" was missing).
> > > >>  Don't worry, I was just trying it in case I did something wrong
> > > >> :)
> > >>
> > >>  Thanks again
> > >>  Pablo
> > >>
> > >>
> > >> --
> > >> To unsubscribe:
> > >> http://lists.nongnu.org/mailman/listinfo/monit-general
> > >>
> > >> ------------------------------------------------------------------------
> > >>
> > >> --
> > >> To unsubscribe:
> > >> http://lists.nongnu.org/mailman/listinfo/monit-general
> >
> >
> > --
> > To unsubscribe:
> > http://lists.nongnu.org/mailman/listinfo/monit-general
> >
> 
> 
> --
> To unsubscribe:
> http://lists.nongnu.org/mailman/listinfo/monit-general
-- 

Pablo Iranzo Gómez (address@hidden)
RHCE/Global Professional Services Consultant Spain
Phone: +34 645 01 01 49 (CET/CEST)
GnuPG KeyID: 0xFAD3CF0D

--
Registered in the Madrid Mercantile Registry – C.I.F. B-82 65 79 41
Directors: Michael Cunningham, Charlie Peters and David Owens
Registered address: Red Hat S.L., C/ Velazquez 63, Madrid 28001, Spain
Contact address: C/Jose Bardasano Baos, 9, Edif. Gorbea 3, Planta 3ºD, 28016 
Madrid, Spain

Attachment: signature.asc
Description: This part of the message is digitally signed

