monit-general

Re: [monit] Alerts not being triggered


From: Bruce Reed
Subject: Re: [monit] Alerts not being triggered
Date: Tue, 20 Jan 2009 13:16:31 -0800
User-agent: Microsoft-Entourage/12.11.0.080522

Martin,

You have it right. I had assumed an email address existed on our downstream
server when in fact it was only an alias in our data center relay. All I had
to do was adjust the alert address (address@hidden) so that my relay forwards
it appropriately. Purely my mistake. Still, it would have been nice if Monit
had a more verbose (debug-like) logging mechanism that showed the mail events.
That would have told me Monit was doing its job, and at that point I would
have checked my mail relay. The lack of log information, together with the
non-delivery of the messages, led me to assume wrongly that Monit simply
wasn't sending the alert emails.
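
For anyone hitting the same symptom, here is a minimal sketch in Python (not
Monit's actual code) of why nothing showed up in the log. As Martin notes in
the quoted message below, Monit logs an error only when the mail server fails
or returns a reply class >= 400. If the relay accepts the message with a
success code and drops it afterwards, the sender has nothing to log. The
function names are hypothetical and only illustrate the rule.

```python
def classify_smtp_reply(code: int) -> str:
    """Map a three-digit SMTP reply code to a coarse outcome."""
    if not 200 <= code <= 599:
        raise ValueError(f"not an SMTP reply code: {code}")
    if code < 400:
        return "accepted"           # 2xx/3xx: positive reply from the server
    if code < 500:
        return "transient-failure"  # 4xx: try again later
    return "permanent-failure"      # 5xx: rejected


def sender_would_log_error(code: int) -> bool:
    # Assumption taken from the thread: an error is logged only for
    # reply class >= 400. A 250 from the first relay logs nothing,
    # even if a later hop discards the message.
    return code >= 400
```

A 250 reply only means the first hop took responsibility for the message, so
the relay's own logs are the place to look for what happened next.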

All is good now, though, and I can already see that Monit will be an
indispensable tool for service alerts and restarts, complementing my Nagios
environment well!

Thanks again,
Bruce


On 1/20/09 12:34 PM, "Martin Pala" <address@hidden> wrote:

> Thanks for the info.
> 
> Monit logs an error when the mailserver fails or returns an error class >=
> 400 ... what exactly was the problem in your case? (We can improve the
> error reporting.) Since no error was logged by monit, it seems that the
> message was accepted by the mailserver and the MTA dropped the message
> later?
> 
> Thanks,
> Martin
> 
> 
> On Jan 20, 2009, at 7:30 PM, Bruce Reed wrote:
> 
>> The strace uncovered my problem and it was with a mail alias, so thanks for
>> the tip!
>> 
>> It would be nice to have more verbose logging by monit to log email events.
>> Had I seen those in the log I would have known it at least sent the message
>> and the problem was with the address I had used. On the other hand, I should
>> have combed my mailserver logs to see if a message had been received for the
>> address I specified.
>> 
>> Bruce
>> 
>> 
>> On 1/16/09 12:14 PM, "Martin Pala" <address@hidden> wrote:
>> 
>>> Looks strange - I don't remember a problem like this, and even the
>>> changelog doesn't mention such an issue.
>>> 
>>> It could be good to trace monit to see what happened:
>>> 
>>> strace -f -o monit.trace monit -vI
>>> 
>>> 
>>> The monit.trace file will contain system call traces so we can see
>>> whether it tried to connect to SMTP server and what happened.
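
A trace file like this can get large. As a rough sketch, assuming Python is
available and that strace prints IPv4 connect() calls in its usual
connect(fd, {sa_family=AF_INET, sin_port=htons(N), sin_addr=inet_addr("...")}, len) = result
form (this may vary between strace versions), the following filters the trace
for connection attempts to the SMTP port. The function name smtp_connects and
the default port 25 are assumptions for illustration.

```python
import re

# Matches an IPv4 connect() line as strace typically prints it.
CONNECT_RE = re.compile(
    r'connect\(\d+, \{sa_family=AF_INET, sin_port=htons\((?P<port>\d+)\), '
    r'sin_addr=inet_addr\("(?P<addr>[^"]+)"\)\}, \d+\)\s+= (?P<result>.*)$'
)


def smtp_connects(lines, port=25):
    """Return (address, result) pairs for connect() calls to the given port."""
    hits = []
    for line in lines:
        match = CONNECT_RE.search(line)
        if match and int(match.group("port")) == port:
            hits.append((match.group("addr"), match.group("result").strip()))
    return hits
```

Calling smtp_connects(open("monit.trace")) would list each attempted SMTP
connection with its return value. An empty list would suggest monit never
tried to reach the mail server at all.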
>>> 
>>> 
>>> 
>>> 
>>> On Jan 16, 2009, at 9:05 PM, Bruce Reed wrote:
>>> 
>>>> 4.9 rpm from rpmforge
>>>> 
>>>> 
>>>> On 1/16/09 11:55 AM, "Martin Pala" <address@hidden> wrote:
>>>> 
>>>>> The configuration looks OK.
>>>>> 
>>>>> What monit version is it?
>>>>> 
>>>>> Thanks,
>>>>> Martin
>>>>> 
>>>>> 
>>>>> On Jan 16, 2009, at 8:22 PM, Bruce Reed wrote:
>>>>> 
>>>>>> Here is the verbose output. Looks like verbose output begins and ends at
>>>>>> process start up (host/domain names changed):
>>>>>> 
>>>>>> Starting Process Monitor (monit): monit: Debug: Adding host allow 'localhost'
>>>>>> monit: Debug: Skipping redundant host 'localhost'
>>>>>> monit: Debug: Skipping redundant host 'localhost'
>>>>>> monit: Debug: Adding credentials for user 'admin'.
>>>>>> Runtime constants:
>>>>>> Control file       = /etc/monit.conf
>>>>>> Log file           = syslog
>>>>>> Pid file           = /var/run/monit.pid
>>>>>> Debug              = True
>>>>>> Log                = True
>>>>>> Use syslog         = True
>>>>>> Is Daemon          = True
>>>>>> Use process engine = True
>>>>>> Poll time          = 60 seconds
>>>>>> Mail server(s)     = prodsmtp.mydomain.net
>>>>>> Mail from          = address@hidden
>>>>>> Mail subject       = monit alert --  $EVENT $SERVICE
>>>>>> Mail message       = $EVENT Service $SERV..(truncated)
>>>>>> Start monit httpd  = True
>>>>>> httpd bind address = localhost
>>>>>> httpd portnumber   = 2812
>>>>>> httpd signature    = True
>>>>>> Use ssl encryption = False
>>>>>> httpd auth. style  = Basic Authentication and Host/Net allow list
>>>>>> Alert mail to      = address@hidden
>>>>>> Alert on         = All events
>>>>>> 
>>>>>> The service list contains the following entries:
>>>>>> 
>>>>>> Process Name          = ntpd
>>>>>> Pid file             = /var/run/ntpd.pid
>>>>>> Monitoring mode      = active
>>>>>> Start program        = '/etc/init.d/ntpd start' timeout 1 cycle(s)
>>>>>> Stop program         = '/etc/init.d/ntpd stop' timeout 1 cycle(s)
>>>>>> Pid                  = if changed 1 times within 1 cycle(s) then alert
>>>>>> Ppid                 = if changed 1 times within 1 cycle(s) then alert
>>>>>> Timeout              = If 3 restart within 3 cycles then unmonitor else if passed then alert
>>>>>> 
>>>>>> System Name           = test-prod.mydomain.net
>>>>>> Monitoring mode      = active
>>>>>> 
>>>>>> 
>>>>>> -------------------------------------------------------------------------------
>>>>>> monit: pidfile '/var/run/monit.pid' does not exist
>>>>>> Starting monit daemon with http interface at [localhost:2812]
>>>>>> 
>>>>>> 
>>>>>> Then when ntp is killed I see the following in /var/log/messages:
>>>>>> 
>>>>>> Jan 16 19:10:50 test-prod ntpd[13505]: ntpd exiting on signal 15
>>>>>> Jan 16 19:11:32 test-prod monit[2398]: 'ntpd' process is not running
>>>>>> Jan 16 19:11:32 test-prod monit[2398]: 'ntpd' trying to restart
>>>>>> Jan 16 19:11:32 test-prod monit[2398]: 'ntpd' start: /etc/init.d/ntpd
>>>>>> Jan 16 19:11:32 test-prod ntpd[2541]: ntpd address@hidden Tue Jun 10 00:07:18 UTC 2008 (1)
>>>>>> Jan 16 19:11:32 test-prod ntpd[2542]: precision = 2.000 usec
>>>>>> .
>>>>>> .
>>>>>> 
>>>>>> There is no additional output from monit and no attempt to send mail
>>>>>> according to maillog.
>>>>>> 
>>>>>> On 1/16/09 3:52 AM, "Jan-Henrik Haukeland" <address@hidden> wrote:
>>>>>> 
>>>>>>> Have you tried to specify which mail server Monit should use for alerts?
>>>>>>> 
>>>>>>> See
>>>>>>> http://mmonit.com/monit/documentation/monit.html#setting_a_mail_server_for_alert_messages
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 16 Jan 2009, at 08:00, Bruce Reed wrote:
>>>>>>> 
>>>>>>>> I've just begun using monit and I am having difficulties getting monit
>>>>>>>> to send mail. I'm testing using ntpd and it is restarting the process,
>>>>>>>> but not sending mail on service restart events or timeout. In
>>>>>>>> monit.conf I have:
>>>>>>>> 
>>>>>>>> set alert address@hidden
>>>>>>>> 
>>>>>>>> I then had a check statement like this:
>>>>>>>> 
>>>>>>>> check process ntpd with pidfile /var/run/ntpd.pid
>>>>>>>>  start program = "/etc/init.d/ntpd start"
>>>>>>>>  stop program  = "/etc/init.d/ntpd stop"
>>>>>>>>  if 3 restarts within 3 cycles then timeout
>>>>>>>>  alert address@hidden only on { timeout }
>>>>>>>> 
>>>>>>>> After 3 successive kills of ntpd and restarts by monit, a timeout
>>>>>>>> message was logged, but no mail was sent. I tried removing the alert
>>>>>>>> statement to see if mail would be sent on any event, but I only see
>>>>>>>> information logged and no mail is sent. Nothing in /var/log/maillog
>>>>>>>> either.
>>>>>>>> 
>>>>>>>> Funny thing is, when I first set this up monit attempted to send
>>>>>>>> mail, but an ACL on my postfix server prevented it from getting
>>>>>>>> through. I fixed that and retried my test, but from that point on no
>>>>>>>> mail was sent. Thought perhaps this was a state caching issue, but
>>>>>>>> there was no change across a monit restart, and I installed monit on
>>>>>>>> another server using the same conf files and I get the same results
>>>>>>>> there.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Bruce
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> To unsubscribe:
>>>>>>> http://lists.nongnu.org/mailman/listinfo/monit-general