monit-general

Re: monit not catching failed ping test


From: address@hidden
Subject: Re: monit not catching failed ping test
Date: Fri, 8 Mar 2019 22:04:51 +0100

The interval between checks is 120 seconds, so with these settings it can take up to ~2 minutes to detect an error.

You can lower the interval, for example to 5 seconds, for faster error detection.
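
As a rough sketch, the change could look like this in the monitrc. The 30-second value is only an illustrative assumption, not a recommendation from this thread; the check itself is copied from the original post.

```
# Illustrative value only: poll every 30 seconds instead of every 120.
set daemon 30

# Ping check as posted, unchanged.
check host host2_ping with address 192.168.1.2
    if failed ping then alert
```

A shorter interval means every check in the control file runs more often, so the value is a trade-off between detection time and load.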

Best regards,
Martin


On 8 Mar 2019, at 22:00, Fant, Andrew (NIH/NIDA) [E] <address@hidden> wrote:

In the monitrc file, I have:
 
set daemon   120
 
As for the monit -vI output, it has 22 remote host checks in total. A shortened, anonymized copy of it is:
 
Adding 'allow localhost' -- host resolved to [::ffff:127.0.0.1]
Adding credentials for user 'admin'
Runtime constants:
 Control file       = /etc/monitrc
 Log file           = syslog
 Pid file           = /etc/monit/monit.pid
 Id file            = /etc/monit/monit.id
 State file         = /etc/monit/monit.state
 Debug              = True
 Log                = True
 Use syslog         = True
 Is Daemon          = True
 Use process engine = True
 Limits             = {
                    =   programOutput:     512 B
                    =   sendExpectBuffer:  256 B
                    =   fileContentBuffer: 512 B
                    =   httpContentBuffer: 1 MB
                    =   networkTimeout:    5 s
                    =   programTimeout:    5 m
                    =   stopTimeout:       30 s
                    =   startTimeout:      30 s
                    =   restartTimeout:    30 s
                    = }
 On reboot          = start
 Poll time          = 120 seconds with start delay 0 seconds
 Event queue        = base directory /var/monitor with 1000 slots
 M/Monit(s)         = http://[host1.local]:8080/collector with timeout 5 s with credentials
 Start monit httpd  = True
 httpd bind address = localhost
 httpd portnumber   = 2812
 httpd signature    = Enabled
 httpd auth. style  = Basic Authentication and Host/Net allow list
 
The service list contains the following entries:
 
System Name           = host1
 Monitoring mode      = active
 On reboot            = start
 
Remote Host Name      = host2_ping
 Address              = 192.168.1.2
 Monitoring mode      = active
 On reboot            = start
 Ping                 = if failed [count 3 size 64 with timeout 5 s] then alert
 
-------------------------------------------------------------------------------
 
Hopefully this will be of some use.
 
 
--                                        
Andrew Fant                      |            Systems Administrator
address@hidden       |      Lei Shi Lab , NIH/NIDA/IRP
(443)740-2849                   |
 
From: "address@hidden" <address@hidden>
Reply-To: This is the general mailing list for monit <address@hidden>
Date: Friday, March 8, 2019 at 3:26 PM
To: This is the general mailing list for monit <address@hidden>
Subject: Re: monit not catching failed ping test
 
Hello, 
 
monit checks each service at the interval given by the "set daemon <x>" setting. If that interval is long, or a check is blocked by a service timeout or action, the time between checks can be even longer.
 
Could you please check the "set daemon" setting and run monit in debug mode?
 
1.) stop monit
2.) monit -vI
 
Best regards,
Martin
 


On 8 Mar 2019, at 16:49, Fant, Andrew (NIH/NIDA) [E] <address@hidden> wrote:
 
Good morning.
     I have a small monitoring setup with m/monit 3.7.2, using monit 5.25.2 as the agent. There are a couple of systems that I cannot install monit on but whose downtime I still need to know about, so I have added them as ping checks in the monitrc on the host where I installed m/monit. Yesterday, one of those remote systems went down, but monit and m/monit didn’t report an alert for it and still show its status as OK. Using anonymized information, the entry in the monitrc on host1 is:
 
CHECK HOST host2_ping with ADDRESS 192.168.1.2
        IF FAILED ping THEN ALERT
 
And from the command line on host1:
 
host1% monit status host2_ping
Monit 5.25.2 uptime: 48d 19h 8m
 
Remote Host 'host2_ping'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  ping response time           -
  data collected               Fri, 08 Mar 2019 10:41:33
 
But:
 
host1% ping host2
PING host2.example.org (192.168.1.2) 56(84) bytes of data.
From host1.example.org (192.168.1.1) icmp_seq=1 Destination Host Unreachable
From host1.example.org (192.168.1.1) icmp_seq=2 Destination Host Unreachable
From host1.example.org (192.168.1.1) icmp_seq=3 Destination Host Unreachable
 
Clearly there is a disconnect between the OS-provided ping utility and what monit is seeing.   I’m sure that it’s probably a simple error in configuration, but I am not seeing what I did wrong.   Can someone please set me on the correct path?
 
Thank you
 
--                                        
Andrew Fant                      |            Systems Administrator
address@hidden       |      Lei Shi Lab , NIH/NIDA/IRP
(443)740-2849                   |
 