monit-general
Re: daemon poll interval - monit 5.14 on CentOS 6


From: Geoff Goas
Subject: Re: daemon poll interval - monit 5.14 on CentOS 6
Date: Tue, 9 Aug 2016 15:43:35 -0400

Some more information: In monit 4.x, the start/stop program timeout defaulted to 1 cycle, though this was not documented in the man page. Starting in 5.0, it is no longer possible to specify this timeout in cycles, only seconds (per the man page).

From the changelog:

* It is now possible to define execution timeout for start and
  stop commands. That is, how long Monit will wait after
  executing a command before it assume execution failed. If the
  timeout option is omitted, Monit defaults to 30 seconds. You
  can override the timeout for example for services which are
  starting slower.
  Example syntax:
    start program = "/bin/foo start" with timeout 60 seconds

I tried to specify "1 cycle" for the start/stop program entries, and I get a syntax error as expected.

Would the devs entertain the notion of adding the ability to specify cycles, or the ability to set the timeout at the global level?

With the following patch, I can make the default timeout match the daemon interval:

--- src/p.y.orig 2016-08-09 14:23:02.000000000 -0400
+++ src/p.y 2016-08-09 14:23:48.000000000 -0400
@@ -1550,7 +1550,7 @@
                 ;
 
 exectimeout     : /* EMPTY */ {
-                   $<number>$ = EXEC_TIMEOUT;
+                   $<number>$ = Run.polltime;
                   }
                 | TIMEOUT NUMBER SECOND {
                    $<number>$ = $2;
--- src/y.tab.c.orig 2016-08-09 14:54:25.000000000 -0400
+++ src/y.tab.c 2016-08-09 14:54:34.000000000 -0400
@@ -4576,7 +4576,7 @@
   case 418:
 #line 1552 "src/p.y" /* yacc.c:1646  */
     {
-                   (yyval.number) = EXEC_TIMEOUT;
+                   (yyval.number) = Run.polltime;
                   }
 #line 4581 "src/y.tab.c" /* yacc.c:1646  */
     break;


On Mon, Aug 8, 2016 at 3:26 PM, Geoff Goas <address@hidden> wrote:
It looks like I can put "timeout X seconds" after the start program / stop program lines to control this interval. Is there a way to set it globally?

On Fri, Aug 5, 2016 at 1:12 PM, Geoff Goas <address@hidden> wrote:
I think I have found the issue, and I think I may have to walk back my statements on this affecting a certain version or distro.

I have 8 monitored services in an "Execution failed" state. None of the services have a timeout defined. 

The timeout apparently defaults to EXEC_TIMEOUT (30 seconds). monit waits the full 30 seconds for the service check to finally fail before checking the next service that is also in an "Execution failed" state.

[EDT Aug  5 13:03:58] error    : 'service_name' process is not running
[EDT Aug  5 13:03:58] info     : 'service_name' trying to restart
[EDT Aug  5 13:03:58] info     : 'service_name' start: /etc/init.d/service_name
[EDT Aug  5 13:03:58] info     : Sleeping for 100 ms (src/control.c:127)
[EDT Aug  5 13:03:58] info     : Sleeping for 100 ms (src/control.c:127)
[EDT Aug  5 13:03:58] info     : Sleeping for 50000 ms (src/control.c:159)
[EDT Aug  5 13:03:58] info     : Sleeping for 100000 ms (src/control.c:159)
[EDT Aug  5 13:03:58] info     : Sleeping for 200000 ms (src/control.c:159)
[EDT Aug  5 13:03:58] info     : Sleeping for 400000 ms (src/control.c:159)
[EDT Aug  5 13:03:59] info     : Sleeping for 800000 ms (src/control.c:159)
[EDT Aug  5 13:04:00] info     : Sleeping for 1600000 ms (src/control.c:159)
[EDT Aug  5 13:04:01] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:02] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:03] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:04] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:05] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:06] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:07] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:08] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:09] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:10] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:11] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:12] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:13] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:14] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:15] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:16] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:17] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:18] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:19] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:20] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:21] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:22] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:23] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:24] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:25] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:26] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:27] info     : Sleeping for 1000000 ms (src/control.c:159)
[EDT Aug  5 13:04:28] error    : 'service_name' failed to start (exit status 1) -- /etc/init.d/service_name: Shutting down service_name: [  OK  ]
Starting service_name: [  OK  ]^M[FAILED]
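
As a rough cross-check of the 30-second figure, the sleeps logged from src/control.c:159 for this one start attempt can be summed. This is only a sketch: it assumes the logged values are really microseconds despite the "ms" label (the timestamps advance by one second per 1000000), and the variable names are illustrative, not taken from the monit source.

```python
# Sleep values logged from src/control.c:159 for one start attempt, in log order.
# Assumption: the unit is microseconds, although the log line says "ms".
doubling = [50_000, 100_000, 200_000, 400_000, 800_000, 1_600_000]
steady = [1_000_000] * 27  # the 27 consecutive one-second sleeps in the excerpt

total_seconds = sum(doubling + steady) / 1_000_000
print(f"total wait: {total_seconds:.2f} s")
```

Under that assumption the total comes to just over 30 seconds, which lines up with the 13:03:58 to 13:04:28 span in the log.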


8 services at 30 seconds each = 240 seconds, which means the sleep(Run.polltime) in monit.c only gets called about every 4 minutes, even though the daemon interval is set to 10 seconds. Notice the ~240 seconds (4 minutes) between each occurrence:

# grep 'src/monit.c' /var/log/monit
[EDT Aug  5 12:56:33] info     : Sleeping for 10 seconds (src/monit.c:561)
[EDT Aug  5 13:00:46] info     : Sleeping for 10 seconds (src/monit.c:561)
[EDT Aug  5 13:05:00] info     : Sleeping for 10 seconds (src/monit.c:561)
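
The arithmetic above can be checked against these three timestamps. This is a minimal sketch, assuming the services are checked one after another and that each failed start blocks for the full timeout; the function and variable names are made up for illustration.

```python
from datetime import datetime

def expected_cycle_seconds(failing_services, exec_timeout, poll_interval):
    """Seconds from one poll sleep to the next if every failing start blocks for the full timeout."""
    return failing_services * exec_timeout + poll_interval

# Times of the three "Sleeping for 10 seconds" lines quoted above (all on Aug 5).
stamps = ["12:56:33", "13:00:46", "13:05:00"]
times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]

# Gap in seconds between each consecutive pair of poll sleeps.
observed = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
expected = expected_cycle_seconds(failing_services=8, exec_timeout=30, poll_interval=10)

print(expected, observed)  # 250 [253, 254]
```

The observed gaps are a few seconds longer than the 250 seconds this simple model predicts, which would be consistent with each start attempt taking slightly more than 30 seconds, though the log does not state that directly.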

So how can I control the execTimeout without having monit give up on trying to start that service?

Thanks,

On Fri, Aug 5, 2016 at 11:43 AM, Geoff Goas <address@hidden> wrote:
Hello,

Thanks for the suggestions. In RHEL/CentOS 5 and 6, the default config
is /etc/monit.conf. User configs are ~/.monit.conf. This is the only
change to the source that is being applied by the package maintainer.

Package listing:

# rpm -ql monit
/etc/logrotate.d/monit
/etc/monit.conf
/etc/monit.d
/etc/monit.d/logging
/etc/rc.d/init.d/monit
/usr/bin/monit
/usr/share/doc/monit-5.14
/usr/share/doc/monit-5.14/COPYING
/usr/share/doc/monit-5.14/README
/usr/share/man/man1/monit.1.gz
/var/log/monit

From an strace of monit starting up:

getcwd("/etc/monit.d", 4096)            = 13
stat("/root/.monit.conf", 0x7fff87cc7560) = -1 ENOENT (No such file or directory)
stat("/etc/monit.conf", {st_mode=S_IFREG|0600, st_size=11346, ...}) = 0
open("/etc/monit.conf", O_RDONLY)       = 3
open("/etc/monit.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4

Showing that the set daemon directive is specified only once:

# grep 'set daemon' /etc/monit.conf
#set daemon  30              # check services at 30 seconds intervals

# grep 'set daemon' /etc/monit.d/*
/etc/monit.d/00base.conf:set daemon 50

Here is the monit log showing the 30 second interval even though it is
set to 50:

# grep 'Aborting event' /var/log/monit  | tail -n20
[EDT Aug  4 17:05:06] error    : Aborting event
[EDT Aug  4 17:05:36] error    : Aborting event
[EDT Aug  4 17:06:06] error    : Aborting event
[EDT Aug  4 17:06:37] error    : Aborting event
[EDT Aug  4 17:06:37] error    : Aborting event
[EDT Aug  4 17:07:07] error    : Aborting event
[EDT Aug  4 17:07:07] error    : Aborting event
[EDT Aug  4 22:18:13] error    : Aborting event
[EDT Aug  4 22:18:43] error    : Aborting event
[EDT Aug  4 22:19:13] error    : Aborting event
[EDT Aug  4 22:19:44] error    : Aborting event
[EDT Aug  4 22:20:14] error    : Aborting event
[EDT Aug  4 22:20:44] error    : Aborting event
[EDT Aug  4 22:21:15] error    : Aborting event
[EDT Aug  4 22:21:15] error    : Aborting event
[EDT Aug  4 22:21:45] error    : Aborting event
[EDT Aug  4 22:21:45] error    : Aborting event
[EDT Aug  5 11:19:40] error    : Aborting event
[EDT Aug  5 11:20:10] error    : Aborting event
[EDT Aug  5 11:23:53] error    : Aborting event

This behavior is occurring across multiple CentOS 6 hosts. All of the
CentOS 5 hosts running 4.11 and 5.2 with nearly identical
configurations ("alert...on restart" changed to "alert...on nonexist"
on the monit 5.x instances) do not have this issue.

I'm open to more suggestions but I feel as though I will end up having
to get some more debug out of monit.

Thanks,

On Aug 5, 2016 9:16 AM, "Martin Pala" <address@hidden> wrote:
>
> Monit's default configuration file is /etc/monitrc. /etc/monit.conf is not used unless it was added to the search path by a third party (for example, the package maintainer).
>
> There could also be a ".monitrc" file in your home directory. The default search sequence for the monit configuration file is:
>
>         ~/.monitrc
>         /etc/monitrc
>        address@hidden/monitrc
>         /usr/local/etc/monitrc
>         ./monitrc
>
>
>
>
> > On 05 Aug 2016, at 15:07, Geoff Goas <address@hidden> wrote:
> >
> > I am setting it only in /etc/monit.conf. It is not being set in any other configuration within /etc/monit.d.
> >
> > On Aug 5, 2016 9:03 AM, "Martin Pala" <address@hidden> wrote:
> > Hello,
> >
> > you have most probably two configuration files - the one which you changed is different from the file used by monit.
> >
> > Best regards,
> > Martin
> >
> >
> >> On 05 Aug 2016, at 04:42, Geoff Goas <address@hidden> wrote:
> >>
> >> Hello,
> >>
> >> I'm having an issue with the CentOS 6 release of monit 5.14. I have set the daemon interval to 5, 10, and 50 seconds, fully restarting monit after each adjustment, yet it still polls every 30 seconds as if the configured value is being ignored. I also attempted passing the interval using the -d switch, to no avail.
> >>
> >> My testing consisted of having monit attempt to start a service that could never possibly start, and without any timeout set. The log shows a 30 second interval between service checks, and so does an strace of the monit process.
> >>
> >> I have monit 5.2 running on CentOS 5.2 with a nearly identical configuration. On that host, I have the daemon interval set to 10 seconds, and it is polling at that interval just fine.
> >>
> >> Do you have any recommendations on what to check next?
> >>
> >> Thanks,
> >>
> >> --
> >> Geoff Goas
> >> Systems Engineer
> >> --
> >> To unsubscribe:
> >> https://lists.nongnu.org/mailman/listinfo/monit-general



--
Geoff Goas
Systems Engineer



