monit-general

Re: Process fails to restart on newer versions of monit


From: Martin Pala
Subject: Re: Process fails to restart on newer versions of monit
Date: Wed, 13 May 2015 08:27:07 +0200

Thanks for the data.

It seems that the start command Monit is actually running differs from the 
one in the original configuration snippet:

—8<—
Process Name          = opsworks-agent
…
 Start program        = '/usr/bin/env service opsworks-agent start' timeout 30 second(s)
—8<—

vs.

—8<—
 check process opsworks-agent with pidfile 
"/var/lib/aws/opsworks/pid/opsworks-agent.pid"
   start program = "/etc/init.d/opsworks-agent start"
—8<—


Since Monit 5.8 the environment variables are no longer purged, so wrapping 
the command with “/usr/bin/env” is not necessary (but should still work).

Please try to change the configuration like this:

check process opsworks-agent with pidfile 
"/var/lib/aws/opsworks/pid/opsworks-agent.pid"
   start program = "/usr/sbin/service opsworks-agent start"
   stop program = "/usr/sbin/service opsworks-agent stop"
   depends on opsworks-agent-master-running
   depends on opsworks-agent-statistic-daemons-log
   depends on opsworks-agent-process-command-daemons-log
   depends on opsworks-agent-keep-alive-daemons-log
   group opsworks
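
For reference, here is a minimal sketch of the two program statements with the
timeout written out. It is not a second check to add next to the one above, and
the "depends on" and "group" lines are left out only to keep it short. The
30-second value is the one Monit reports in the debug output quoted below, and
the /usr/sbin/service path is an assumption about this system (confirm it with
"which service").

```
# Sketch only: same service as above, with the timeout stated explicitly.
# Assumption: the "service" wrapper is installed at /usr/sbin/service.
check process opsworks-agent with pidfile
"/var/lib/aws/opsworks/pid/opsworks-agent.pid"
   start program = "/usr/sbin/service opsworks-agent start" with timeout 30 seconds
   stop program = "/usr/sbin/service opsworks-agent stop" with timeout 30 seconds
```

After editing the control file, "monit -t" checks its syntax and "monit reload"
makes the running daemon re-read it.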


Regards,
Martin



> On 13 May 2015, at 08:14, Shrinath M <address@hidden> wrote:
> 
> OK, done - 
> 
> Not sure if attaching files is allowed; and not much to show here either - so 
> here goes - 
> 
> Last few lines of log once I restarted in debug mode - 
> 
> [UTC May 13 06:09:05] info     : Starting Monit 5.13 daemon with http interface at [*]:2812
> [UTC May 13 06:09:05] info     : Monit start delay set -- pause for 5s
> [UTC May 13 06:09:10] info     : Starting Monit HTTP server at [*]:2812
> [UTC May 13 06:09:10] info     : Monit HTTP server started
> [UTC May 13 06:09:10] info     : 'crumble.localdomain' Monit started
> [UTC May 13 06:09:10] info     : M/Monit heartbeat started
> [UTC May 13 06:09:10] error    : 'opsworks-agent-master-running' process is not running
> [UTC May 13 06:09:10] error    : 'opsworks-agent' process is not running
> [UTC May 13 06:09:10] info     : 'opsworks-agent' trying to restart
> [UTC May 13 06:09:10] info     : 'opsworks-agent' start: /usr/bin/env
> [UTC May 13 06:09:42] error    : 'opsworks-agent-master-running' process is not running
> [UTC May 13 06:09:42] info     : 'opsworks-agent-master-running' trying to restart
> [UTC May 13 06:09:42] info     : 'opsworks-agent' start: /usr/bin/env
> [UTC May 13 06:10:14] error    : 'opsworks-agent-master-running' process is not running
> [UTC May 13 06:10:14] info     : 'opsworks-agent-master-running' trying to restart
> [UTC May 13 06:10:14] info     : 'opsworks-agent' start: /usr/bin/env
> 
> 
> The debug produced this - 
> 
> Starting monit: Adding credentials for user 'admin'
> Runtime constants:
>  Control file       = /etc/monit/monitrc
>  Log file           = /var/log/monit.log
>  Pid file           = /var/run/monit.pid
>  Id file            = /var/lib/monit.id
>  State file         = /var/run/monit.state
>  Debug              = True
>  Log                = True
>  Use syslog         = False
>  Is Daemon          = True
>  Use process engine = True
>  Poll time          = 30 seconds with start delay 5 seconds
>  Expect buffer      = 256 bytes
>  Event queue        = base directory /var/monit with 100 slots
>  M/Monit(s)         = http://[FILTERED_IP]:80/collector with timeout 5 seconds using credentials
>  Mail from          = address@hidden
>  Mail subject       = $SERVICE $EVENT at $DATE
>  Mail message       = Monit $ACTION $SERVI..(truncated)
>  Start monit httpd  = True
>  httpd bind address = Any/All
>  httpd portnumber   = 2812
>  httpd ssl          = Disabled
>  httpd signature    = Enabled
>  httpd auth. style  = Basic Authentication
> 
> The service list contains the following entries:
> 
> Process Name          = opsworks-agent-master-running
>  Group                = opsworks
>  Match                = opsworks-agent: master
>  Monitoring mode      = active
>  Existence            = if does not exist for 2 cycles then restart
> 
> Process Name          = opsworks-agent
>  Group                = opsworks
>  Pid file             = /var/lib/aws/opsworks/pid/opsworks-agent.pid
>  Monitoring mode      = active
>  Start program        = '/usr/bin/env service opsworks-agent start' timeout 30 second(s)
>  Stop program         = '/usr/bin/env service opsworks-agent stop' timeout 30 second(s)
>  Existence            = if does not exist then restart
>  Depends on Service   = opsworks-agent-keep-alive-daemons-log
>  Depends on Service   = opsworks-agent-process-command-daemons-log
>  Depends on Service   = opsworks-agent-statistic-daemons-log
>  Depends on Service   = opsworks-agent-master-running
> 
> File Name             = opsworks-agent-statistic-daemons-log
>  Group                = opsworks
>  Path                 = /var/log/aws/opsworks/opsworks-agent.statistics.log
>  Monitoring mode      = active
>  Existence            = if does not exist for 3 cycles then restart
>  Timestamp            = if greater than 120 second(s) for 3 cycles then restart
> 
> File Name             = opsworks-agent-process-command-daemons-log
>  Group                = opsworks
>  Path                 = /var/log/aws/opsworks/opsworks-agent.process_command.log
>  Monitoring mode      = active
>  Existence            = if does not exist for 3 cycles then restart
>  Timestamp            = if greater than 120 second(s) for 3 cycles then restart
> 
> File Name             = opsworks-agent-keep-alive-daemons-log
>  Group                = opsworks
>  Path                 = /var/log/aws/opsworks/opsworks-agent.keep_alive.log
>  Monitoring mode      = active
>  Existence            = if does not exist for 3 cycles then restart
>  Timestamp            = if greater than 120 second(s) for 3 cycles then restart
> 
> System Name           = crumble.localdomain
>  Monitoring mode      = active
> 
> -------------------------------------------------------------------------------
> Monit daemon with PID 26769 awakened
> 
> 
> On Wed, May 13, 2015 at 11:37 AM Martin Pala <address@hidden> wrote:
> Please make sure monit logging is enabled (the “set logfile” statement) + run 
> Monit in debug mode (-v option), try to reproduce the problem and send logs.
> 
> Regards,
> Martin
> 
> 
> > On 13 May 2015, at 07:15, Shrinath M <address@hidden> wrote:
> >
> > I am using AWS Opsworks and AWS uses an old version of monit (5.3.2) to 
> > monitor their agent. Obviously, when their opsworks-agent dies, monit 
> > restarts it.
> > Recently, I wanted to monitor a few processes of my own and needed a newer 
> > version of monit for its explicit "restart" command support. I upgraded 
> > monit to 5.13.
> > Now, monit does not restart opsworks agent if it dies!
> >
> > I tried looking through the monit changelog to see if something changed 
> > between versions, but could not find entries for all versions beyond 5.7.
> > Can someone please take a look at opsworks config below and see what might 
> > be breaking?
> >
> > opsworks-config follows -
> > check process opsworks-agent with pidfile 
> > "/var/lib/aws/opsworks/pid/opsworks-agent.pid"
> >   start program = "/etc/init.d/opsworks-agent start"
> >   stop program = "/etc/init.d/opsworks-agent stop"
> >   depends on opsworks-agent-master-running
> >   depends on opsworks-agent-statistic-daemons-log
> >   depends on opsworks-agent-process-command-daemons-log
> >   depends on opsworks-agent-keep-alive-daemons-log
> >   group opsworks
> >
> > check process opsworks-agent-master-running matching 
> > "opsworks-agent:\smaster"
> >   if not exist for 2 cycles then restart
> >   group opsworks
> >
> > # check run of statistic daemon
> > check file opsworks-agent-statistic-daemons-log with path 
> > "/var/log/aws/opsworks/opsworks-agent.statistics.log"
> >   if timestamp > 2 minutes for 3 cycles then restart
> >   if does not exist for 3 cycles then restart
> >   group opsworks
> >
> > # check run of process command daemon
> > check file opsworks-agent-process-command-daemons-log with path 
> > "/var/log/aws/opsworks/opsworks-agent.process_command.log"
> >   if timestamp > 2 minutes for 3 cycles then restart
> >   if does not exist for 3 cycles then restart
> >   group opsworks
> >
> > # check run of keep alive daemon
> > check file opsworks-agent-keep-alive-daemons-log with path 
> > "/var/log/aws/opsworks/opsworks-agent.keep_alive.log"
> >   if timestamp > 2 minutes for 3 cycles then restart
> >   if does not exist for 3 cycles then restart
> >   group opsworks
> >
> > - end of file
> >
> > Monit logs say restart done, but opsworks doesn't run. If I downgrade to 
> > 5.3.2, it does magically run!
> > --
> > To unsubscribe:
> > https://lists.nongnu.org/mailman/listinfo/monit-general
> 
> 



