Re: Process fails to restart on newer versions of monit
From: Martin Pala
Subject: Re: Process fails to restart on newer versions of monit
Date: Wed, 13 May 2015 08:27:07 +0200
Thanks for the data.
It seems that the start command differs from the one in the original
configuration snippet:
--8<--
Process Name = opsworks-agent
…
Start program = '/usr/bin/env service opsworks-agent start' timeout 30 second(s)
--8<--
vs.
--8<--
check process opsworks-agent with pidfile
"/var/lib/aws/opsworks/pid/opsworks-agent.pid"
start program = "/etc/init.d/opsworks-agent start"
--8<--
Since Monit 5.8, environment variables are no longer purged, so wrapping the
command with "/usr/bin/env" is not necessary (though it should still work).
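One practical difference between the two forms: with "/usr/bin/env service ...", env has to find `service` by searching the PATH it inherits, whereas "/usr/sbin/service" is located directly. The Python sketch below illustrates that lookup with `shutil.which`, which approximates the PATH search env performs. The temporary directory is a hypothetical stand-in for /usr/sbin, and none of this is Monit code.

```python
import os
import shutil
import stat
import tempfile


def resolve(command, search_path):
    """Return the executable found for `command` on `search_path`, or None.

    shutil.which() approximates the PATH search /usr/bin/env does before
    exec'ing the command; this is an illustration, not Monit's own logic.
    """
    return shutil.which(command, path=search_path)


def demo():
    # Hypothetical stand-in for /usr/sbin: a temp dir holding a fake "service".
    with tempfile.TemporaryDirectory() as sbin:
        fake = os.path.join(sbin, "service")
        with open(fake, "w") as f:
            f.write("#!/bin/sh\nexit 0\n")
        os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)

        found = resolve("service", sbin)              # directory is on the search path
        missing = resolve("service", "/nonexistent")  # directory is not on the search path
        absolute = os.access(fake, os.X_OK)           # absolute path: no PATH search needed
        return found == fake, missing is None, absolute


if __name__ == "__main__":
    print(demo())
```

On a typical Linux system this prints (True, True, True). The bare name is found only when its directory is on the search path, while the absolute path works either way.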
Please try to change the configuration like this:
check process opsworks-agent with pidfile
"/var/lib/aws/opsworks/pid/opsworks-agent.pid"
start program = "/usr/sbin/service opsworks-agent start"
stop program = "/usr/sbin/service opsworks-agent stop"
depends on opsworks-agent-master-running
depends on opsworks-agent-statistic-daemons-log
depends on opsworks-agent-process-command-daemons-log
depends on opsworks-agent-keep-alive-daemons-log
group opsworks
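Once the new start program is in place, you can check by hand whether the pidfile Monit watches points at a live process after "/usr/sbin/service opsworks-agent start" returns. The following Python sketch assumes the pidfile contains a single decimal PID. It only approximates what a "check process ... with pidfile" test looks at and is not Monit's implementation.

```python
import os


def pidfile_alive(pidfile):
    """Return True if `pidfile` names a running process.

    Rough approximation of a pidfile-based process check (not Monit's code):
    read the PID, then probe it with signal 0, which delivers nothing and
    only reports whether the process exists.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable or garbled pidfile
    if pid <= 0:
        return False  # 0 or negative would address a process group, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Pidfile path taken from the configuration above.
    print(pidfile_alive("/var/lib/aws/opsworks/pid/opsworks-agent.pid"))
```

Run it as a user that can read the pidfile. It prints True only if the PID in the file belongs to an existing process.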
Regards,
Martin
> On 13 May 2015, at 08:14, Shrinath M <address@hidden> wrote:
>
> OK, done -
>
> Not sure if attaching files is allowed, and there is not much to show here
> either, so here goes -
>
> Last few lines of log once I restarted in debug mode -
>
> [UTC May 13 06:09:05] info : Starting Monit 5.13 daemon with http interface at [*]:2812
> [UTC May 13 06:09:05] info : Monit start delay set -- pause for 5s
> [UTC May 13 06:09:10] info : Starting Monit HTTP server at [*]:2812
> [UTC May 13 06:09:10] info : Monit HTTP server started
> [UTC May 13 06:09:10] info : 'crumble.localdomain' Monit started
> [UTC May 13 06:09:10] info : M/Monit heartbeat started
> [UTC May 13 06:09:10] error : 'opsworks-agent-master-running' process is not running
> [UTC May 13 06:09:10] error : 'opsworks-agent' process is not running
> [UTC May 13 06:09:10] info : 'opsworks-agent' trying to restart
> [UTC May 13 06:09:10] info : 'opsworks-agent' start: /usr/bin/env
> [UTC May 13 06:09:42] error : 'opsworks-agent-master-running' process is not running
> [UTC May 13 06:09:42] info : 'opsworks-agent-master-running' trying to restart
> [UTC May 13 06:09:42] info : 'opsworks-agent' start: /usr/bin/env
> [UTC May 13 06:10:14] error : 'opsworks-agent-master-running' process is not running
> [UTC May 13 06:10:14] info : 'opsworks-agent-master-running' trying to restart
> [UTC May 13 06:10:14] info : 'opsworks-agent' start: /usr/bin/env
>
>
> The debug produced this -
>
> Starting monit: Adding credentials for user 'admin'
> Runtime constants:
> Control file = /etc/monit/monitrc
> Log file = /var/log/monit.log
> Pid file = /var/run/monit.pid
> Id file = /var/lib/monit.id
> State file = /var/run/monit.state
> Debug = True
> Log = True
> Use syslog = False
> Is Daemon = True
> Use process engine = True
> Poll time = 30 seconds with start delay 5 seconds
> Expect buffer = 256 bytes
> Event queue = base directory /var/monit with 100 slots
> M/Monit(s) = http://[FILTERED_IP]:80/collector with timeout 5 seconds using credentials
> Mail from = address@hidden
> Mail subject = $SERVICE $EVENT at $DATE
> Mail message = Monit $ACTION $SERVI..(truncated)
> Start monit httpd = True
> httpd bind address = Any/All
> httpd portnumber = 2812
> httpd ssl = Disabled
> httpd signature = Enabled
> httpd auth. style = Basic Authentication
>
> The service list contains the following entries:
>
> Process Name = opsworks-agent-master-running
> Group = opsworks
> Match = opsworks-agent: master
> Monitoring mode = active
> Existence = if does not exist for 2 cycles then restart
>
> Process Name = opsworks-agent
> Group = opsworks
> Pid file = /var/lib/aws/opsworks/pid/opsworks-agent.pid
> Monitoring mode = active
> Start program = '/usr/bin/env service opsworks-agent start' timeout 30 second(s)
> Stop program = '/usr/bin/env service opsworks-agent stop' timeout 30 second(s)
> Existence = if does not exist then restart
> Depends on Service = opsworks-agent-keep-alive-daemons-log
> Depends on Service = opsworks-agent-process-command-daemons-log
> Depends on Service = opsworks-agent-statistic-daemons-log
> Depends on Service = opsworks-agent-master-running
>
> File Name = opsworks-agent-statistic-daemons-log
> Group = opsworks
> Path = /var/log/aws/opsworks/opsworks-agent.statistics.log
> Monitoring mode = active
> Existence = if does not exist for 3 cycles then restart
> Timestamp = if greater than 120 second(s) for 3 cycles then restart
>
> File Name = opsworks-agent-process-command-daemons-log
> Group = opsworks
> Path = /var/log/aws/opsworks/opsworks-agent.process_command.log
> Monitoring mode = active
> Existence = if does not exist for 3 cycles then restart
> Timestamp = if greater than 120 second(s) for 3 cycles then restart
>
> File Name = opsworks-agent-keep-alive-daemons-log
> Group = opsworks
> Path = /var/log/aws/opsworks/opsworks-agent.keep_alive.log
> Monitoring mode = active
> Existence = if does not exist for 3 cycles then restart
> Timestamp = if greater than 120 second(s) for 3 cycles then restart
>
> System Name = crumble.localdomain
> Monitoring mode = active
>
> -------------------------------------------------------------------------------
> Monit daemon with PID 26769 awakened
>
>
> On Wed, May 13, 2015 at 11:37 AM Martin Pala <address@hidden> wrote:
> Please make sure Monit logging is enabled (the "set logfile" statement), run
> Monit in debug mode (-v option), reproduce the problem and send the logs.
>
> Regards,
> Martin
>
>
> > On 13 May 2015, at 07:15, Shrinath M <address@hidden> wrote:
> >
> > I am using AWS OpsWorks, and AWS uses an old version of monit (5.3.2) to
> > monitor its agent. Normally, when the opsworks-agent dies, monit
> > restarts it.
> > Recently, I wanted to monitor a few processes of my own and needed a newer
> > version of monit for its explicit "restart" command support. I upgraded
> > monit to 5.13.
> > Now, monit does not restart the opsworks agent if it dies!
> >
> > I looked for the monit changelog to see whether anything changed between
> > versions, but could not find entries for all versions after 5.7.
> > Can someone please take a look at the opsworks config below and see what
> > might be breaking?
> >
> > opsworks-config follows -
> > check process opsworks-agent with pidfile
> > "/var/lib/aws/opsworks/pid/opsworks-agent.pid"
> > start program = "/etc/init.d/opsworks-agent start"
> > stop program = "/etc/init.d/opsworks-agent stop"
> > depends on opsworks-agent-master-running
> > depends on opsworks-agent-statistic-daemons-log
> > depends on opsworks-agent-process-command-daemons-log
> > depends on opsworks-agent-keep-alive-daemons-log
> > group opsworks
> >
> > check process opsworks-agent-master-running matching
> > "opsworks-agent:\smaster"
> > if not exist for 2 cycles then restart
> > group opsworks
> >
> > # check run of statistic daemon
> > check file opsworks-agent-statistic-daemons-log with path
> > "/var/log/aws/opsworks/opsworks-agent.statistics.log"
> > if timestamp > 2 minutes for 3 cycles then restart
> > if does not exist for 3 cycles then restart
> > group opsworks
> >
> > # check run of process command daemon
> > check file opsworks-agent-process-command-daemons-log with path
> > "/var/log/aws/opsworks/opsworks-agent.process_command.log"
> > if timestamp > 2 minutes for 3 cycles then restart
> > if does not exist for 3 cycles then restart
> > group opsworks
> >
> > # check run of keep alive daemon
> > check file opsworks-agent-keep-alive-daemons-log with path
> > "/var/log/aws/opsworks/opsworks-agent.keep_alive.log"
> > if timestamp > 2 minutes for 3 cycles then restart
> > if does not exist for 3 cycles then restart
> > group opsworks
> >
> > - end of file
> >
> > The Monit log says the restart was done, but opsworks-agent does not run. If I
> > downgrade to 5.3.2, it magically runs again!
> > --
> > To unsubscribe:
> > https://lists.nongnu.org/mailman/listinfo/monit-general