monit-general

Re: Why does monit stop monitor tasks all the time


From: Martin Pala
Subject: Re: Why does monit stop monitor tasks all the time
Date: Thu, 18 Apr 2013 22:48:07 +0200

Hi,

monit runs the start/stop programs in a sandbox and strips all environment 
variables (for security reasons) - it sets only the spartan 
"PATH=/bin:/usr/bin:/sbin:/usr/sbin" variable. If your start script depends on 
an environment variable that is only set in your login shell (such as the 
rbenv entries in PATH, or RAILS_ENV), it may fail when executed by monit.
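
A quick way to check for this is to run the start command in a similarly 
stripped environment. A minimal sketch, using env -i to approximate monit's 
environment (adjust the command to your setup):
--8<--
# run as the deployer user with only the spartan PATH that monit sets
sudo -u deployer env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin \
    /bin/bash -c 'cd /home/deployer/apps/au/current && bundle exec sidekiq -e production'
--8<--
If this fails (for example because the rbenv shims are missing from PATH), the 
same command will also fail when monit executes it.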

You should find more details in the monit log and/or your start log 
(/home/deployer/apps/au/current/log/sidekiq.log).
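
For example, to inspect the most recent entries of both:
--8<--
tail -n 50 /var/log/monit.log
tail -n 50 /home/deployer/apps/au/current/log/sidekiq.log
--8<--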

The reason why monitoring is disabled is most probably the following statement 
(you can verify it in the monit log):
--8<--
if 3 restarts within 18 cycles then timeout
--8<--

=> if the service was restarted 3 times within 18 cycles, monit will disable 
the monitoring of this service (the timeout action) and leave it unmonitored 
until you re-enable it
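
Until monitoring is re-enabled, monit takes no action for that service. You 
can re-enable it from the command line, for example:
--8<--
# re-enable monitoring for a single service (as named in the monit config)
monit monitor da_workers-0.pid

# or re-enable monitoring for all services at once
monit monitor all
--8<--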

Regards,
Martin



On Apr 18, 2013, at 2:59 PM, Niels Kristian Schjødt <address@hidden> wrote:

> Hi, I have monit setup for monitoring some background workers in my rails 
> project. Whenever I deploy new code, monit should take care of restarting 
> them after they are shut down. But every time I deploy, monit only starts up 
> half of them. If I ssh into the server though and run "sudo monit validate" 
> then it correctly sees that they are not running, and spins them up. But if I 
> don't run that command manually, then nothing happens. What could be wrong? 
> I have no idea how to debug it further.
> 
> Here are my configs:
> 
> ############################## Monitrc #################################
> set daemon 10
> 
> set logfile /var/log/monit.log
> set idfile /var/lib/monit/id
> set statefile /var/lib/monit/state
> 
> set eventqueue
>  basedir /var/lib/monit/events
>  slots 100
> 
> set eventqueue basedir /var/monit/ slots 1000
> set mmonit http://monit:address@hidden:8080/collector
> set httpd port 2812 and use address 192.168.0.3
>  allow localhost
>  allow 192.168.0.1
>  allow user:password
> 
> check system master-worker-server
>  if loadavg(5min) > 4 for 60 cycles then alert
>  if memory > 75% for 60 cycles then alert
>  if cpu(user) > 75% for 60 cycles then alert
> 
> include /etc/monit/conf.d/*
> 
> ############################# /etc/monit/conf.d/sidekiq.conf ######################################
> 
> # Check for Ruby sidekiq worker process
> 
> check process da_workers-0.pid with pidfile /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid
>  start program = "/bin/bash -l -c 'HOME=/home/deployer PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && cd /home/deployer/apps/au/current ; nohup bundle exec sidekiq -e production -C /home/deployer/apps/au/shared/config/workers/da_workers.yml -i 0 -P /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid >> /home/deployer/apps/au/current/log/sidekiq.log 2>&1 &'" as uid deployer and gid deployer with timeout 250 seconds
>  stop program = "/bin/bash -l -c 'HOME=/home/deployer PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && if [ -d /home/deployer/apps/au/current ] && [ -f /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid ] && kill -0 `cat /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid`> /dev/null 2>&1; then cd /home/deployer/apps/au/current && bundle exec sidekiqctl stop /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid 3 ; else echo 'Sidekiq is not running' ; fi'" as uid deployer and gid deployer with timeout 120 seconds
>  if cpu usage > 50% for 18 cycles then restart
>  if mem > 1200.0 MB for 18 cycles then restart
>  if 3 restarts within 18 cycles then timeout
> 
> check process da_data_maintenance_workers-0.pid with pidfile /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid
>  start program = "/bin/bash -l -c 'HOME=/home/deployer PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && cd /home/deployer/apps/au/current ; nohup bundle exec sidekiq -e production -C /home/deployer/apps/au/shared/config/workers/da_data_maintenance_workers.yml -i 0 -P /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid >> /home/deployer/apps/au/current/log/sidekiq.log 2>&1 &'" as uid deployer and gid deployer with timeout 250 seconds
>  stop program = "/bin/bash -l -c 'HOME=/home/deployer PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && if [ -d /home/deployer/apps/au/current ] && [ -f /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid ] && kill -0 `cat /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid`> /dev/null 2>&1; then cd /home/deployer/apps/au/current && bundle exec sidekiqctl stop /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid 3 ; else echo 'Sidekiq is not running' ; fi'" as uid deployer and gid deployer with timeout 120 seconds
>  if cpu usage > 50% for 18 cycles then restart
>  if mem > 1200.0 MB for 18 cycles then restart
>  if 3 restarts within 18 cycles then timeout
> 
> check process da_data_collecting_workers-0.pid with pidfile /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid
>  start program = "/bin/bash -l -c 'HOME=/home/deployer PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && cd /home/deployer/apps/au/current ; nohup bundle exec sidekiq -e production -C /home/deployer/apps/au/shared/config/workers/da_data_collecting_workers.yml -i 0 -P /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid >> /home/deployer/apps/au/current/log/sidekiq.log 2>&1 &'" as uid deployer and gid deployer with timeout 250 seconds
>  stop program = "/bin/bash -l -c 'HOME=/home/deployer PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && if [ -d /home/deployer/apps/au/current ] && [ -f /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid ] && kill -0 `cat /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid`> /dev/null 2>&1; then cd /home/deployer/apps/au/current && bundle exec sidekiqctl stop /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid 3 ; else echo 'Sidekiq is not running' ; fi'" as uid deployer and gid deployer with timeout 120 seconds
>  if cpu usage > 50% for 18 cycles then restart
>  if mem > 1200.0 MB for 18 cycles then restart
>  if 3 restarts within 18 cycles then timeout
> 
> check process da_data_collecting_workers-1.pid with pidfile /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid
>  start program = "/bin/bash -l -c 'HOME=/home/deployer PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && cd /home/deployer/apps/au/current ; nohup bundle exec sidekiq -e production -C /home/deployer/apps/au/shared/config/workers/da_data_collecting_workers.yml -i 1 -P /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid >> /home/deployer/apps/au/current/log/sidekiq.log 2>&1 &'" as uid deployer and gid deployer with timeout 250 seconds
>  stop program = "/bin/bash -l -c 'HOME=/home/deployer PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && if [ -d /home/deployer/apps/au/current ] && [ -f /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid ] && kill -0 `cat /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid`> /dev/null 2>&1; then cd /home/deployer/apps/au/current && bundle exec sidekiqctl stop /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid 3 ; else echo 'Sidekiq is not running' ; fi'" as uid deployer and gid deployer with timeout 120 seconds
>  if cpu usage > 50% for 18 cycles then restart
>  if mem > 1200.0 MB for 18 cycles then restart
>  if 3 restarts within 18 cycles then timeout
> 



