monit-general

Monit triggering restart storm


From: Guillaume François
Subject: Monit triggering restart storm
Date: Thu, 9 Nov 2017 12:07:35 +0100

Hi,

I have a set of Monit rules that check a single service:
  1. One check process rule (existence and port checks)
    1. if does not exist for 5 cycles then start
    2. if failed port XXXX for 6 times within 8 cycles then restart
    3. if failed port YYYY for 6 times within 8 cycles then restart
    4. if failed port ZZZZ for 6 times within 8 cycles then restart
  2. Three check program rules with custom checks
    1. if status != 0 for 5 times within 10 cycles then restart
    2. if status != 0 for 5 times within 10 cycles then restart
    3. if status != 0 for 5 times within 10 cycles then restart
  3. One check file rule on the log content
    1. if content = "BIG ERROR" then restart
The start/stop rules are:

start program = "/bin/systemctl start myservice"
stop program = "/bin/systemctl stop myservice"

There is no dependency declared at the Monit level, but all the checks belong to the same set of groups.
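
For readers who want to picture the setup, here is a rough sketch of what such a control file might look like. It is a reconstruction under assumptions, not my actual configuration: the check names, pidfile, script paths, log path, and group name are all hypothetical placeholders, and XXXX/YYYY/ZZZZ stand for the real port numbers. The point it illustrates is that every check carries its own start/stop programs, so each one can issue a restart independently of the others.

```
# Hypothetical sketch; names, paths, and the group are placeholders.
check process myservice with pidfile /var/run/myservice.pid
    group mygroup
    start program = "/bin/systemctl start myservice"
    stop program = "/bin/systemctl stop myservice"
    if does not exist for 5 cycles then start
    if failed port XXXX for 6 times within 8 cycles then restart
    if failed port YYYY for 6 times within 8 cycles then restart
    if failed port ZZZZ for 6 times within 8 cycles then restart

# One of the three custom checks; the other two follow the same pattern.
check program myservice_custom1 with path "/usr/local/bin/custom_check1.sh"
    group mygroup
    start program = "/bin/systemctl start myservice"
    stop program = "/bin/systemctl stop myservice"
    if status != 0 for 5 times within 10 cycles then restart

# Log content check.
check file myservice_log with path /var/log/myservice.log
    group mygroup
    start program = "/bin/systemctl start myservice"
    stop program = "/bin/systemctl stop myservice"
    if content = "BIG ERROR" then restart
```

With this layout, five separate checks can each run the stop and start commands for the same service within a few cycles of one another.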

The problem is that, because of several issues happening at once, I got a restart storm:
  1. some port check failed -> restart issued
  2. that led to an error in a custom script -> restart issued
  3. reading the log content lagged behind -> restart issued
Either myservice or its systemd configuration is not well designed, so I got an "already bind exception" as systemd tried to start several instances at the same time 🤔

So the port check failed again, systemd killed the wrong instance, myservice was blocked, it was restarted again, and so on.

I had to shut down Monit to prevent further actions (I could also have used monit -g group unmonitor), kill every instance of my service, start it correctly, and then reactivate Monit.


Question: 
Remark: maybe exploring the systemd options StartLimitIntervalSec and StartLimitBurst could help.


Best Regards.
