monit-general

Re: Monit not restarting a service reliably


From: Jan Rychter
Subject: Re: Monit not restarting a service reliably
Date: Mon, 3 Jun 2019 10:02:11 -0700

Hi,

Thanks for the information, this is exactly what I needed. This setting will 
make monit work for me again :-)

best regards,
--Jan
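
A minimal sketch of how the "repeat" option might be added to the
configuration quoted below. Only the exec line changes; everything else is
copied from the original. The interval of 1 cycle is an assumption, chosen to
match the "restart it anyway after 60 seconds" request given "set daemon 60";
a larger value would give the service more time to come back up before the
next restart.

    set daemon 60
    check host mysite.com with address mysite.com
        if failed
            port 443
            protocol https
            with ssl options {verify: enable}
            for 2 cycles
        # assumed interval: re-run the restart every cycle while the test fails
        then exec "/usr/bin/supervisorctl restart mysite" repeat every 1 cycle
        if 20 restarts within 60 cycles then unmonitor

Running "monit -t" checks the control file syntax, after which "monit reload"
makes the daemon pick up the change.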

> On 2019-06-03, at 08:37, address@hidden wrote:
> 
> Hi,
> 
> Since monit 5.16.0, the exec action is executed only on a state change. In 
> your case the service never transitioned back to the "succeeded" state, so 
> the exec action wasn't repeated.
> 
> If you want to retry the exec action for as long as the service remains in 
> the failure state, you can use the "repeat" option.
> 
> Snip from monit 5.16.0 changelog which provides more details:
> 
> --8<--
> New: The exec action is now executed only once, on state change, same way as 
> the alert action. The new "repeat" option allows to repeat the exec action 
> after given number of cycles if the error persists.  Syntax:
>        if <test> then exec <script> repeat every <x> cycles
> If you want to get the old behaviour, use "repeat every 1 cycle". Example:
>        if failed port 1234 then exec "/usr/bin/myscript.sh" repeat every 5 cycles
> --8<--
> 
> Best regards,
> Martin
> 
> 
>> On 31 May 2019, at 19:14, Jan Rychter <address@hidden> wrote:
>> 
>> Hi,
>> 
>> I'm looking for help, because I can't figure out what I'm doing wrong. I 
>> have a simple monit setup, which is supposed to monitor a web server and 
>> restart it if anything seems wrong.
>> 
>> This seems to work but not always. Monit does restart the service, but on 
>> subsequent failures it just notices that the service isn't working and 
>> doesn't act anymore.
>> 
>> Example from the log, where the service was restarted, but went down again, 
>> and monit didn't do anything:
>> 
>> [CEST May 31 06:44:11] info     : 'triac.mysite.com' Monit 5.16 started
>> [CEST May 31 09:36:29] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:37:39] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:37:39] info     : 'mysite.com' exec: /usr/bin/supervisorctl
>> [CEST May 31 09:38:49] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:39:59] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:41:09] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:42:19] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:43:29] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:44:39] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:45:50] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:47:00] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> [CEST May 31 09:48:10] error    : 'mysite.com' failed protocol test [HTTP] 
>> at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource 
>> temporarily unavailable
>> 
>> The net result is that the service doesn't work and monit just sits there, 
>> knowing that the service failed the protocol test, but doing nothing about 
>> it.
>> 
>> I suspect this is because monit does not notice that the service was OK 
>> after restarting for a moment, so it does not notice another transition from 
>> OK to failed.
>> 
>> Here is the relevant part of the configuration (nearly all of it):
>> 
>> set daemon 60
>> check host mysite.com with address mysite.com
>>     if failed
>>         port 443
>>         protocol https
>>         with ssl options {verify: enable}
>>         for 2 cycles
>>     then exec "/usr/bin/supervisorctl restart mysite"
>>     if 20 restarts within 60 cycles then unmonitor
>> 
>> Is there a way to achieve unconditional actions? E.g. "even though I haven't 
>> noticed the service to transition from failed to working, restart it anyway 
>> after 60 seconds if it is still in the failed state"
>> 
>> Any help would be much appreciated.
>> 
>> --J.
>> 
>> 
>> -- 
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general
> 
> 



