Re: Monit not restarting a service reliably
From: Jan Rychter
Subject: Re: Monit not restarting a service reliably
Date: Mon, 3 Jun 2019 10:02:11 -0700
Hi,
Thanks for the information; this is exactly what I needed. This setting will
make monit work for me again :-)
Best regards,
--Jan
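
Applied to the configuration quoted below, the "repeat" option from the changelog excerpt might look like the following. This is only a sketch: it assumes "repeat every 1 cycle" is accepted as written in the changelog, everything else is copied unchanged from the original setup, and it has not been run against a live monit instance.

```
set daemon 60
check host mysite.com with address mysite.com
    # Assumption: "repeat every 1 cycle" re-runs the exec action on each
    # cycle for as long as the test keeps failing (pre-5.16.0 behaviour).
    if failed
        port 443
        protocol https
        with ssl options {verify: enable}
        for 2 cycles
    then exec "/usr/bin/supervisorctl restart mysite" repeat every 1 cycle
    if 20 restarts within 60 cycles then unmonitor
```

With a 60-second daemon interval, this would retry the restart roughly once a minute while the check keeps failing. A larger value, such as the changelog's "repeat every 5 cycles", would give the service more time to come up between attempts.
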
> On 2019-06-03, at 08:37, address@hidden wrote:
>
> Hi,
>
> Since monit 5.16.0, the exec action is executed only on a state change. In
> your case the service didn't transition back to the "succeeded" state, so
> the exec action wasn't repeated.
>
> If you want to retry the exec action while the service remains in the failed
> state, you can use the "repeat" option.
>
> Snip from monit 5.16.0 changelog which provides more details:
>
> --8<--
> New: The exec action is now executed only once, on state change, same way as
> the alert action. The new "repeat" option allows to repeat the exec action
> after given number of cycles if the error persists. Syntax:
>   if <test> then exec <script> repeat every <x> cycles
> If you want to get the old behaviour, use "repeat every 1 cycle". Example:
>   if failed port 1234 then exec "/usr/bin/myscript.sh" repeat every 5 cycles
> --8<--
>
> Best regards,
> Martin
>
>
>> On 31 May 2019, at 19:14, Jan Rychter <address@hidden> wrote:
>>
>> Hi,
>>
>> I'm looking for help, because I can't figure out what I'm doing wrong. I
>> have a simple monit setup, which is supposed to monitor a web server and
>> restart it if anything seems wrong.
>>
>> This seems to work, but not always. Monit does restart the service once,
>> but on subsequent failures it just notices that the service isn't working
>> and doesn't act anymore.
>>
>> Example from the log, where the service was restarted, but went down again,
>> and monit didn't do anything:
>>
>> [CEST May 31 06:44:11] info : 'triac.mysite.com' Monit 5.16 started
>> [CEST May 31 09:36:29] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:37:39] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:37:39] info : 'mysite.com' exec: /usr/bin/supervisorctl
>> [CEST May 31 09:38:49] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:39:59] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:41:09] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:42:19] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:43:29] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:44:39] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:45:50] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:47:00] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>> [CEST May 31 09:48:10] error : 'mysite.com' failed protocol test [HTTP] at [mysite.com]:443 [TCP/IP SSL] -- HTTP: Error receiving data -- Resource temporarily unavailable
>>
>> The net result is that the service doesn't work and monit just sits there,
>> knowing that the service failed the protocol test, but doing nothing about
>> it.
>>
>> I suspect this is because monit does not notice that the service was OK
>> after restarting for a moment, so it does not notice another transition from
>> OK to failed.
>>
>> Here is the relevant part of the configuration (nearly all of it):
>>
>> set daemon 60
>> check host mysite.com with address mysite.com
>>     if failed
>>         port 443
>>         protocol https
>>         with ssl options {verify: enable}
>>         for 2 cycles
>>     then exec "/usr/bin/supervisorctl restart mysite"
>>     if 20 restarts within 60 cycles then unmonitor
>>
>> Is there a way to achieve unconditional actions? E.g. "even though I haven't
>> seen the service transition from failed to working, restart it anyway
>> after 60 seconds if it is still in the failed state".
>>
>> Any help would be much appreciated.
>>
>> --J.
>>
>>
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general