Re: Restart timer for checking services
From: David Paper
Subject: Re: Restart timer for checking services
Date: Fri, 9 Aug 2013 08:37:47 -0400
Hi Martin,
Thanks for the detailed reply. I was expecting to have something
mis-configured. I'll keep my eyes out for Monit 5.5.2 and the changelog.
-dave
On Aug 9, 2013, at 8:00 AM, Martin Pala <address@hidden> wrote:
> Hello,
>
> The start timeout only waits for the process itself to start: as soon as the
> process shows up in the process table, the start command is considered
> finished and testing resumes. The restart doesn't reset the error record, so
> the "5 cycles" condition matches again immediately, because the failed cycles
> from before the restart are still counted.
>
> We will modify the restart command to reset the pre-restart error
> cycles. The timeout should also temporarily suppress errors from the same
> service's tests until it expires.
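
The counting behavior described above can be illustrated with a toy model. This is only a sketch, not Monit's actual implementation: the function `simulate` and its parameters are hypothetical, and it assumes a service that hangs from the first cycle and needs 3 cycles (at 60 seconds each) to start listening after a restart.

```python
from collections import deque


def simulate(startup_cycles, total_cycles, reset_on_restart, times=5, within=5):
    """Return the cycle numbers at which the toy monitor triggers a restart.

    Hypothetical model of an "N times within M cycles" rule; not Monit code.
    """
    history = deque(maxlen=within)  # True means the port test failed that cycle
    restarts = []
    up_at = None  # cycle at which the service starts listening; None = hung
    for cycle in range(1, total_cycles + 1):
        failed = up_at is None or cycle < up_at
        history.append(failed)
        if sum(history) >= times:
            restarts.append(cycle)
            up_at = cycle + startup_cycles  # the restart begins a slow startup
            if reset_on_restart:
                history.clear()  # proposed fix: forget pre-restart failures
    return restarts


# Failures from before the restart stay in the window: restart on every cycle.
print(simulate(startup_cycles=3, total_cycles=12, reset_on_restart=False))
# Clearing the window on restart gives the service time to come up.
print(simulate(startup_cycles=3, total_cycles=12, reset_on_restart=True))
```

In this model, keeping the old failures in the window means the rule matches again on the very next cycle, so the service is restarted before it can finish starting. Clearing the window on restart leaves a single restart.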
>
> Regards,
> Martin
>
>
>
>
> On Aug 7, 2013, at 8:04 PM, David Paper <address@hidden> wrote:
>
>> Greetings,
>>
>> I've dug through the monit docs, examples and changelog from 5.2.3 to 5.5.1,
>> and I am unable to find a reference to this problem. Here is what I am
>> seeing. I'm using Monit 5.2.3 on Red Hat Linux 5.4, x86_64 platform.
>>
>> I have a process that locks up due to out of memory (java) and monit tries
>> to stop/start it. When I manually stop/start the process, monit waits the
>> 180 seconds before it begins testing, and can test successfully. The job
>> works as defined. The process takes more than 2 minutes to come online and
>> start listening for TCP requests. What doesn't work is Monit's own restart:
>> it appears to test the port 1 second after the restart and again 1 minute
>> after it, decides the process isn't working correctly, and restarts it
>> again, so the sequence begins all over.
>> If I didn't know better, I would say that Monit is ignoring the defined
>> time/cycle settings on a restart.
>>
>> My monit job for this process looks like this:
>>
>> check process jboss-ssp with pidfile /var/run/jboss/jboss-sspnode.pid
>>   start program = "/opt/jboss/bin/monit_run.sh -c sspnode -b 10.91.51.32 -g ssp-io-lp1 -u 239.255.150.1 -Djboss.messaging.ServerPeerID=1"
>>     as uid 349 and as gid 349 with timeout 180 seconds
>>   stop program = "/bin/bash -c 'kill -9 `cat /var/run/jboss/jboss-sspnode.pid`'"
>>     as uid 349 and as gid 349
>>   if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then alert
>>   if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then restart
>>
>> Here is my monitrc:
>>
>> set daemon 60 # check services at 1-minute intervals
>> with start delay 60 # optional: delay the first check by 1-minute
>> set logfile syslog facility log_daemon
>> set idfile /var/run/monit.id
>> set statefile /var/run/monit.state
>> set mailserver smartmail.mydomain.com, # primary mailserver
>> set eventqueue
>>     basedir /opt/monit/eventqueue # set the base directory where events will be stored
>>     slots 100 # optionally limit the queue size
>> set alert address@hidden # receive all alerts
>> set httpd port 2812 and
>> use address localhost # only accept connection from localhost
>> allow localhost # allow localhost to connect to the server and
>> include /opt/monit/monit.d/*
>>
>> The syslog messages that show Monit's behavior:
>>
>> Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
>> Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
>> Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
>> Aug 7 04:02:27 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
>> Aug 7 04:02:27 stdeciovag1 logger: Running /opt/jboss/bin/run.sh
>> Aug 7 04:02:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
>> Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
>> Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
>> Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
>> Aug 7 04:03:29 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
>> Aug 7 04:03:29 stdeciovag1 logger: Running /opt/DECE_jboss/bin/run.sh
>> Aug 7 04:03:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
>> Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
>> Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
>> Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
>> Aug 7 04:04:31 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
>> ….
>>
>> This goes on forever until someone intervenes and manually stops and starts
>> the monit job.
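
The restart interval in the syslog excerpt can be checked with a short script. This is a minimal sketch: the helper `restart_gaps` is hypothetical, the three log lines are copied from the excerpt above, and the year 2013 is an assumption taken from the date of this thread, since syslog timestamps carry no year.

```python
from datetime import datetime

# The three "trying to restart" lines from the syslog excerpt.
LOG = """\
Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
"""


def restart_gaps(log_text, year=2013):
    """Return the seconds elapsed between consecutive restart attempts."""
    stamps = []
    for line in log_text.splitlines():
        if "trying to restart" in line:
            month, day, clock = line.split()[:3]
            # Prepend the assumed year so the timestamp parses unambiguously.
            stamp = datetime.strptime(
                f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
            stamps.append(stamp)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


print(restart_gaps(LOG))
```

Each restart comes 62 seconds after the previous one, roughly one 60-second cycle, while the process needs more than 2 minutes to start listening. It is therefore killed before it can ever pass the port test.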
>>
>> Any help/guidance would be appreciated.
>>
>> Thanks,
>>
>> -dave
>>
>>
>>
>>
>>
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general
>
>
--
Dave Paper address@hidden
"The trouble with quotes on the Internet is you never know if they are
genuine." - Abraham Lincoln