monit-general

Re: Restart timer for checking services


From: Martin Pala
Subject: Re: Restart timer for checking services
Date: Fri, 9 Aug 2013 14:00:12 +0200

Hello,

the start timeout waits only for the process itself to start: as soon as the 
process shows up in the process table, the start command is considered 
finished and testing resumes. The restart doesn't reset the error record, so 
the "5 cycles" condition matches again immediately, because the failed cycles 
from before the restart are still counted.

We will modify the restart command to reset the pre-restart error cycles. 
The timeout should also temporarily suppress errors from the same service's 
tests until it expires.
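
To illustrate the counting issue, here is a toy model in Python. It is only a 
sketch of the behavior described in this message, not Monit's actual code; the 
function name restart_cycles and its parameters are invented for the example. 
It assumes a "for 5 times within 5 cycles" rule and a service whose port test 
fails on every cycle.

```python
from collections import deque


def restart_cycles(results, times=5, window=5, reset_on_restart=False):
    """Return the cycle numbers at which a toy 'N times within M cycles'
    rule would trigger a restart.

    results: iterable of booleans, True if the test passed in that cycle.
    reset_on_restart: if True, forget earlier failures after each restart
    (the proposed behavior); if False, keep them (the behavior reported
    in this thread).
    """
    history = deque(maxlen=window)  # sliding window of recent failures
    restarts = []
    for cycle, ok in enumerate(results, start=1):
        history.append(not ok)
        if sum(history) >= times:
            restarts.append(cycle)
            if reset_on_restart:
                history.clear()
    return restarts


always_failing = [False] * 12
print(restart_cycles(always_failing))
# [5, 6, 7, 8, 9, 10, 11, 12]
print(restart_cycles(always_failing, reset_on_restart=True))
# [5, 10]
```

Without the reset, the window still holds the old failures after the first 
restart, so every subsequent failed cycle triggers another restart. That 
matches the once-per-minute restarts in the quoted log. With the reset, the 
service gets a full five cycles before the rule can match again.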

Regards,
Martin




On Aug 7, 2013, at 8:04 PM, David Paper <address@hidden> wrote:

> Greetings,
> 
> I've dug through the monit docs, examples and changelog from 5.2.3 to 5.5.1, 
> and I am unable to find a reference to this problem.  Here is what I am 
> seeing.  Using Monit 5.2.3 on RedHat Linux 5.4 x86_64 platform.
> 
> I have a process that locks up due to out of memory (java) and monit tries to 
> stop/start it. When I manually stop/start the process, monit waits the 180 
> seconds before it begins testing, and can test successfully.  The job works 
> as defined.  The process takes more than 2 minutes to come online and start 
> listening for TCP requests.    What doesn't work is that the monit restart 
> functionality appears to immediately test the port 1 second after restart, 
> again at 1 minute after restart, then sensing the process isn't working 
> correctly, tries to restart it, and the sequence begins all over.   If I 
> didn't know better, I would say that Monit is ignoring the defined time/cycle 
> settings on a restart.
> 
> My monit job for this process looks like this:
> 
> check process jboss-ssp with pidfile /var/run/jboss/jboss-sspnode.pid
>       start program = "/opt/jboss/bin/monit_run.sh -c sspnode -b 10.91.51.32 
> -g ssp-io-lp1 -u 239.255.150.1 -Djboss.messaging.ServerPeerID=1" 
>               as uid 349 and as gid 349 with timeout 180 seconds
>       stop program = "/bin/bash -c 'kill -9 `cat 
> /var/run/jboss/jboss-sspnode.pid`'"
>               as uid 349 and as gid 349 
>       if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then 
> alert
>       if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then 
> restart
> 
> Here is my monitrc:
> 
> set daemon  60            # check services at 1-minute intervals
>     with start delay 60  # optional: delay the first check by 1-minute
> set logfile syslog facility log_daemon                       
> set idfile /var/run/monit.id
> set statefile /var/run/monit.state
> set mailserver smartmail.mydomain.com,               # primary mailserver
> set eventqueue
>     basedir /opt/monit/eventqueue #set the base directory where events will 
> be stored
>     slots 100           # optionally limit the queue size
> set alert address@hidden                # receive all alerts
> set httpd port 2812 and
>    use address localhost  # only accept connection from localhost
>    allow localhost        # allow localhost to connect to the server and
> include /opt/monit/monit.d/*
> 
> The syslog messages that show Monit's behavior:
> 
> Aug  7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a 
> connection to INET[10.91.51.141:8080] via TCP
> Aug  7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
> Aug  7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
> Aug  7 04:02:27 stdeciovag1 monit[4111]: 'jboss-ssp' start: 
> /opt/jboss/bin/monit_run.sh
> Aug  7 04:02:27 stdeciovag1 logger: Running /opt/jboss/bin/run.sh
> Aug  7 04:02:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a 
> connection to INET[10.91.51.141:8080] via TCP
> Aug  7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a 
> connection to INET[10.91.51.141:8080] via TCP
> Aug  7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
> Aug  7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
> Aug  7 04:03:29 stdeciovag1 monit[4111]: 'jboss-ssp' start: 
> /opt/jboss/bin/monit_run.sh
> Aug  7 04:03:29 stdeciovag1 logger: Running /opt/DECE_jboss/bin/run.sh
> Aug  7 04:03:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a 
> connection to INET[10.91.51.141:8080] via TCP
> Aug  7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a 
> connection to INET[10.91.51.141:8080] via TCP
> Aug  7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
> Aug  7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
> Aug  7 04:04:31 stdeciovag1 monit[4111]: 'jboss-ssp' start: 
> /opt/jboss/bin/monit_run.sh
> ….
> 
> This goes on forever until someone manually intervenes and stops and starts 
> the monit job manually.
> 
> Any help/guidance would be appreciated.
> 
> Thanks,
> 
> -dave
> 
> 
> 
> 
> 
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general



