Re: Restart timer for checking services
From: Martin Pala
Subject: Re: Restart timer for checking services
Date: Fri, 9 Aug 2013 14:00:12 +0200
Hello,
the start timeout only waits for the process itself to start. As soon as the
process shows up in the process table, the start command is considered
finished and testing resumes. The restart doesn't reset the error record, so
the "5 cycles" condition matches again immediately, because the failed cycles
from before the restart are counted as well.
We will modify the restart command to reset the pre-restart error cycles.
The timeout should also temporarily suppress errors from the same service's
tests until it expires.
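To illustrate the effect, here is a rough simulation of the counting
behaviour described above. It is only a sketch, not Monit's actual code: the
function name simulate, its parameters, and the sliding-window model of
"5 times within 5 cycles" are assumptions made for the example. It assumes
60-second cycles and a service whose port answers from the third check after
a restart onwards.

```python
from collections import deque


def simulate(total_cycles, startup_cycles, reset_on_restart, times=5, within=5):
    """Return the cycle numbers at which a restart is triggered.

    Hypothetical model: the process is locked up at cycle 0 and, after each
    restart, needs `startup_cycles` cycles before its port answers.
    """
    history = deque(maxlen=within)  # True = failed check, newest last
    restarts = []
    age = None  # cycles since the last restart; None = locked up, never restarted
    for cycle in range(total_cycles):
        if age is not None:
            age += 1
        failed = age is None or age < startup_cycles
        history.append(failed)
        if sum(history) >= times:
            restarts.append(cycle)
            age = 0
            if reset_on_restart:
                # Proposed behaviour: forget the failures from before the restart.
                history.clear()
    return restarts


# Behaviour reported in this thread: old failures still count after a restart.
print(simulate(12, 3, reset_on_restart=False))
# Behaviour after the planned change: the error record starts from zero.
print(simulate(12, 3, reset_on_restart=True))
```

In this model, without the reset, the window is still full of failures after
the first restart, so every following failed check triggers another restart
and the service never gets its three cycles to come up. With the reset, only
the first restart happens.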
Regards,
Martin
On Aug 7, 2013, at 8:04 PM, David Paper <address@hidden> wrote:
> Greetings,
>
> I've dug through the monit docs, examples and changelog from 5.2.3 to 5.5.1,
> and I am unable to find a reference to this problem. Here is what I am
> seeing. Using Monit 5.2.3 on Red Hat Linux 5.4, x86_64 platform.
>
> I have a process that locks up due to out of memory (java) and monit tries to
> stop/start it. When I manually stop/start the process, monit waits the 180
> seconds before it begins testing, and can test successfully. The job works
> as defined. The process takes more than 2 minutes to come online and start
> listening for TCP requests. What doesn't work is that the monit restart
> functionality appears to immediately test the port 1 second after restart,
> again at 1 minute after restart, then sensing the process isn't working
> correctly, tries to restart it, and the sequence begins all over. If I
> didn't know better, I would say that Monit is ignoring the defined time/cycle
> settings on a restart.
>
> My monit job for this process looks like this:
>
> check process jboss-ssp with pidfile /var/run/jboss/jboss-sspnode.pid
>   start program = "/opt/jboss/bin/monit_run.sh -c sspnode -b 10.91.51.32 -g ssp-io-lp1 -u 239.255.150.1 -Djboss.messaging.ServerPeerID=1"
>     as uid 349 and as gid 349 with timeout 180 seconds
>   stop program = "/bin/bash -c 'kill -9 `cat /var/run/jboss/jboss-sspnode.pid`'"
>     as uid 349 and as gid 349
>   if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then alert
>   if failed host 10.91.51.141 port 8080 for 5 times within 5 cycles then restart
>
> Here is my monitrc:
>
> set daemon 60 # check services at 1-minute intervals
> with start delay 60 # optional: delay the first check by 1-minute
> set logfile syslog facility log_daemon
> set idfile /var/run/monit.id
> set statefile /var/run/monit.state
> set mailserver smartmail.mydomain.com, # primary mailserver
> set eventqueue
>   basedir /opt/monit/eventqueue # set the base directory where events will be stored
>   slots 100 # optionally limit the queue size
> set alert address@hidden # receive all alerts
> set httpd port 2812 and
> use address localhost # only accept connection from localhost
> allow localhost # allow localhost to connect to the server and
> include /opt/monit/monit.d/*
>
> The syslog messages that show Monit's behavior:
>
> Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
> Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
> Aug 7 04:02:26 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
> Aug 7 04:02:27 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
> Aug 7 04:02:27 stdeciovag1 logger: Running /opt/jboss/bin/run.sh
> Aug 7 04:02:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
> Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
> Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
> Aug 7 04:03:28 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
> Aug 7 04:03:29 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
> Aug 7 04:03:29 stdeciovag1 logger: Running /opt/DECE_jboss/bin/run.sh
> Aug 7 04:03:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
> Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' failed, cannot open a connection to INET[10.91.51.141:8080] via TCP
> Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' trying to restart
> Aug 7 04:04:30 stdeciovag1 monit[4111]: 'jboss-ssp' stop: /bin/bash
> Aug 7 04:04:31 stdeciovag1 monit[4111]: 'jboss-ssp' start: /opt/jboss/bin/monit_run.sh
> ….
>
> This goes on forever until someone manually intervenes and stops and starts
> the monit job manually.
>
> Any help/guidance would be appreciated.
>
> Thanks,
>
> -dave