Re: [monit] Monit Span Multiple instances
From: Nick Upson
Subject: Re: [monit] Monit Span Multiple instances
Date: Tue, 5 Feb 2008 13:36:59 +0000
This looks and sounds like the same problem I'm having. A restart of a
process causes 2 copies to run. When they are both talking to the same
serial port, this is not good.
Here is my verbose monit (4.10) output:
[MST Feb 4 08:27:38] debug : 'bs1' cpu usage check passed [current cpu usage=0.0%]
[MST Feb 4 08:27:38] debug : 'bs1' total mem amount check passed [current total mem amount=776kB]
[MST Feb 4 08:28:00] debug : 'bs1' zombie check passed [status_flag=0000]
[MST Feb 4 08:28:00] debug : 'bs1' PID has not changed since last cycle
[MST Feb 4 08:28:00] debug : 'bs1' PPID has not changed since last cycle
[MST Feb 4 08:28:00] debug : 'bs1' cpu usage check passed [current cpu usage=0.0%]
[MST Feb 4 08:28:00] debug : 'bs1' total mem amount check passed [current total mem amount=776kB]
Mon Feb 4 08:28:07 MST 2008 restart bs1
[MST Feb 4 08:28:07] info : restart service 'bs1' on user request
[MST Feb 4 08:28:07] info : 'bs1' trying to restart
[MST Feb 4 08:28:07] debug : Monitoring disabled -- service bs1
[MST Feb 4 08:28:07] info : 'bs1' stop: /opt/unb/bin/bs.sh
[MST Feb 4 08:28:08] debug : 'bs1' Error testing process id [10793] -- No such process
[MST Feb 4 08:28:08] debug : 'bs1' Error testing process id [10793] -- No such process
[MST Feb 4 08:28:08] debug : 'bs1' Error testing process id [10793] -- No such process
[MST Feb 4 08:28:08] info : 'bs1' start: /opt/unb/bin/bs.sh
[MST Feb 4 08:28:08] debug : 'bs1' Error testing process id [10793] -- No such process
[MST Feb 4 08:28:08] debug : Monitoring enabled -- service bs1
[MST Feb 4 08:28:08] debug : 'bs1' check skipped -- service already handled in a dependency chain
[MST Feb 4 08:28:08] debug : 'bs1' Error testing process id [10793] -- No such process
[MST Feb 4 08:28:09] debug : monit: pidfile '/var/run/bs1.pid' does not exist
[MST Feb 4 08:28:10] debug : monit: pidfile '/var/run/bs1.pid' does not exist
This continues until 08:30:08, two minutes after the start command was issued:
[MST Feb 4 08:30:06] debug : monit: pidfile '/var/run/bs1.pid' does not exist
[MST Feb 4 08:30:07] debug : monit: pidfile '/var/run/bs1.pid' does not exist
[MST Feb 4 08:30:08] debug : monit: pidfile '/var/run/bs1.pid' does not exist
[MST Feb 4 08:30:08] error : 'bs1' process is not running
[MST Feb 4 08:30:08] info : 'bs1' trying to restart
[MST Feb 4 08:30:08] debug : Monitoring disabled -- service bs1
[MST Feb 4 08:30:08] debug : monit: pidfile '/var/run/bs1.pid' does not exist
[MST Feb 4 08:30:08] debug : monit: pidfile '/var/run/bs1.pid' does not exist
[MST Feb 4 08:30:08] info : 'bs1' start: /opt/unb/bin/bs.sh
[MST Feb 4 08:30:08] debug : monit: pidfile '/var/run/bs1.pid' does not exist
[MST Feb 4 08:30:08] debug : Monitoring enabled -- service bs1
[MST Feb 4 08:30:08] debug : monit: pidfile '/var/run/bs1.pid' does not exist
[MST Feb 4 08:30:08] error : 'bs1' failed to start
[MST Feb 4 08:30:09] info : 'bs1' started
[MST Feb 4 08:32:08] info : 'bs1' process is running with pid 1370
[MST Feb 4 08:32:08] debug : 'bs1' zombie check passed [status_flag=0000]
[MST Feb 4 08:32:08] debug : 'bs1' cpu usage check passed [current cpu usage=0.0%]
[MST Feb 4 08:32:08] debug : 'bs1' total mem amount check passed [current total mem amount=556kB]
[MST Feb 4 08:34:08] debug : 'bs1' zombie check passed [status_flag=0000]
[MST Feb 4 08:34:08] debug : 'bs1' PID has not changed since last cycle
[MST Feb 4 08:34:08] debug : 'bs1' PPID has not changed since last cycle
[MST Feb 4 08:34:08] debug : 'bs1' cpu usage check passed [current cpu usage=0.0%]
[MST Feb 4 08:34:08] debug : 'bs1' total mem amount check passed [current total mem amount=556kB]
But there are now 2 copies of the bs1 process running.
This is on fc5 (Fedora Core 5). The start/stop script uses the standard
fc5 start/stop routines, which include removing the pid file.
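
For illustration only, here is a minimal sketch of a start/stop wrapper that refuses to launch a second copy while the pid file still names a live process. It is not the real /opt/unb/bin/bs.sh (which is not shown in this thread); the function names pid_alive, start_guarded and stop_guarded are hypothetical, and the default pid file path is taken from the log above. It would not change what monit itself does, but under these assumptions a repeated start request should become harmless.

```shell
#!/bin/sh
# Hypothetical guard for a start/stop wrapper; NOT the real bs.sh.
# PIDFILE defaults to the path seen in the monit log; override it as needed.
PIDFILE="${PIDFILE:-/var/run/bs1.pid}"

# Succeeds only if PIDFILE exists, is non-empty, and names an existing process.
pid_alive() {
    [ -s "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE") || return 1
    kill -0 "$pid" 2>/dev/null
}

# Start the given command in the background unless a copy is already running.
# The pid file is written before this function returns.
start_guarded() {
    if pid_alive; then
        echo "already running with pid $(cat "$PIDFILE")" >&2
        return 0
    fi
    # Detach the command from the caller's stdin/stdout/stderr.
    "$@" </dev/null >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

# Send SIGTERM to the recorded process and wait up to about 10 seconds.
# The pid file is removed only once the process is gone (or was never alive).
stop_guarded() {
    if pid_alive; then
        pid=$(cat "$PIDFILE")
        kill "$pid" 2>/dev/null
        i=0
        while kill -0 "$pid" 2>/dev/null; do
            [ "$i" -ge 10 ] && return 1   # still alive: keep the pid file
            sleep 1
            i=$((i + 1))
        done
    fi
    rm -f "$PIDFILE"
}
```

Calling start_guarded twice in a row with the same command should leave a single process and an unchanged pid file. The sketch only works when the wrapper launches the process itself and so knows its pid; a daemon that forks and writes its own pid file would need a different check.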
On 04/02/2008, Martin Pala <address@hidden> wrote:
> Hi,
>
> there was similar problem which was fixed in monit 4.9:
>
> --8<--
> * Fix the extra restart action which was called by monit
> in addition to user requested start action of stopped
> process. This didn't occur in the case that the 'every'
> statement was used on the service definition as well. Thanks
> to Aaron Scamehorn for help.
> --8<--
>
> It seems, however, that some applications still have this or a similar
> problem (reported by several users).
>
> I'll look into it ...
>
>
> Martin
>
>
>
> Navaneethakrishnan Goapl wrote:
> >
> > Hi,
> >
> > Monit Version : 4.9
> > OS Version : CentOS release 4.4
> >
> > I am running into the following issue quite often. Monit works fine for
> > some time, but at some point, when I restart the process, Monit
> > spawns multiple instances of that process. I see that this was a problem
> > in earlier releases of Monit. Does this issue still persist in the
> > latest version? Could someone reply to this?
> >
> > root 8580 1 0 05:31 ? 00:00:00 /bin/sh /opt/CSCOacsvw/resources/monit/monit_script.sh jobmanager start
> > root 8591 1 0 05:31 ? 00:00:00 /bin/sh /opt/CSCOacsvw/resources/monit/monit_script.sh jobmanager start
> >
> >
> > Monitrc
> > -------
> >
> > check process Jobmanager with pidfile "/opt/CSCOacsvw/resources/monit/jobmanager.pid"
> >   start program = "/opt/CSCOacsvw/resources/monit/monit_script.sh jobmanager start"
> >   stop program = "/opt/CSCOacsvw/resources/monit/monit_script.sh jobmanager stop"
> >
> > monit -vc ./monitrc start all
> >
> > Runtime constants:
> > Control file = ./monitrc
> > Log file = /opt/CSCOacsvw/log/monit_errors.log
> > Pid file = /var/run/monit.pid
> > Debug = True
> > Log = True
> > Use syslog = False
> > Is Daemon = True
> > Use process engine = True
> > Poll time = 60 seconds
> > Mail server(s) = localhost
> > Mail from = (not defined)
> > Mail subject = (not defined)
> > Mail message = (not defined)
> > Start monit httpd = True
> > httpd bind address = Any/All
> > httpd portnumber = 2812
> > httpd signature = True
> > Use ssl encryption = False
> > httpd auth. style = Host/Net allow list
> >
> >
> > Process Name = Jobmanager
> > Pid file = /opt/CSCOacsvw/resources/monit/jobmanager.pid
> > Monitoring mode = active
> > Start program = '/opt/CSCOacsvw/resources/monit/monit_script.sh jobmanager start' timeout 1 cycle(s)
> > Stop program = '/opt/CSCOacsvw/resources/monit/monit_script.sh jobmanager stop' timeout 1 cycle(s)
> > Pid = if changed 1 times within 1 cycle(s) then alert
> > Ppid = if changed 1 times within 1 cycle(s) then alert
> >
> >
> > Regards,
> > navanee
> >