monit-general

RE: [monit] monit 4.10.1 is driving me crazy!


From: Gilad Benjamini
Subject: RE: [monit] monit 4.10.1 is driving me crazy!
Date: Fri, 23 Jan 2009 15:30:02 -0800

I ran into similar problems, and several attempts to work around them failed.
See the postings I made recently.
The recommended solution in that thread was an upgrade to 5.0beta6.

> -----Original Message-----
> From: address@hidden
> [mailto:address@hidden On
> Behalf Of David Paper
> Sent: Friday, January 23, 2009 3:22 PM
> To: This is the general mailing list for monit
> Subject: [monit] monit 4.10.1 is driving me crazy!
> 
> 
> Hi monit gurus,
> 
> I'm absolutely stumped, and have been stumped for more than a month
> trying to chase a problem down.  I'm using Monit 4.10.1 on OpenSuse
> 11.0 64-bit.
> 
> Monit SOMETIMES starts multiple copies of the same job.  Not always,
> not never, SOMETIMES.
> 
> Monit can read the PID file for the job, the PID is defined, written
> out to the file, permissions are correct, ownership is correct, and
> the PID file contains a PID of one of the multiple executions of the
> same job.
> 
> The job in question is the tm_prod03catalogedge01 job (see -v output
> below for more specifics).  The start/stop commands call helper scripts
> that do the heavy lifting.  The "sleep 30" at the end of the script is
> an attempt to slow monit down so it doesn't try to start multiple
> instances of the same job.  It doesn't work.  When multiple copies of
> the same job are started, ps shows that there is NOT a 30-second delay
> between their start times.
> 
> Has anyone else run into a bug where Monit very quickly starts
> multiple instances of the same job?   I'm seeing this on dozens of
> different hosts at different times, so it's not isolated to a single
> monit instance or a single job definition.  The only thing that is in
> common is that all of the jobs are Jboss servers.
> 
> I've been anxiously watching the Monit 5.0 betas, hoping 5.0 gets
> released as a final soon.  These are production servers, and I'd
> rather not run beta code if at all possible.  However, I will if this
> is a known bug that has been fixed; I just couldn't match this
> problem up to the entries in the Changelog.
> 
> --
> 
> monit_run.sh:
> 
> #!/bin/ksh
> DATE=`date +%Y%m%d-%H%M%S`
> CONSOLE_LOG=/opt/jboss/server/${4}/log/console.log
> if [ -a ${CONSOLE_LOG} ]; then
>       mv ${CONSOLE_LOG} ${CONSOLE_LOG}-${DATE}
> fi
> 
> logger "Running /opt/jboss/bin/run.sh for ${2}"
> cd /opt/jboss/bin; ./${4} $* | tee ${CONSOLE_LOG}
> 
> #sticking in a sleep to try to get monit to stop spawning multiple procs
> sleep 30
> 
> --
> 
> monitrc:
> 
> set daemon  20
> set logfile syslog facility log_daemon
> set mailserver localhost               # primary mailserver
> set eventqueue
>       basedir /opt/monit/eventqueue  # set the base directory where events will be stored
> set mail-format { Subject: monit alert for $HOST -- $EVENT $SERVICE }
> set alert address@hidden                 # receive all alerts
> set httpd port 2812 and
>       use address localhost  # only accept connection from localhost
>       allow localhost        # allow localhost to connect to the server and
> include /opt/monit/jobs/*
> check system localhost
>       noalert address@hidden
> 
> --
> 
> monit -v output:
> 
> [dpaper]:[18:07:48]:/opt/jboss/bin> sudo monit -v
> monit: Debug: Adding host allow 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> monit: Debug: Skipping redundant host 'localhost'
> Runtime constants:
>   Control file       = /opt/monit/etc/monitrc
>   Log file           = syslog
>   Pid file           = /var/run/monit.pid
>   Debug              = True
>   Log                = True
>   Use syslog         = True
>   Is Daemon          = True
>   Use process engine = True
>   Poll time          = 20 seconds
>   Event queue        = base directory /opt/monit/eventqueue with unlimited slots
>   Mail server(s)     = localhost:25
>   Mail from          = address@hidden
>   Mail subject       = monit alert for $HOST -- $EVENT $SERVICE
>   Mail message       = $EVENT Service $SERV..(truncated)
>   Start monit httpd  = True
>   httpd bind address = localhost
>   httpd portnumber   = 2812
>   httpd signature    = True
>   Use ssl encryption = False
>   httpd auth. style  = Host/Net allow list
>   Alert mail to      = address@hidden
>     Alert on         = All events
> 
> The service list contains the following entries:
> 
> Process Name          = tm_prod03catalogedge01
>   Pid file             = /var/run/jboss/tm_prod03catalogedge01.pid
>   Monitoring mode      = active
>   Start program        = '/opt/jboss/bin/monit_run.sh -b prod03catalogedge01.dc03.totalmusic.net -c prod03catalogedge01' as uid 8002 as gid 8002 timeout 1 cycle(s)
>   Stop program         = '/bin/bash -c /opt/jboss/bin/monit_stop.sh prod03catalogedge01.dc03.totalmusic.net > /tmp/stop.log 2>&1' as uid 8002 as gid 8002 timeout 1 cycle(s)
>   Pid                  = if changed 1 times within 1 cycle(s) then alert
>   Ppid                 = if changed 1 times within 1 cycle(s) then alert
>   Port                 = if failed prod03catalogedge01.dc03.totalmusic.net:8080 [DEFAULT via TCP] with timeout 5 seconds 5 times within 10 cycle(s) then alert else if passed 1 times within 1 cycle(s) then alert
> 
> System Name           = localhost
>   Monitoring mode      = active
>   Alert mail to        = address@hidden
>     Alert on           = No events
> 
> -------------------------------------------------------------------------------
> monit daemon at 1850 awakened
> 
> --
> 
> Thanks!
> 
> -dave
> 
> --
> Dave Paper                          address@hidden
> 
> MCSE is to computers as McDonalds Certified Chef is to fine cuisine.
> 

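A note on the quoted monit_run.sh: as posted, the server runs in the
foreground in a pipeline to tee, so the "sleep 30" is only reached after
that pipeline exits. It cannot delay a second start attempt while the
first copy is still coming up.

One possible workaround, short of upgrading, is to make the start wrapper
itself refuse to launch a second copy when the pid file already names a
live process. The following is only a minimal sketch and has not been
tried against monit 4.10.1 or JBoss. The function names is_running and
start_once are hypothetical, and it assumes the wrapper runs as the same
user that owns the process (kill -0 fails otherwise) and that the wrapper,
not the server, writes the pid file.

```shell
#!/bin/sh
# Sketch only: is_running and start_once are hypothetical helper names.

# Return 0 only if PIDFILE exists, holds a number, and that pid is alive.
# Assumes the caller is allowed to signal the process (same user or root).
is_running() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null
}

# Start the given command in the background unless PIDFILE already
# points at a live process; record the new pid immediately.
start_once() {
    pidfile=$1
    shift
    if is_running "$pidfile"; then
        echo "already running with pid $(cat "$pidfile")"
        return 0
    fi
    "$@" &
    echo $! > "$pidfile"
    echo "started pid $!"
}
```

A wrapper built this way would call start_once with the pid file path
followed by the real start command. Because the pid file is written right
after the launch, a second start attempt that arrives while the server is
still initializing finds a live pid and does nothing.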



