monit-general

[monit] monit 4.10.1 is driving me crazy!


From: David Paper
Subject: [monit] monit 4.10.1 is driving me crazy!
Date: Fri, 23 Jan 2009 18:22:21 -0500


Hi monit gurus,

I'm absolutely stumped, and have been for more than a month, trying to chase down a problem. I'm running Monit 4.10.1 on openSUSE 11.0 64-bit.

Monit SOMETIMES starts multiple copies of the same job. Not always, not never, SOMETIMES.

Monit can read the PID file for the job: the PID is defined and written out to the file, permissions and ownership are correct, and the PID file contains the PID of one of the multiple executions of the same job.
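
For anyone who wants to repeat the PID file check, here is a minimal sketch of one way to do it. The helper name pidfile_status is hypothetical and not part of monit; it only reports whether the PID recorded in a file belongs to a live process, and it assumes it is run as root or as the job's own user so that the signal test is permitted.

```shell
#!/bin/sh
# pidfile_status FILE: report whether the PID recorded in FILE is alive.
# Hypothetical helper for illustration only; not part of monit.
pidfile_status() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "unreadable"
        return 1
    fi
    # Take the first line and strip whitespace.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    # kill -0 sends no signal; it only tests that the process exists and
    # that we may signal it. Run as root or the job's user, otherwise a
    # live process owned by someone else is reported as stale.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"
        return 1
    fi
}
```

Example: pidfile_status /var/run/jboss/tm_prod03catalogedge01.pid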

The job in question is the tm_prod03catalogedge01 job (see the -v output below for more specifics). The start/stop commands call helper scripts that do the heavy lifting. The "sleep 30" at the end of the start script is an attempt to slow monit down so it doesn't try to start multiple instances of the same job. It doesn't work. When multiple copies of the same job are started, there is NOT a 30-second delay between them when looking at ps and comparing the start times.
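
The start-time comparison can be sketched roughly as follows. The helper filter_instances is a hypothetical name; it reads ps output on stdin and keeps the lines whose command line contains a fixed string. The ps column selection (pid, lstart, args) assumes the procps ps shipped with Linux.

```shell
#!/bin/sh
# filter_instances PATTERN: read `ps` output on stdin and keep only the
# lines that contain PATTERN (matched as a fixed string, not a regex).
# Hypothetical helper for illustration only.
filter_instances() {
    # The second grep drops the grep processes of this pipeline itself.
    # It would also hide any real job with "grep" in its command line.
    grep -F -- "$1" | grep -v -F 'grep'
}

# Usage (assumes procps ps; lstart prints the full start time):
#   ps -eo pid,lstart,args | filter_instances prod03catalogedge01
#   ps -eo pid,lstart,args | filter_instances prod03catalogedge01 | wc -l
```

More than one line of output for a single job name, with lstart values only a few seconds apart, corresponds to the duplicate starts described above.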

Has anyone else run into a bug where Monit very quickly starts multiple instances of the same job? I'm seeing this on dozens of different hosts at different times, so it's not isolated to a single monit instance or a single job definition. The only thing they have in common is that all of the jobs are JBoss servers.

I've been anxiously watching the Monit 5.0 betas, hoping a final release comes soon. These are production servers, and I'd rather not run beta code if at all possible. However, I will if this is a known bug that has been fixed and I just couldn't match this problem up to the entries in the Changelog.

--

monit_run.sh:

#!/bin/ksh
DATE=`date +%Y%m%d-%H%M%S`
CONSOLE_LOG=/opt/jboss/server/${4}/log/console.log
if [ -a ${CONSOLE_LOG} ]; then
        mv ${CONSOLE_LOG} ${CONSOLE_LOG}-${DATE}
fi

logger "Running /opt/jboss/bin/run.sh for ${2}"
cd /opt/jboss/bin; ./${4} $* | tee ${CONSOLE_LOG}

# sticking in a sleep to try to get monit to stop spawning multiple procs
sleep 30

--

monitrc:

set daemon  20
set logfile syslog facility log_daemon
set mailserver localhost               # primary mailserver
set eventqueue
     basedir /opt/monit/eventqueue # set the base directory where events will be stored
set mail-format { Subject: monit alert for $HOST -- $EVENT $SERVICE }
set alert address@hidden                 # receive all alerts
set httpd port 2812 and
     use address localhost  # only accept connection from localhost
     allow localhost # allow localhost to connect to the server and
include /opt/monit/jobs/*
check system localhost
        noalert address@hidden

--

monit -v output:

[dpaper]:[18:07:48]:/opt/jboss/bin> sudo monit -v
monit: Debug: Adding host allow 'localhost'
monit: Debug: Skipping redundant host 'localhost'
monit: Debug: Skipping redundant host 'localhost'
monit: Debug: Skipping redundant host 'localhost'
monit: Debug: Skipping redundant host 'localhost'
monit: Debug: Skipping redundant host 'localhost'
Runtime constants:
 Control file       = /opt/monit/etc/monitrc
 Log file           = syslog
 Pid file           = /var/run/monit.pid
 Debug              = True
 Log                = True
 Use syslog         = True
 Is Daemon          = True
 Use process engine = True
 Poll time          = 20 seconds
 Event queue        = base directory /opt/monit/eventqueue with unlimited slots
 Mail server(s)     = localhost:25
 Mail from          = address@hidden
 Mail subject       = monit alert for $HOST -- $EVENT $SERVICE
 Mail message       = $EVENT Service $SERV..(truncated)
 Start monit httpd  = True
 httpd bind address = localhost
 httpd portnumber   = 2812
 httpd signature    = True
 Use ssl encryption = False
 httpd auth. style  = Host/Net allow list
 Alert mail to      = address@hidden
   Alert on         = All events

The service list contains the following entries:

Process Name          = tm_prod03catalogedge01
 Pid file             = /var/run/jboss/tm_prod03catalogedge01.pid
 Monitoring mode      = active
 Start program        = '/opt/jboss/bin/monit_run.sh -b prod03catalogedge01.dc03.totalmusic.net -c prod03catalogedge01' as uid 8002 as gid 8002 timeout 1 cycle(s)
 Stop program         = '/bin/bash -c /opt/jboss/bin/monit_stop.sh prod03catalogedge01.dc03.totalmusic.net > /tmp/stop.log 2>&1' as uid 8002 as gid 8002 timeout 1 cycle(s)
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Port                 = if failed prod03catalogedge01.dc03.totalmusic.net:8080 [DEFAULT via TCP] with timeout 5 seconds 5 times within 10 cycle(s) then alert else if passed 1 times within 1 cycle(s) then alert

System Name           = localhost
 Monitoring mode      = active
 Alert mail to        = address@hidden
   Alert on           = No events

-------------------------------------------------------------------------------
monit daemon at 1850 awakened

--

Thanks!

-dave

--
Dave Paper                          address@hidden

MCSE is to computers as McDonalds Certified Chef is to fine cuisine.