monit-general

Re: PID file being removed


From: Martin Pala
Subject: Re: PID file being removed
Date: Tue, 09 Sep 2003 19:30:14 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030908 Debian/1.4-4

Hi,

It seems this was a bug in monit. In your case, the getpgid call used to detect whether the process is running returns zero for mysql, and non-zero for other monitored processes (such as ssh):

...
22017/1:        open("/var/run/sshd.pid", O_RDONLY)             = 5
22017/1:        read(5, " 4 1 2\n", 8192)                       = 4
22017/1:        close(5)
22017/1:        getpgid(412)                                    = 412
...
22017/1:        open("/ams/db/mysql/var/loki.pid", O_RDONLY)    = 5
22017/1:        read(5, " 4 0 1", 8192)                         = 3
22017/1:        close(5)                                        = 0
22017/1:        getpgid(401)                                    = 0
...

Monit interprets zero as meaning the process is not running, which is a bad assumption: if the process is not running, getpgid returns -1 and sets errno to ESRCH.
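
The distinction can be sketched in C as follows. This is only a minimal illustration, not monit's actual code: the helper name process_is_running is hypothetical, and only getpgid, ESRCH and EPERM are taken from this thread and the attached patch.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of monit): returns 1 if a process
 * with the given pid appears to exist, 0 otherwise.
 */
static int process_is_running(pid_t pid)
{
    errno = 0;

    /*
     * Any return value other than -1 is a valid process group id,
     * including 0 (as seen for mysql in the truss output above),
     * so the process exists.
     */
    if (getpgid(pid) > -1)
        return 1;

    /*
     * getpgid failed. EPERM means the process exists but we may not
     * query it; ESRCH (or any other error) is treated as "not running".
     */
    return errno == EPERM;
}
```

For example, process_is_running(getpid()) should return 1, because the calling process always exists.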

You can use the attached patch for monit-3.2; I fixed it in the current CVS version too. The current monit-4.0-beta2 is also affected by this bug; monit-4.0-beta3 or the stable 4.0 release will be fine.

Thanks for helping to track down this bug :)

Martin


Shannon E. Reall wrote:

Yes, I am running Solaris 8. I ran the truss command and have attached the output. I started getting inconsistent behavior this morning. On the machine that I have been doing all the testing on, I can not consistently duplicate the problem any longer. It only happens sporadically. I tried it on a different machine and do see the same problem consistently. This time the pid file does not get removed but monit still does not recognize it is running and tries to restart it. There is a different rc script on this machine so you were correct in saying that the start-up script is removing the pid file, so now the question is why monit does not recognize the process is running. I am running monit as root. Here is some additional information:

# Monitor mysql
check mysql with pidfile /ams/db/mysql/var/loki.pid
timeout(3,3)
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
alert address@hidden on { timeout }
alert address@hidden on { restart } #may change to unix-info eventually

address@hidden:/# ls -l /ams/db/mysql/var/loki.pid
-rw-rw---- 1 mysql mysql 3 Sep 4 06:17 /ams/db/mysql/var/loki.pid

Also interesting: I have now encountered the same type of situation with postgresql. Monit tries to restart postgres although it is already running. I will look at monit 4.0 and let you know how I make out.

Thanks for your help.

Shannon

address@hidden wrote:

------------------------------

Message: 2
Date: Mon, 08 Sep 2003 06:39:19 -0400
From: "Shannon E. Reall" <address@hidden>
Subject: Re: PID file being removed
To: address@hidden
Message-ID: <address@hidden>
Content-Type: text/plain; charset="us-ascii"

Here is the rc script for mysql:

#!/sbin/sh
#

case "$1" in
'start')
su mysql -c "exec /usr/local/mysql/libexec/mysqld" &
;;

'stop')
ps -ef | grep mysqld | grep -v grep | awk '{print $2}' | xargs kill -9
;;

*)
echo "Usage: $0 { start | stop }"
exit 1
;;
esac


**The pid file is definitely there after mysql is started.

Here is the output from running monit in verbose mode:

Runtime constants:
Control file = /etc/monitrc
Log file = /var/log/monit
Pid file = /var/run/monit.pid
Debug = True
Log = True
Use syslog = False
Is Daemon = False
Use process engine = True
Poll time = 0 seconds
Mail server = (not defined)
Mail from = (not defined)
Mail subject = (not defined)
Mail message = (not defined)
Start monit httpd = False

The process list contains the following entries:

Process Name = mysql
Group = (not defined)
Pid file = /usr/local/mysql/var/sun07.pid
Monitoring mode = active
Start program = /etc/init.d/mysql
Stop program = /etc/init.d/mysql
Host:Port = (not defined)
Resource Limits = (not defined)
Every = (not defined)
Timeout = Do timeout if 3 restart within 3 cycles
Alert mail to = address@hidden
alert from = (default)
alert subject = (default)
alert message = (default)
alert on timeout = no
alert on restart = yes
alert on checksum = no
alert on resource = no
alert on stop = no
alert on timestamp = no
Alert mail to = address@hidden
alert from = (default)
alert subject = (default)
alert message = (default)
alert on timeout = yes
alert on restart = no
alert on checksum = no
alert on resource = no
alert on stop = no
alert on timestamp = no

-------------------------------------------------------------------------------
start: (mysql) /etc/init.d/mysql
monit: Warning process 'mysql' was not started
Monitoring enabled -- process mysql
Restart notification is sent to address@hidden

Thanks,
Shannon

address@hidden wrote:

------------------------------

Message: 3
Date: Thu, 28 Aug 2003 13:12:41 +0200
From: Martin Pala <address@hidden>
Subject: Re: PID file being removed
To: This is the general mailing list for monit
        <address@hidden>
Message-ID: <address@hidden>
Content-Type: text/plain; charset=ISO-8859-2; format=flowed

Hi,

The problem is probably caused by your mysql startup script, which probably removes the pidfile. Monit does not remove a monitored service's pidfile (unless instructed to do so via an exec statement, which is not the case here).

Please:

1.) attach your mysql rc script, so we can figure out where and why it removes the pidfile.

2.) optionally trace the process as described in FAQ.txt distributed with monit. This could help show the actions and environment that preceded execution of the start method, and why monit decided to start mysql even though it was running before monit started (which is not normal, so there must be some reason for it; this works well in 3.2).

3.) run monit in verbose mode ('-v' command line option)


If any of the above is a problem for you, please kindly send at least some of this information.


Thanks for feedback :)
Martin


Message: 2
Date: Fri, 22 Aug 2003 07:49:19 -0400
From: "Shannon E. Reall" <address@hidden>
Subject: PID file being removed
To: address@hidden
Message-ID: <address@hidden>
Content-Type: text/plain; charset=us-ascii; format=flowed

I recently upgraded to 3.2 and am now having a problem monitoring mysqld. Here is that portion of the conf file:

check mysql with pidfile /usr/local/mysql/var/sun07.pid
timeout(3,3)
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
alert address@hidden on { timeout }
alert address@hidden on { restart }

The pid file exists before I start monit but then it disappears. Here is what I did for testing:

address@hidden:/# ps -ef |grep mysql
mysql 455 1 0 18:05:55 ? 0:01 /usr/local/mysql/libexec/mysqld
  root 23227 19786  0 07:47:11 pts/3    0:00 grep mysql

address@hidden:/# echo 455 > /usr/local/mysql/var/sun07.pid

address@hidden:/# ls -l /usr/local/mysql/var/sun07.pid
-rw-r--r-- 1 root other 4 Aug 22 07:47 /usr/local/mysql/var/sun07.pid

address@hidden:/# /usr/local/bin/monit -d 60
Starting monit daemon

address@hidden:/# tail /var/log/monit
[EDT Aug 22 07:37:20] Starting monit daemon
[EDT Aug 22 07:39:44] Stopping monit HTTP server
[EDT Aug 22 07:39:44] monit daemon with pid [22123] killed
[EDT Aug 22 07:43:28] Starting monit daemon
[EDT Aug 22 07:43:28] start: (mysql) /etc/init.d/mysql
[EDT Aug 22 07:44:28] monit: Warning process 'mysql' was not started
[EDT Aug 22 07:44:29] Stopping monit HTTP server
[EDT Aug 22 07:44:29] monit daemon with pid [23002] killed
[EDT Aug 22 07:47:47] Starting monit daemon
[EDT Aug 22 07:47:47] start: (mysql) /etc/init.d/mysql
[EDT Aug 22 07:48:47] monit: Warning process 'mysql' was not started

address@hidden:/# ls -l /usr/local/mysql/var/sun07.pid
/usr/local/mysql/var/sun07.pid: No such file or directory

Am I missing something?  Thanks for any help you can provide.



diff -Naur monit-3.2/util.c monit-3.2-mp/util.c
--- monit-3.2/util.c    2003-02-17 12:42:28.000000000 +0100
+++ monit-3.2-mp/util.c 2003-09-09 19:20:13.000000000 +0200
@@ -706,7 +706,7 @@
 
   if((pid= get_pid(p->pidfile))) {
     
-    if((kill_return= getpgid(pid)) > 0 || errno == EPERM)
+    if((kill_return= getpgid(pid)) > -1 || errno == EPERM)
        
       return pid;
     
