monit-general

RE: Monit not able to start after box reboot


From: Fei, Yuming
Subject: RE: Monit not able to start after box reboot
Date: Tue, 12 Jun 2012 09:25:14 -0500

Right, it should be very unlikely to happen in practice. We were probably just unlucky to see it happen at system reboot, but putting the pidfile in an area that does not persist across reboots will solve the problem there. Thanks,

Yuming

 


From: address@hidden [mailto:address@hidden On Behalf Of Martin Pala
Sent: Tuesday, June 12, 2012 12:53 AM
To: This is the general mailing list for monit
Subject: Re: Monit not able to start after box reboot

 

Yes, that could happen in theory. I have never seen it in practice, since on a normal platform a PID is not recycled immediately: PIDs are assigned from an incrementing sequence and are reused only after the sequence wraps.

 

It seems that your platform is aggressive about PID recycling. What platform and kernel is it?

 

The solution for service checks on a platform that reuses PIDs so quickly is to use the pattern-based process check, which doesn't depend on the pidfile ("check process myproc matching …").

 

Regards,

Martin

 

 

On Jun 12, 2012, at 5:56 AM, Fei, Yuming wrote:



The same problem exists with the service pid files too. Furthermore, it seems that the use of pidfiles shown below, which appears in many examples, does not really work:

check process ntpd with pidfile /var/run/ntpd.pid
   start program = "/etc/init.d/ntpd start"
   stop  program = "/etc/init.d/ntpd stop"

 

The reason is that there is always a chance that the process dies and another process starts and obtains the same pid before monit detects it. Monit may send out an alert if the ppid changes, but it will think that the process is running and won’t run the start program. This may happen at system reboot time but can also happen at run time, in which case the tmpfs solution may not help.
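To illustrate the race described above: a check that only asks whether a PID exists cannot tell the original daemon from an unrelated process that inherited the same number. One common mitigation, sketched below in Python, is to compare the process name as well. This is not how monit works; the helper names `read_pidfile` and `pid_is_program`, the `proc_root` parameter, and the reliance on the Linux-specific `/proc/<pid>/comm` file are all assumptions made for illustration.

```python
import os


def read_pidfile(path):
    """Return the PID stored in a pidfile, or None if missing or malformed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_program(pid, expected_name, proc_root="/proc"):
    """Return True only if `pid` exists AND its name matches `expected_name`.

    Assumes a Linux-style procfs where <proc_root>/<pid>/comm holds the
    process name (truncated by the kernel to 15 characters).
    """
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            name = f.read().strip()
    except OSError:
        return False  # no such process, or procfs is unavailable
    return name == expected_name[:15]
```

For the ntpd example, `pid_is_program(read_pidfile("/var/run/ntpd.pid"), "ntpd")` would return False if some other program had taken over the recorded PID, whereas a bare existence check would still report the service as running.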

 

If this is true, it may explain what I have also seen: after system reboot, monit does not always start the processes…

Maybe I am missing something here, but please let me know.

 

Yuming

 


From: Fei, Yuming 
Sent: Monday, June 11, 2012 4:59 PM
To: 'This is the general mailing list for monit'
Subject: RE: Monit not able to start after box reboot

 

Thanks Martin, placing the pidfile on tmpfs will help. So, is there a way to avoid this problem when the pidfile is placed on disk, or should it simply not be a disk file, since that persists across reboots?

Yuming

 


From: address@hidden [mailto:monit-general-bounces+yfei=address@hidden] On Behalf Of Martin Pala
Sent: Monday, June 11, 2012 4:32 PM
To: This is the general mailing list for monit
Subject: Re: Monit not able to start after box reboot

 

If you run the monit binary, it checks whether the daemon is running already by reading the PID from its pidfile and looking for the given process. It seems that after your system rebooted, some other process obtained the same PID, so monit thinks that it is running and doesn't daemonize itself.

 

The solution could be to place the pidfile on tmpfs (memory based filesystem), which is not persistent across reboot => the file will disappear when you reboot the system.

 

The pidfile location can be set with the "set pidfile" statement.

 

Regards,

Martin

 

 

 

On Jun 11, 2012, at 11:24 PM, Fei, Yuming wrote:

 

Hi all,

I have seen a problem where monit is not able to start after the box reboots. Monit is run from init as a daemon.

This problem happens occasionally. When it does happen, I see the following:

 

First of all, the .monit.pid file was not removed after the box shutdown.

Then monit was started from init, but it wrote “monit daemon at 12327 awakened” into its log, where 12327 is the pid in .monit.pid. The monit startup process then exited and no monit process was running.

 

Cleaning up the .monit.pid file helped: after removing it and restarting monit, monit came up.

 

Has anyone experienced this? What could cause this to happen?

 

Looking at monit’s source code, I found the following:

 

(1) The function that removes the pid file is registered with atexit(); however, it may not be called if the process terminates abnormally. Thus the pid file may not be removed, which is probably what was seen above.

(2) When monit starts up, it reads the pid value from the pid file and calls getpgid to check whether that process exists. But if the process with that pid is a zombie, the getpgid check will pass, monit will think that the monit daemon is running, and it will “awake” that daemon, which is actually a zombie. The result is that the monit daemon won’t come up.
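Point (1) can be reproduced outside monit. The sketch below uses Python's `atexit` module as a stand-in for C's atexit(): a child process writes a pidfile, registers a handler that removes it, and then either exits normally or is terminated with SIGKILL. The helper name `pidfile_survives` and the temporary pidfile are hypothetical; this is an analogy for the behaviour, not monit's code.

```python
import os
import subprocess
import sys
import tempfile

# Child program: write a pidfile, register its removal with atexit,
# then exit normally or kill itself with SIGKILL.
CHILD = r"""
import atexit, os, signal, sys
path, mode = sys.argv[1], sys.argv[2]
with open(path, "w") as f:
    f.write(str(os.getpid()))
atexit.register(os.remove, path)
if mode == "kill":
    os.kill(os.getpid(), signal.SIGKILL)  # abnormal termination
"""


def pidfile_survives(mode):
    """Run the child in the given mode; report whether its pidfile is left behind."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.pid")
        subprocess.run([sys.executable, "-c", CHILD, path, mode])
        return os.path.exists(path)
```

On a Unix-like system, `pidfile_survives("exit")` should report that the file was cleaned up, while `pidfile_survives("kill")` should report that it was left behind, which mirrors a stale .monit.pid after a crash or an unclean shutdown.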

 

It doesn’t seem that monit tests if there is a monit zombie process during startup.
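A startup check that distinguishes a zombie from a live daemon could look roughly like the following Python sketch. It assumes Linux, where the field after the parenthesised process name in `/proc/<pid>/stat` is the process state (`Z` for a zombie). The function names are hypothetical and nothing here is taken from monit's source.

```python
import os


def passes_getpgid_check(pid):
    """Mimic a liveness test based on getpgid(): True if the PID can be looked up."""
    try:
        os.getpgid(pid)
        return True
    except ProcessLookupError:
        return False


def proc_state(stat_text):
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The process name is wrapped in parentheses and may itself contain
    spaces or ')', so parse from the last ')' onward.
    """
    return stat_text.rpartition(")")[2].split()[0]


def is_zombie(pid):
    """Return True if the process exists and is a zombie (Linux only)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return proc_state(f.read()) == "Z"
    except OSError:
        return False


def is_really_running(pid):
    """Stricter check: the PID must exist and must not be a zombie."""
    return passes_getpgid_check(pid) and not is_zombie(pid)
```

The design choice is to keep the getpgid-style lookup and add the state test on top of it, so that a PID which still occupies a slot in the process table but can no longer do any work is treated as not running.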

Also, if by any chance there is a process running with the same pid as the one in .monit.pid, monit will send a signal to it to “awake” it …, and then it may kill that process.

 

This is seen in monit 5.1.1, and seems to be in the latest version 5.4 as well.

 

Thanks

Yuming

 


--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general

 


 

