monit-general



From: Jan-Henrik Haukeland
Subject: Re: NFS is going down, et al [was: pidfiles aka. Re: [CVS] unix socket support added]
Date: 06 Aug 2002 01:42:01 +0200
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Civil Service)

Christian Hopp <address@hidden> writes:

> If necessary I can recover it from ~/experience. (-:

Great!

> For me the topic is over... and for you?  Let's face some more
> important stuff!

No problem and absolutely.


> Let me cite "man mount" on this:

>        The program accessing a file on a NFS mounted file system
>        will hang when the  server crashes. The process cannot be
>        interrupted or killed unless you also specify intr. 

Yes, and I believe that alarm is exactly such an interrupt. If a process
receives the alarm signal it _has_ to act on it. The default behavior
is to terminate the process unless an alarm handler was installed. So
alarm(2) should not have any problems jolting monit out of a blocked
file read.

> We have unfortunately a very unreliable network right now.

You're in an excellent position to test this then :) I'm betting you a
bottle of beer that alarm will work.

> The thing is... monit has to run... even if services monit is checking
> are running berserk.  I think that's what a monitor should do.

Indeed!

> If we are in "what if" discussions here are some other things to think
> about.
> 
> * Monit checks a server which is defunct, aka. a zombie.  Is it in
>   "good health" or not?  Pidfile and Pid do match.  I don't know what
>   its ports do (do they still connect or not?).

They should not accept a connection, but I'm not quite sure, since the
kernel handles socket connections and delivers them to the process.
Anyway, a zombie process, even if it accepts a connection, should not
pass the default connection test (the one with select) and of course
not any protocol test. But it's still a valid question, especially
for daemons without network code, like crond. This could be solved if
I ever get around to hacking the process status code as I was planning
to do (see item 6 in the next release plan). Maybe you would like to
give it a stab?

> * A start/stop script returns with error, should monit still try to
>   (re)start/stop the process?

Good one, I thought about this one day when I was looking at the code.
At least an alert message should be sent if monit cannot start the
process. Right now, only a log entry is made.
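
To illustrate the idea, here is a small sketch, again not monit's
actual code. The names run_script and start_needs_alert are
hypothetical. It runs a start/stop command through /bin/sh, waits for
it, and reports the exit status so the caller can decide to send an
alert instead of only logging.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` via /bin/sh and wait for it.
 * Returns the exit status (0..255), or -1 if the command could not be
 * run or was terminated by a signal. */
int run_script(const char *command)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);               /* exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)       /* retry only if interrupted */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Returns 1 if the script failed and an alert would be warranted. */
int start_needs_alert(const char *command)
{
    return run_script(command) != 0;
}
```

Whether monit should then retry the start or give up is a separate
policy question; the sketch only makes the failure visible to the
caller.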

-- 
Jan-Henrik Haukeland


