monit-general

Re: monit doesn't poll all services at once?


From: Joe S.
Subject: Re: monit doesn't poll all services at once?
Date: Sun, 9 Oct 2005 07:07:50 -0500

Alright, here is an example:

check host nashville.servershost.net with address nashville.servershost.net
  if failed icmp type echo with timeout 15 seconds
    then alert
  if failed port 80 proto http with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 80 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 80 nashville.servershost.net"
  if failed port 22 proto ssh with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 22 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 22 nashville.servershost.net"
  if failed port 143 proto imap with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 143 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 143 nashville.servershost.net"
  if failed port 110 proto pop with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 110 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 110 nashville.servershost.net"
  if failed port 3306 proto mysql with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 3306 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 3306 nashville.servershost.net"
  if failed port 2082 with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 2082 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 2082 nashville.servershost.net"
  if failed port 2087 with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 2087 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 2087 nashville.servershost.net"
  if failed port 21 protocol FTP with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 21 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 21 nashville.servershost.net"
  if failed port 25 proto SMTP with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 25 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 25 nashville.servershost.net"


Mostly all it does is go down a list of checks like the one above, but there are about 50 of them, all of them network service checks, as you said.
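To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions, not a measurement: it assumes 50 host entries shaped like the example (1 ICMP test plus 9 port tests each), that the tests run strictly one after another, and that a test which fails blocks for its full 15 second timeout. A refused connection normally returns much faster, so real cycles will vary.

```python
# Back-of-the-envelope estimate of one sequential polling cycle.
# All three constants are assumptions taken from the example above,
# not values measured on a running monit instance.
HOSTS = 50            # "50 of the above"
TESTS_PER_HOST = 10   # 1 ICMP echo test + 9 port tests
TIMEOUT_S = 15        # "with timeout 15 seconds"


def worst_case_cycle_seconds(hosts, tests_per_host, timeout_s):
    """Cycle time if every single test blocks for its full timeout."""
    return hosts * tests_per_host * timeout_s


def timeouts_for_cycle(cycle_seconds, timeout_s):
    """Number of fully timed-out tests needed to fill a cycle of this length."""
    return cycle_seconds / timeout_s


worst = worst_case_cycle_seconds(HOSTS, TESTS_PER_HOST, TIMEOUT_S)
print(worst / 60)                              # 125.0 (minutes, worst case)
print(timeouts_for_cycle(30 * 60, TIMEOUT_S))  # 120.0 (timeouts in 30 minutes)
```

Under these assumptions a 30 minute cycle corresponds to roughly 120 of the 500 tests waiting out their full timeout, so a modest share of slow or unreachable services would be enough to explain the delay.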

I have it execute a separate script so that it can send me a different type of alert and restart the service remotely in a different way.

Thanks


----- Original Message -----
From: "Jan-Henrik Haukeland" <address@hidden>
To: "This is the general mailing list for monit" <address@hidden>
Sent: Sunday, October 09, 2005 6:07 AM
Subject: Re: monit doesn't poll all services at once?



On 9 Oct 2005, at 05:56, Joe S. wrote:

I am wondering if it's possible to monitor all services at once rather than one by one. I like to monitor all services remotely, but as it goes down the list, it takes around 30 minutes to check that all services are online before it goes back to #1 and checks again.


Oj, that sounds like a lot. Could you post your monitrc file here so we can have a closer look?

Is it possible for it, instead of going down the list, to poll everything at once and continue from there?

I know it will eat a lot of resources, but it's needed to speed things up a bit.

One solution could be to fire off each monitoring check in its own thread. But this may not even solve the problem in the long run. The part that (probably) takes time is network monitoring, since monit has a default 5 second timeout on each connect, read and write. A better solution than using threads may be to use level-triggered event notification (see e.g. man epoll on Linux, kqueue on BSD, and http://www.monkey.org/~provos/libevent/ for a general library). We could also fork off a monitor process per monitored service, like Apache (1.x) does.
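To illustrate the event-notification idea in general terms, here is a minimal Python sketch that opens all TCP connections concurrently on a single event loop, so the whole round takes about as long as the slowest check rather than the sum of all checks. This is not monit code (monit is written in C) and not a proposed patch; the names check_port, check_all and poll are made up for the example, and it only tests whether a connection opens, without any protocol test. Python's asyncio is used as a stand-in because its default event loop sits on top of epoll or kqueue where the platform provides them.

```python
import asyncio


async def check_port(host, port, timeout=15.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, unreachable, DNS failure, or timed out: treat as "failed".
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # The connection did open; errors while closing do not matter here.
    return True


async def check_all(targets, timeout=15.0):
    """Run every (host, port) check concurrently; return {(host, port): bool}."""
    results = await asyncio.gather(
        *(check_port(host, port, timeout) for host, port in targets)
    )
    return dict(zip(targets, results))


def poll(targets, timeout=15.0):
    """Blocking convenience wrapper around check_all for one polling round."""
    return asyncio.run(check_all(targets, timeout))
```

A round over the example host could then be started with poll([("nashville.servershost.net", 80), ("nashville.servershost.net", 22)]). With this approach the worst case for a round is about one timeout, however many targets are listed, at the cost of having all connections open at the same moment.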

I do feel a chill going down my spine at the extra complexity any of these solutions would add :) But maybe we simply *must* rewrite monit to scale better if more users are experiencing the same problem. (I only monitor around 10 services and have never had this problem.)

--
Jan-Henrik Haukeland
Mobil +47 97141255



