Re: monit doesn't poll all services at once?
From: Joe S.
Subject: Re: monit doesn't poll all services at once?
Date: Sun, 9 Oct 2005 07:07:50 -0500

Alright, here is an example:
check host nashville.servershost.net with address nashville.servershost.net
  if failed icmp type echo with timeout 15 seconds
    then alert
  if failed port 80 proto http with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 80 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 80 nashville.servershost.net"
  if failed port 22 proto ssh with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 22 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 22 nashville.servershost.net"
  if failed port 143 proto imap with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 143 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 143 nashville.servershost.net"
  if failed port 110 proto pop with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 110 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 110 nashville.servershost.net"
  if failed port 3306 proto mysql with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 3306 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 3306 nashville.servershost.net"
  if failed port 2082 with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 2082 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 2082 nashville.servershost.net"
  if failed port 2087 with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 2087 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 2087 nashville.servershost.net"
  if failed port 21 protocol FTP with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 21 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 21 nashville.servershost.net"
  if failed port 25 proto SMTP with timeout 15 seconds
    then exec "/bin/sh /etc/restart.sh 25 nashville.servershost.net"
    else if recovered then exec "/bin/sh /etc/recover.sh 25 nashville.servershost.net"
Mostly it just goes down a list of blocks like the one above, but there are
about 50 of them, all of them network service checks, as you said.
I have it exec a separate script so that it can send me a different type of
alert and handle restarting the service remotely in a different way.
Thanks
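
For a sense of scale, here is a back-of-envelope estimate of one full
monitoring cycle for a setup like the one above. It is only a sketch under
assumptions that the thread does not confirm: that every check runs strictly
one after another, and that a failing check blocks for its whole timeout. The
function name is made up for this example.

```python
# Rough upper bound for one sequential monitoring cycle.
# Assumptions (not confirmed in the thread): checks run one at a time,
# and each failing check blocks for its full timeout.

HOSTS = 50              # "about 50" blocks like the example
CHECKS_PER_HOST = 10    # 1 ICMP echo test + 9 port tests in the example
TIMEOUT_SECONDS = 15    # "with timeout 15 seconds"


def worst_case_cycle_seconds(hosts, checks_per_host, timeout):
    """Upper bound on cycle time: every single check times out."""
    return hosts * checks_per_host * timeout


if __name__ == "__main__":
    worst = worst_case_cycle_seconds(HOSTS, CHECKS_PER_HOST, TIMEOUT_SECONDS)
    print(f"worst case: {worst} s ({worst / 60:.0f} min)")
```

With these numbers the bound is 7500 seconds, so a cycle of around 30
minutes would only need a fraction of the checks to be slow or timing out.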
----- Original Message -----
From: "Jan-Henrik Haukeland" <address@hidden>
To: "This is the general mailing list for monit" <address@hidden>
Sent: Sunday, October 09, 2005 6:07 AM
Subject: Re: monit doesn't poll all services at once?
On 9 Oct 2005, at 05:56, Joe S. wrote:
> I am wondering if it's possible to monitor all services at once, rather
> than one by one. I like to monitor all services remotely, but when it
> goes down the list, it eventually takes around 30 minutes to check that
> all services are online before it goes back to #1 to check again.
Wow, that sounds like a lot. Is it possible for you to post your monitrc
file here so we can have a closer look?
> Is it possible for it, instead of going down the list, to poll
> everything all at once and continue from there?
> I know it will eat a lot of resources, but it's needed to speed things up
> a bit.
One solution could be to fire off each monitoring check in its own thread,
but this may not solve the problem in the long run. What (probably) takes
the time is network monitoring, since monit has a default 5 sec timeout on
each connect, read and write. A better solution than using threads may be
to use level-triggered event notification (see e.g. man epoll on Linux and
kqueue on BSD, and http://www.monkey.org/~provos/libevent/ for a general
library). We could also fork off a monitor process per monitored service,
like Apache (1.x) does.
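
As a rough illustration of the thread-per-check idea, here is a minimal
Python sketch. It is not monit code and says nothing about how monit would
implement this; check_port and check_all are names made up for the example.
It only tests whether a TCP connect succeeds within the timeout (no protocol
test, no ICMP), and the 5 second default simply mirrors the timeout mentioned
above.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def check_port(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_all(targets, timeout=5.0, max_workers=32):
    """Check every (host, port) pair concurrently in worker threads.

    Returns a dict mapping (host, port) -> bool. If there are no more
    targets than workers, the whole run takes roughly one timeout instead
    of the sum of all timeouts.
    """
    targets = list(targets)
    if not targets:
        return {}
    workers = min(max_workers, len(targets))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda t: check_port(t[0], t[1], timeout), targets)
        return dict(zip(targets, results))


# Hypothetical usage with the host from the example config:
#   ports = [80, 22, 143, 110, 3306, 2082, 2087, 21, 25]
#   status = check_all([("nashville.servershost.net", p) for p in ports])
```

The threads spend nearly all of their time blocked on the network, which is
why a plain thread pool is enough here; an event-driven design along the
lines of epoll or kqueue would avoid the per-thread overhead at the cost of
more involved code.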
I do feel a chill going down my spine at the extra complexity any of
these solutions would add :) But maybe we simply *must* rewrite monit to
scale better if more users are experiencing the same problem. (I only
monitor around 10 services and have never had this problem.)
--
Jan-Henrik Haukeland
Mobil +47 97141255