Re: if x restarts within y cycles then exec "script"
From: Martin Pala
Subject: Re: if x restarts within y cycles then exec "script"
Date: Tue, 20 Feb 2007 17:31:32 +0100
User-agent: Thunderbird 1.5.0.9 (Windows/20061207)

Javi Roman wrote:
> I've tested the patch timeout_to_exec.patch and it worked fine. I
> think it would be a good Monit improvement.

It can be added in the future (thanks to Alec for the patch), but the
implementation first needs to be checked with regard to the event
handler integration.

> Nevertheless, I would like to know whether it's possible to do
> something similar to:
> if 3 restarts within 6 cycles then exec "/sbin/reboot"
> with the current Monit version.

There are several ways. For example, you can modify the start script
(used in the "start program ..." statement) to audit the number of
consecutive restarts and, if it reaches the given ratio, perform the
reboot from that script.
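
A minimal sketch of such an audit, assuming the start script can keep
its state in a small file. It counts restarts inside a time window
rather than strictly consecutive ones, which approximates "3 restarts
within 6 cycles". The names STATE_FILE, MAX_RESTARTS, WINDOW_SECONDS,
record_restart and should_reboot are hypothetical, not part of Monit,
and "date +%s" is assumed to be available (it is not on every old Unix).

```shell
#!/bin/sh
# Hypothetical helpers for a start script: count recent restarts and
# decide whether the reboot threshold has been reached.
STATE_FILE="${STATE_FILE:-/var/tmp/myservice_restarts}"   # assumed path
MAX_RESTARTS="${MAX_RESTARTS:-3}"
WINDOW_SECONDS="${WINDOW_SECONDS:-360}"   # e.g. 6 cycles of 60s each

# Append the current time, drop entries older than the window,
# and print how many restarts remain inside the window.
record_restart() {
    now=$(date +%s)
    cutoff=$((now - WINDOW_SECONDS))
    echo "$now" >> "$STATE_FILE"
    awk -v c="$cutoff" '$1 >= c' "$STATE_FILE" > "$STATE_FILE.tmp"
    mv "$STATE_FILE.tmp" "$STATE_FILE"
    wc -l < "$STATE_FILE" | tr -d ' '
}

# Succeed (exit status 0) when this restart reaches the threshold.
should_reboot() {
    [ "$(record_restart)" -ge "$MAX_RESTARTS" ]
}

# In the real start script one might then write:
#   if should_reboot; then rm -f "$STATE_FILE"; exec /sbin/reboot; fi
#   exec /path/to/real/start/command
```

Clearing the state file before the reboot keeps the counter from
triggering again right after the machine comes back up.
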
Another possibility is to just touch some state file from the start
script and use the timestamp test as described in FAQ question no. 13.
You can then check the timestamp for example this way:
check file reboot_trigger with path /tmp/restart_flag
if timestamp < 10 seconds for 6 cycles then exec "/sbin/reboot"
When the start script is executed, it touches /tmp/restart_flag, so the
file's timestamp is updated. Monit watches the timestamp and, if it
finds the file freshly updated in 6 consecutive cycles, it executes the
reboot. When the start script succeeds, the timestamp is updated just
once and soon becomes older than 10 seconds, so the timestamp test no
longer matches. Note that the timestamp value depends on your monit
cycle length: for example, if the monit poll cycle is 5s, a timestamp
of 10s should be fine.
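
The start-script side of this could look roughly like the sketch
below. FLAG_FILE and start_with_flag are hypothetical names, and the
daemon path in the usage comment is a placeholder; only the
/tmp/restart_flag path comes from the example above.

```shell
#!/bin/sh
# Hypothetical wrapper for the script used in "start program ...".
FLAG_FILE="${FLAG_FILE:-/tmp/restart_flag}"

# Touch the flag so monit's timestamp test sees a fresh restart,
# then run the real start command given as arguments and return
# its exit status.
start_with_flag() {
    touch "$FLAG_FILE"
    "$@"
}

# Usage inside the start script (placeholder daemon path):
#   start_with_flag /path/to/mydaemon --some-option
```

The flag is touched before the service is started, so a start attempt
that hangs or fails still counts as a restart for the timestamp test.
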
Related FAQ excerpt:
13. Q: Is there any support for external testing scripts available?
A: We plan to add support for external scripts in the future (see our
TODO list - http://www.tildeslash.com/monit/doc/next.php#33). Until
native support is available, here are some workarounds:
1.) A nice workaround contributed by Pavel Urban is based on timestamp
monitoring of a file which is updated by an external script run from
cron. When everything is OK, the script updates (touches) the file.
When the test fails, the script does not update the timestamp and
monit performs the related action.
For example, a script for monitoring the count of files inside the
/tmp directory:
--8<--
#!/bin/bash
if [ `ls -1 /tmp |wc -l` -lt 100 ]
then
touch /var/tmp/monit_flag_tmp
fi
--8<--
run this script via cron (for example, every 20 minutes):
--8<--
0,20,40 * * * * /root/test_tmp_files > /dev/null 2>&1
--8<--
and do a timestamp check on /var/tmp/monit_flag_tmp (or any file you
choose) in the monit control file:
--8<--
check file monit_flag_tmp with path /var/tmp/monit_flag_tmp
if timestamp > 25 minutes then alert
--8<--
Done :)
Another example script, for monitoring the Solaris Volume Manager
metadevices:
--8<--
#!/usr/bin/bash
/usr/sbin/metastat | /usr/xpg4/bin/grep -q maintenance
if [ $? -ne 0 ]; then
touch /var/tmp/monit_flag_svm
fi
--8<--
2.) Alternatively, you can use monit's file content testing to watch
logfiles or status files created in a similar way as described above.
Example script:
--8<--
#!/usr/bin/bash
/usr/sbin/metastat > /var/tmp/monit_svm
--8<--
and example monit syntax:
--8<--
check file svm with path /var/tmp/monit_svm
if match "maintenance" then alert
--8<--
Martin