It does show up in the Events page-
- Date Mar 12 2016 00:13:22
- Host db1-primary
- Service name backup_failure
- Service type Program
- Event Status failed
- Action Alert
- Message '/usr/local/bin/check_backup' failed with exit status (0) -- no output
And the alert rule it falls under (can't select the text) -
hostgroup production
any service
failed
any event
then perform the following actions
execute program
(script for pagerduty)
This is the same ruleset used for all of the production services I
monitor - and they all work except this.
On 3/12/2016 9:46 AM, Martin Pala wrote:
Please check whether M/Monit's "Report -> Events"
page contains the related event. If it does, then
the event was sent to M/Monit and was processed by M/Monit's
Rule manager, but didn't match any rule ... please check the
"Admin -> Alerts" page in such case.
They are managed in m/monit -
address@hidden: ~ # cat /etc/monit/monitrc
set daemon 300
with start delay 10
set logfile syslog facility log_daemon
set idfile /var/.monit.id
set statefile /var/.monit.state
set mailserver localhost
include /etc/monit/conf.d/*
set eventqueue basedir /var/spool/monit slots 1000
set mmonit https://monit:58Mnz22*jyNSO$Q&@fake.example.net:8443/collector
set httpd port 2812
allow localhost
allow fake.example.net
allow monit:XXXXXXXX
use address 10.124.74.115
allow 10.124.74.115
On 3/12/2016 8:38 AM, Martin Pala wrote:
Are the alerts on your system managed on
Monit side or in M/Monit?
Best regards,
Martin
I'm stumped. I have an ugly little script to alert me if today's backup
of a database is smaller than the one from yesterday (and the day
before). The script works properly, and I have a simple monit rule in
place to alert me if it fails. When monit checks, it reports a failure;
that is pushed up to my m/monit server, which also logs the failure.
From there, all alerts go to PagerDuty. But I never get alerts from
this check.
(Hopefully) all relevant output is below. Some strings have been
obfuscated. Note that I have the rule modified to falsely report a
failure, for testing.
address@hidden: /etc/monit/conf.d # cat /etc/debian_version
7.9
address@hidden: /etc/monit/conf.d # monit --version
This is Monit version 5.17
Built with ssl, without pam and with large files
Copyright (C) 2001-2016 Tildeslash Ltd. All Rights Reserved.
address@hidden: /etc/monit/conf.d # cat backups
check program backup_failure with path /usr/local/bin/check_backup with timeout 15 seconds
not every "* 14 * * *"
#if status != 0 then alert
if status != 1 then alert
address@hidden: /etc/monit/conf.d # cat /usr/local/bin/check_backup
#!/bin/bash
BACKUP_DIR=/var/backups
cd ${BACKUP_DIR}
BUFILE=`date +%Y_%m_%d`_"group".sql.gz
YDAY_BUFILE=`date --date "1 days ago" +%Y_%m_%d`_"group".sql.gz
DAYBEFORE_YDAY_BUFILE=`date --date "2 days ago" +%Y_%m_%d`_"group".sql.gz
if [ -e "${BUFILE}" ];then
    TDAYSIZE=`du ${BUFILE}|cut -f1`
    YDAYSIZE=`du ${YDAY_BUFILE}|cut -f1`
    DBDAYSIZE=`du ${DAYBEFORE_YDAY_BUFILE}|cut -f1`
    if [ $YDAYSIZE -gt $DBDAYSIZE ];then
        if [ $TDAYSIZE -gt $YDAYSIZE ];then
            exit 0
        fi
    else
        exit 1
    fi
fi
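The branch structure of the script above can be traced with fixed sizes. This is only a minimal sketch, not the poster's script: `trace_exit` is a hypothetical function that mirrors the nested `if` logic alone (the `-e` file test and the `du` calls are left out), uses `return` in place of `exit`, and is fed made-up numbers. It illustrates that a path which reaches no explicit `exit` ends with status 0, since a shell `if` whose condition tests false and has no `else` yields status 0.

```shell
#!/bin/bash
# Hypothetical stand-in for the nested size comparison in check_backup.
# Arguments: today's size, yesterday's size, the day before's size.
trace_exit() {
    local tday=$1 yday=$2 dbday=$3
    if [ "$yday" -gt "$dbday" ]; then
        if [ "$tday" -gt "$yday" ]; then
            return 0
        fi
        # No explicit return on this path: the inner `if` tested false,
        # so its status is 0 and the function ends with status 0.
    else
        return 1
    fi
}

# Made-up sizes, one triple per path through the branches.
for sizes in "30 20 10" "15 20 10" "30 10 20"; do
    rc=0
    trace_exit $sizes || rc=$?   # unquoted on purpose: split into 3 arguments
    echo "sizes ($sizes) -> status $rc"
done
```

With these inputs only the third case (yesterday not larger than the day before) returns 1; a shrinking backup today, the second case, still ends with status 0.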
address@hidden:/etc/monit/conf.d # tail -1 /var/log/daemon.log
Mar 11 15:25:04 localhost monit[10562]: 'backup_failure' '/usr/local/bin/check_backup' failed with exit status (0) -- no output
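The exit status in that log line can be cross-checked by running the program by hand and printing `$?`. A minimal sketch, assuming a hypothetical wrapper named `show_status` (not part of Monit) and using `true` and `false` as stand-ins for the real script:

```shell
#!/bin/bash
# Hypothetical helper: run the given command and report its exit status.
show_status() {
    local rc=0
    "$@" || rc=$?
    echo "exit status: $rc"
}

# Stand-ins for the real check; on the monitored host this would be
#   show_status /usr/local/bin/check_backup
show_status true
show_status false
```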
address@hidden: ~ # monit status|tail -7
Program 'backup_failure'
  status                            Status failed
  monitoring status                 Monitored
  last started                      Fri, 11 Mar 2016 15:42:36
  last exit value                   0
  data collected                    Fri, 11 Mar 2016 15:42:36
What am I missing?
--
Paul Theodoropoulos
www.anastrophe.com
--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general