Re: monit error for simple file check?


From: Martin Pala
Subject: Re: monit error for simple file check?
Date: Tue, 12 Apr 2005 15:06:38 +0200
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

I reproduced the problem. Here is my test configuration:

--8<--
address@hidden cat /etc/monitrc.file
set daemon  5
set mailserver 127.0.0.1
set logfile /var/log/monit
set httpd port 2812 address 127.0.0.1 allow 127.0.0.1
set alert address@hidden

check process httpd with pidfile /var/run/httpd.pid
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  depends on httpd_bin

check file httpd_bin with path /usr/sbin/httpd
--8<--

... and then:

  mv /usr/sbin/httpd /usr/sbin/httpd-orig && /etc/init.d/httpd stop


Now the file monitoring is skipped and the file keeps its old state (it doesn't enter the validation cycle ... the data collected timestamp is not changing):

--8<--
Process 'httpd'
  status                            Execution failed
  monitoring status                 monitored
  data collected                    Tue Apr 12 12:32:51 2005

File 'httpd_bin'
  status                            accessible
  monitoring status                 monitored
  permission                        555
  uid                               0
  gid                               2
  timestamp                         Tue Apr 12 12:23:46 2005
  size                              361012 B
  data collected                    Tue Apr 12 12:32:20 2005
--8<--
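
A stalled check shows up as a "data collected" value that stops advancing while other services keep updating. As a rough aid for spotting this, the sketch below pulls that field out of "monit status" output for one service. The function name collected_at is made up for illustration, and the parsing assumes the 4.x status layout shown above (a header line such as File 'httpd_bin' followed by indented key/value lines).

```shell
#!/bin/sh
# collected_at NAME: read `monit status` text on stdin and print the
# "data collected" value recorded for the service called NAME.
# Hypothetical helper; assumes the status layout shown in this thread.
collected_at() {
    awk -v want="'$1'" '
        /^[A-Z][a-z]+ \047/ { current = $2 }          # service header line
        $1 == "data" && $2 == "collected" && current == want {
            sub(/^[ \t]*data collected[ \t]+/, "")     # keep only the timestamp
            print
        }
    '
}

# Example input copied from the status output above.
sample="Process 'httpd'
  status                            Execution failed
  monitoring status                 monitored
  data collected                    Tue Apr 12 12:32:51 2005

File 'httpd_bin'
  status                            accessible
  monitoring status                 monitored
  data collected                    Tue Apr 12 12:32:20 2005"

printf '%s\n' "$sample" | collected_at httpd_bin
```

Against a live daemon the input would come from something like "monit status | collected_at httpd_bin"; running it a few cycles apart and comparing the two values shows whether the file check is still being validated.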


Other test scenarios worked well. It seems that exactly the following conditions have to occur:

1.) the process start method depends on the file, and the dependency is declared in monitrc (for example, apache cannot start when the httpd binary is not present, as in the example above)

2.) the "parent" file is removed AND the process is killed BEFORE the next monitoring cycle starts (both within the same interval between two cycles). The start method then fails (which is logical), but monit also skips further monitoring of the parent file (which is not good, because we need to know that the file doesn't exist => root cause of the start method failure).
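
The two conditions can be put into a small reproduction sketch. The paths and the 5 second cycle are taken from the test configuration above; the run/reproduce helpers and the RUN switch are made up for illustration. By default the script only prints the commands, and nothing in it guarantees that both steps fall between two cycles, it just makes that likely by running them back to back.

```shell
#!/bin/sh
# Reproduction sketch for the two conditions above. By default it only
# prints the commands; set RUN=1 to execute them (needs root, and it
# really does move the binary, so use a scratch machine).
BIN=/usr/sbin/httpd                 # file watched by 'check file httpd_bin'
INIT=/etc/init.d/httpd              # init script used in the process check
RC=/etc/monitrc.file                # control file from the example above
CYCLE=5                             # must match 'set daemon 5'

# Hypothetical wrapper: echo the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

reproduce() {
    # Both steps should land inside one poll interval (condition 2).
    run mv "$BIN" "$BIN-orig"
    run "$INIT" stop
    # Wait two cycles so monit has had a chance to validate.
    run sleep $((CYCLE * 2))
    run monit -c "$RC" status
}

reproduce
```

If the bug is present, the status output at the end should still report the file as accessible with an unchanged "data collected" value.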

Monit 4.4 is affected too (tested).

We will look into it ...


Martin




Claus Klein wrote:
Today I retested it with the newest Debian monit version,
and it seems to work with V4.4.

So I think it is a bug in monit V4.5!

Claus Klein

address@hidden:/etc# rm /var/cache/SystemTestDone
address@hidden:/etc# monit status
...

Process 'ntpd'
  status                            initializing
  monitoring status                 initializing
  data collected                    Tue Apr 12 10:50:10 2005

File 'TestLockFile'
  status                            Does not exist
  monitoring status                 monitored
  data collected                    Tue Apr 12 10:50:12 2005

address@hidden:/etc# touch /var/cache/SystemTestDone
address@hidden:/etc# monit -V
This is monit version 4.4
Copyright (C) 2000-2004 by the monit project group. All Rights Reserved.
address@hidden:/etc#  monit status
...

Process 'ntpd'
  status                            running
  monitoring status                 monitored
  pid                               10061
  parent pid                        1
  uptime                            2m
  childrens                         0
  memory kilobytes                  3372
  memory kilobytes total            3372
  memory percent                    0.6%
  memory percent total              0.6%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                2.001s to localhost:123 [DEFAULT]
  data collected                    Tue Apr 12 10:53:40 2005

File 'TestLockFile'
  status                            accessible
  monitoring status                 monitored
  permission                        644
  uid                               0
  gid                               0
  timestamp                         Tue Apr 12 10:52:12 2005
  size                              0 B
  data collected                    Tue Apr 12 10:53:40 2005

address@hidden:/etc#
On Sunday 03 April 2005 12:19, Claus Klein wrote:

Hi,

With the following configuration I run into problems with monit.
It seems that the file is only checked at startup?

address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# tail /etc/monit/monitrc summary

check process ntpd with pidfile /var/run/ntpd.pid
  depends on TestLockFile
  group server
  start program  "/etc/init.d/ntp-server start"
  stop program  "/etc/init.d/ntp-server stop"
  if failed port 123 type udp then restart
  if 5 restarts within 5 cycles then timeout

check file TestLockFile with path /var/cache/SystemTestDone

address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# uname -a
Linux www.clausklein.homelinux.net 2.6.8-2-686 #1 Mon Jan 24 03:58:38 EST 2005 i686 GNU/Linux

address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ./monit -c /etc/monit/monitrc summary
The monit daemon 4.5 uptime: 10m

System 'www.clausklein.homelinux.net'
 load average                      [0.01] [0.05] [0.07]
 cpu                               1.7%us 0.5%sy 0.3%wa
 memory usage                      199340 kB [38.6%]
 data collected                    Sun Apr  3 11:14:45 2005

Process  'apache      '   running         monitored       at Sun Apr  3 11:14:50 2005
File     'apache_bin  '   accessible      monitored       at Sun Apr  3 11:14:50 2005
Device   'homefs      '   Resource limit matched          monitored       at Sun Apr  3 11:14:50 2005
Device   'rootfs      '   accessible      monitored       at Sun Apr  3 11:14:50 2005
Process  'snmpd       '   Execution failed        monitored       at Sun Apr  3 11:14:50 2005
Process  'pure-ftpd   '   running         monitored       at Sun Apr  3 11:14:50 2005
Process  'dictd       '   running         monitored       at Sun Apr  3 11:14:45 2005
Process  'ntpd        '   Execution failed        monitored       at Sun Apr  3 11:14:45 2005 # depends on 'TestLockFile'!
Process  'wwwoffled   '   running         monitored       at Sun Apr  3 11:14:45 2005
File     'wwwoffled_rc'   accessible      monitored       at Sun Apr  3 11:14:45 2005
Process  'privoxy     '   running         monitored       at Sun Apr  3 11:14:45 2005
File     'privoxy_bin '   accessible      monitored       at Sun Apr  3 11:14:45 2005
File     'privoxy_rc  '   accessible      monitored       at Sun Apr  3 11:14:45 2005
Process  'cron        '   running         monitored       at Sun Apr  3 11:14:45 2005
File     'cron_rc     '   accessible      monitored       at Sun Apr  3 11:14:45 2005
Process  'syslogd     '   running         monitored       at Sun Apr  3 11:14:45 2005
File     'syslogd_file'   accessible      monitored       at Sun Apr  3 11:14:45 2005
File     'TestLockFile'   accessible      monitored       at Sun Apr  3 11:10:35 2005 <<<<<<<<<<< only tested once? # I removed it after monit startup!
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5#
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ./monit -c /etc/monit/monitrc validate
'ntpd' process is not running
'ntpd' trying to restart
'ntpd' start: /etc/init.d/ntp-server
No lockfile found: /var/cache/SystemTestDone
'snmpd' process is not running
'snmpd' trying to restart
'snmpd' start: /etc/init.d/snmpd
Starting network management services: snmptrapd.
'homefs' space usage 94.0% matches resource limit [space usage>90.0%]
'ntpd' failed to start
'snmpd' failed to start
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ./monit -c /etc/monit/monitrc summary
The monit daemon 4.5 uptime: 15m

System 'www.clausklein.homelinux.net'
 load average                      [0.35] [0.13] [0.09]
 cpu                               3.9%us 0.9%sy 0.3%wa
 memory usage                      200604 kB [38.8%]
 data collected                    Sun Apr  3 11:18:46 2005

Process  'apache      '   running         monitored       at Sun Apr  3 11:18:52 2005
File     'apache_bin  '   accessible      monitored       at Sun Apr  3 11:18:52 2005
Device   'homefs      '   Resource limit matched          monitored       at Sun Apr  3 11:18:52 2005
Device   'rootfs      '   accessible      monitored       at Sun Apr  3 11:18:52 2005
Process  'snmpd       '   Execution failed        monitored       at Sun Apr  3 11:18:52 2005
Process  'pure-ftpd   '   running         monitored       at Sun Apr  3 11:18:52 2005
Process  'dictd       '   running         monitored       at Sun Apr  3 11:18:47 2005
Process  'ntpd        '   Execution failed        monitored       at Sun Apr  3 11:18:47 2005
Process  'wwwoffled   '   running         monitored       at Sun Apr  3 11:18:46 2005
File     'wwwoffled_rc'   accessible      monitored       at Sun Apr  3 11:18:46 2005
Process  'privoxy     '   running         monitored       at Sun Apr  3 11:18:46 2005
File     'privoxy_bin '   accessible      monitored       at Sun Apr  3 11:18:47 2005
File     'privoxy_rc  '   accessible      monitored       at Sun Apr  3 11:18:47 2005
Process  'cron        '   running         monitored       at Sun Apr  3 11:18:46 2005
File     'cron_rc     '   accessible      monitored       at Sun Apr  3 11:18:46 2005
Process  'syslogd     '   running         monitored       at Sun Apr  3 11:18:46 2005
File     'syslogd_file'   accessible      monitored       at Sun Apr  3 11:18:46 2005
File     'TestLockFile'   accessible      monitored       at Sun Apr  3 11:10:35 2005 <<<<<<<<<<< This is wrong!
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ls -lrta /var/cache/SystemTestDone
ls: /var/cache/SystemTestDone: No such file or directory
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ./monit -c /etc/monit/monitrc reload

I added 'every 2 cycles' to the TestLockFile file check block:

check file TestLockFile with path /var/cache/SystemTestDone
  every 2 cycles
  if failed permission 644 then alert

But the status is still wrong after a reload, and I receive no mail about TestLockFile?

Process 'ntpd'
 status                            Execution failed
 monitoring status                 monitored
 data collected                    Sun Apr  3 11:37:33 2005

File 'TestLockFile'
 status                            accessible
 monitoring status                 monitored
 path                              /var/cache/SystemTestDone
 permission                        0                                         # it seems true because the file does not exist!
 uid                               0
 gid                               0
 timestamp                         Thu Jan  1 01:00:00 1970     # it seems true because the file does not exist!
 size                              0 B
 data collected                    Sun Apr  3 11:37:33 2005

address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# date
Sun Apr  3 11:38:16 CEST 2005

Then I tried this:

address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ./monit -c /etc/monit/monitrc -t
Control file syntax OK
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ls /var/cache/SystemTestDone
ls: /var/cache/SystemTestDone: No such file or directory
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ./monit -c /etc/monit/monitrc status | tail -20
 size                              102056 B
 data collected                    Sun Apr  3 12:04:44 2005

Process 'ntpd'
 status                            Execution failed
 monitoring status                 monitored
 data collected                    Sun Apr  3 12:04:44 2005

File 'TestLockFile'
 status                            accessible
 monitoring status                 monitored
 path                              /var/cache/SystemTestDone
 permission                        0
 uid                               0
 gid                               0
 timestamp                         Thu Jan  1 01:00:00 1970
 size                              0 B
 checksum                          (null)(MD5)
 data collected                    Sun Apr  3 12:04:44 2005

address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# !tail
tail -7 /etc/monit/monitrc

check file TestLockFile with path /var/cache/SystemTestDone
  every 2 cycles
  if failed permission 644 then alert
  if failed checksum and
     expect the sum d41d8cd98f00b204e9800998ecf8427e then alert
  if size > 0 B then alert
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# !ls
ls /var/cache/SystemTestDone
ls: /var/cache/SystemTestDone: No such file or directory
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5#
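
For reference, the sum d41d8cd98f00b204e9800998ecf8427e expected in the check above is the MD5 digest of empty input, i.e. what an existing 0 B file would hash to. A quick way to confirm this, assuming GNU coreutils md5sum is installed (the function name empty_md5 is only illustrative):

```shell
#!/bin/sh
# Print the MD5 digest of empty input. md5sum prints "<digest>  -",
# so keep only the first field.
empty_md5() {
    printf '' | md5sum | cut -d' ' -f1
}

empty_md5
```

The checksum test can therefore only pass when the file exists and is empty; with the file missing there is nothing to hash.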

But when I configure it like this, I can't start monit?


address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# ./monit -c /etc/monit/monitrc -t
/etc/monit/monitrc:541: Error: cannot compute a checksum for a file /var/cache/SystemTestDone 'root'
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5# tail /etc/monit/monitrc

check file TestLockFile with path /var/cache/SystemTestDone
  every 2 cycles
  if failed permission 644 then alert
  if failed checksum and
     expect the sum d41d8cd98f00b204e9800998ecf8427e then alert
  if size != 0 B then alert
  if failed uid root then alert
   # only this seems to work for an empty file?
  if failed checksum then alert
address@hidden:/home/claus/src/buildroot/build_powerpc/monit-4.5#

What goes wrong?

Claus Klein

--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general



