monit-general
Re: [monit] can't monitor one of my filesystems


From: zachlac
Subject: Re: [monit] can't monitor one of my filesystems
Date: Wed, 12 May 2010 06:42:59 -0700 (PDT)

You are right, it is not mounted.  This is the difficulty in building a
monitoring system on machines which I'm not familiar with :) 

Thank you.



Martin Pala wrote:
> 
> Thanks for the output.
> 
> It seems that the reason could be that the device is not mounted: it was
> not found in /etc/mtab. The statvfs() interface, which is used to get
> filesystem usage, needs a path to an object on the filesystem being
> tested. Hence, when a device name is used, monit translates it to a mount
> point using /etc/mtab. There is currently no fine-grained error message for
> this state; it is caught by the test itself, which logs the general "unable
> to read filesystem /dev/sda2 state".
> 
> Please can you check that /dev/sda2 is mounted and that it can be found in
> /etc/mtab?
> 
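The device-to-mount-point lookup described above can be sketched in Python. This is only an illustration of the idea, not monit's actual code: the function names find_mountpoint and space_usage_percent, the sample mtab text, and the usage formula are all assumptions made for this example. It also ignores details such as the \040 escaping of spaces in mtab entries.

```python
import os


def find_mountpoint(device, mtab_text):
    """Return the mount point listed for `device` in mtab-style text, or None.

    Hypothetical helper: each mtab line is "<source> <mountpoint> <type> ...".
    """
    real_device = os.path.realpath(device)
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, mountpoint = fields[0], fields[1]
        if source == device:
            return mountpoint
        # Resolve symlinked device names, but only for absolute paths;
        # pseudo-sources such as "proc" or "none" are not devices.
        if source.startswith("/") and os.path.realpath(source) == real_device:
            return mountpoint
    return None


def space_usage_percent(path):
    """Percentage of blocks in use on the filesystem that contains `path`.

    Assumed formula for illustration; monit's own calculation may differ.
    """
    st = os.statvfs(path)
    if st.f_blocks == 0:
        return 0.0
    return 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks


# Made-up mtab content for the example, not taken from the poster's machine.
SAMPLE_MTAB = (
    "/dev/sda1 /boot ext3 rw 0 0\n"
    "/dev/sdb1 /data ext3 rw 0 0\n"
    "proc /proc proc rw 0 0\n"
)

if __name__ == "__main__":
    for dev in ("/dev/sda1", "/dev/sda2"):
        mnt = find_mountpoint(dev, SAMPLE_MTAB)
        if mnt is None:
            # An unmounted device has no mtab entry, so there is no path
            # to hand to statvfs() and the usage test cannot run.
            print(f"{dev}: not found in mtab (not mounted?)")
        else:
            print(f"{dev}: mounted at {mnt}")
    print(f"usage of /: {space_usage_percent('/'):.1f}%")
```

With the sample text, /dev/sda2 has no entry, so the lookup returns None. That corresponds to the case where monit has no mount point to pass to statvfs() and reports the general read error.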
> 
> On May 6, 2010, at 9:33 PM, zachlac wrote:
> 
>> 
>> Here's the output of monit -vI.  I do not believe that it's a virtual
>> machine.
>> 
>> [EDT May  6 15:28:57] debug    : monit: pidfile '/var/run/monit.pid' does
>> not exist
>> [EDT May  6 15:28:57] info     : Starting monit daemon with http
>> interface
>> at [www.***************.com:2812]
>> [EDT May  6 15:28:57] info     : Starting monit HTTP server at
>> [www.***************.com:2812]
>> [EDT May  6 15:28:57] info     : monit HTTP server started
>> [EDT May  6 15:28:57] info     : 'www' Monit started
>> [EDT May  6 15:28:57] debug    : Monit instance changed notification is
>> sent
>> to address@hidden
>> [EDT May  6 15:28:57] debug    : cannot open file /proc/32077/stat -- No
>> such file or directory
>> [EDT May  6 15:28:57] debug    : system statistic error -- cannot read
>> /proc/32077/stat
>> [EDT May  6 15:28:57] debug    : 'www' cpu wait usage check succeeded
>> [current cpu wait usage=-1.0%]
>> [EDT May  6 15:28:57] debug    : 'www' cpu system usage check succeeded
>> [current cpu system usage=-1.0%]
>> [EDT May  6 15:28:57] debug    : 'www' cpu user usage check succeeded
>> [current cpu user usage=-1.0%]
>> [EDT May  6 15:28:57] debug    : 'www' swap usage check succeeded
>> [current
>> swap usage=0.0%]
>> [EDT May  6 15:28:57] debug    : 'www' mem usage check succeeded [current
>> mem usage=34.8%]
>> [EDT May  6 15:28:57] debug    : 'www' loadavg(5min) check succeeded
>> [current loadavg(5min)=0.1]
>> [EDT May  6 15:28:57] debug    : 'www' loadavg(1min) check succeeded
>> [current loadavg(1min)=0.0]
>> [EDT May  6 15:28:57] debug    : 'apache_bin' file existence check
>> succeeded
>> [EDT May  6 15:28:57] debug    : 'apache_bin' is a regular file
>> [EDT May  6 15:28:57] debug    : 'apache_bin' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'apache_bin' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'apache_bin' uid check succeeded
>> [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'apache_bin' gid check succeeded
>> [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'apache_rc' file existence check
>> succeeded
>> [EDT May  6 15:28:57] debug    : 'apache_rc' is a regular file
>> [EDT May  6 15:28:57] debug    : 'apache_rc' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'apache_rc' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'apache_rc' uid check succeeded [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'apache_rc' gid check succeeded [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'sendmail_bin' file existence check
>> succeeded
>> [EDT May  6 15:28:57] debug    : 'sendmail_bin' is a regular file
>> [EDT May  6 15:28:57] debug    : 'sendmail_bin' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'sendmail_bin' permission check
>> succeeded
>> [current permission=6755]
>> [EDT May  6 15:28:57] debug    : 'sendmail_bin' uid check succeeded
>> [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'sendmail_bin' gid check succeeded
>> [current
>> gid=51]
>> [EDT May  6 15:28:57] debug    : 'sendmail_rc' file existence check
>> succeeded
>> [EDT May  6 15:28:57] debug    : 'sendmail_rc' is a regular file
>> [EDT May  6 15:28:57] debug    : 'sendmail_rc' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'sendmail_rc' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'sendmail_rc' uid check succeeded
>> [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'sendmail_rc' gid check succeeded
>> [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'dovecot_bin' file existence check
>> succeeded
>> [EDT May  6 15:28:57] debug    : 'dovecot_bin' is a regular file
>> [EDT May  6 15:28:57] debug    : 'dovecot_bin' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'dovecot_bin' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'dovecot_bin' uid check succeeded
>> [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'dovecot_bin' gid check succeeded
>> [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'dovecot_rc' file existence check
>> succeeded
>> [EDT May  6 15:28:57] debug    : 'dovecot_rc' is a regular file
>> [EDT May  6 15:28:57] debug    : 'dovecot_rc' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'dovecot_rc' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'dovecot_rc' uid check succeeded
>> [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'dovecot_rc' gid check succeeded
>> [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'ntpd_bin' file existence check
>> succeeded
>> [EDT May  6 15:28:57] debug    : 'ntpd_bin' is a regular file
>> [EDT May  6 15:28:57] debug    : 'ntpd_bin' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'ntpd_bin' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'ntpd_bin' uid check succeeded [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'ntpd_bin' gid check succeeded [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'ntpd_rc' file existence check succeeded
>> [EDT May  6 15:28:57] debug    : 'ntpd_rc' is a regular file
>> [EDT May  6 15:28:57] debug    : 'ntpd_rc' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'ntpd_rc' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'ntpd_rc' uid check succeeded [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'ntpd_rc' gid check succeeded [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'sshd_bin' file existence check
>> succeeded
>> [EDT May  6 15:28:57] debug    : 'sshd_bin' is a regular file
>> [EDT May  6 15:28:57] debug    : 'sshd_bin' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'sshd_bin' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'sshd_bin' uid check succeeded [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'sshd_bin' gid check succeeded [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'sshd_rc' file existence check succeeded
>> [EDT May  6 15:28:57] debug    : 'sshd_rc' is a regular file
>> [EDT May  6 15:28:57] debug    : 'sshd_rc' has valid checksums
>> [EDT May  6 15:28:57] debug    : 'sshd_rc' permission check succeeded
>> [current permission=0755]
>> [EDT May  6 15:28:57] debug    : 'sshd_rc' uid check succeeded [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'sshd_rc' gid check succeeded [current
>> gid=0]
>> [EDT May  6 15:28:57] debug    : 'apache' zombie check succeeded
>> [status_flag=0000]
>> [EDT May  6 15:28:57] debug    : 'apache' loadavg(5min) check succeeded
>> [current loadavg(5min)=0.1]
>> [EDT May  6 15:28:57] debug    : 'apache' children check succeeded
>> [current
>> children=13]
>> [EDT May  6 15:28:57] debug    : 'apache' total mem amount check
>> succeeded
>> [current total mem amount=263024kB]
>> [EDT May  6 15:28:57] debug    : 'apache' cpu usage check skipped
>> (initializing)
>> [EDT May  6 15:28:57] debug    : [EDT May  6 15:28:57] debug    :
>> 'apache'
>> succeeded connecting to INET[www.***************.com:80] via TCP
>> [EDT May  6 15:28:57] debug    : 'apache' succeeded testing protocol
>> [HTTP]
>> at INET[www.***************.com:80] via TCP
>> [EDT May  6 15:28:57] debug    : 'sendmail' zombie check succeeded
>> [status_flag=0000]
>> [EDT May  6 15:28:57] debug    : 'sendmail' succeeded connecting to
>> INET[localhost:25] via TCP
>> [EDT May  6 15:28:57] debug    : 'sendmail' succeeded testing protocol
>> [SMTP] at INET[localhost:25] via TCP
>> [EDT May  6 15:28:57] debug    : 'dovecot' zombie check succeeded
>> [status_flag=0000]
>> [EDT May  6 15:28:57] debug    : 'dovecot' succeeded connecting to
>> INET[localhost:993] via TCPSSL
>> [EDT May  6 15:28:57] debug    : 'dovecot' succeeded testing protocol
>> [IMAP]
>> at INET[localhost:993] via TCPSSL
>> [EDT May  6 15:28:57] debug    : 'ntp' zombie check succeeded
>> [status_flag=0000]
>> [EDT May  6 15:28:57] debug    : 'ssh' zombie check succeeded
>> [status_flag=0000]
>> [EDT May  6 15:28:57] debug    : 'datafs_sdb1' permission check succeeded
>> [current permission=0640]
>> [EDT May  6 15:28:57] debug    : 'datafs_sdb1' uid check succeeded
>> [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'datafs_sdb1' gid check succeeded
>> [current
>> gid=6]
>> [EDT May  6 15:28:57] debug    : 'datafs_sdb1' inode usage check
>> succeeded
>> [current inode usage=1.5%]
>> [EDT May  6 15:28:57] debug    : 'datafs_sdb1' inode usage check
>> succeeded
>> [current inode usage=1.5%]
>> [EDT May  6 15:28:57] debug    : 'datafs_sdb1' space usage check
>> succeeded
>> [current space usage=69.6%]
>> [EDT May  6 15:28:57] debug    : 'datafs_sdb1' space usage check
>> succeeded
>> [current space usage=69.6%]
>> [EDT May  6 15:28:57] debug    : 'swap_sdb2' permission check succeeded
>> [current permission=0640]
>> [EDT May  6 15:28:57] debug    : 'swap_sdb2' uid check succeeded [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'swap_sdb2' gid check succeeded [current
>> gid=6]
>> [EDT May  6 15:28:57] debug    : 'swap_sdb2' inode usage check succeeded
>> [current inode usage=0.1%]
>> [EDT May  6 15:28:57] debug    : 'swap_sdb2' inode usage check succeeded
>> [current inode usage=0.1%]
>> [EDT May  6 15:28:57] debug    : 'swap_sdb2' space usage check succeeded
>> [current space usage=14.6%]
>> [EDT May  6 15:28:57] debug    : 'swap_sdb2' space usage check succeeded
>> [current space usage=14.6%]
>> [EDT May  6 15:28:57] debug    : 'boot_sda1' permission check succeeded
>> [current permission=0640]
>> [EDT May  6 15:28:57] debug    : 'boot_sda1' uid check succeeded [current
>> uid=0]
>> [EDT May  6 15:28:57] debug    : 'boot_sda1' gid check succeeded [current
>> gid=6]
>> [EDT May  6 15:28:57] debug    : 'boot_sda1' inode usage check succeeded
>> [current inode usage=0.1%]
>> [EDT May  6 15:28:57] debug    : 'boot_sda1' inode usage check succeeded
>> [current inode usage=0.1%]
>> [EDT May  6 15:28:57] debug    : 'boot_sda1' space usage check succeeded
>> [current space usage=24.2%]
>> [EDT May  6 15:28:57] debug    : 'boot_sda1' space usage check succeeded
>> [current space usage=24.2%]
>> [EDT May  6 15:28:57] error    : 'datafs_sda2' unable to read filesystem
>> /dev/sda2 state
>> [EDT May  6 15:28:57] debug    : Data access error notification is sent
>> to
>> address@hidden
>> [EDT May  6 15:28:58] debug    : 'rootfs_logical' space usage check
>> succeeded [current space usage=35.6%]
>> [EDT May  6 15:28:58] debug    : ICMP echo response 1/3 succeeded --
>> received id=38340 sequence=0 response_time=0.000171s
>> [EDT May  6 15:28:58] debug    : 'shade' icmp ping succeeded [response
>> time
>> 0.000s]
>> [EDT May  6 15:28:58] debug    : 'shade' succeeded connecting to
>> INET[xxx.xxx.xxx.xxx:22] via TCP
>> [EDT May  6 15:28:58] debug    : 'shade' succeeded testing protocol [SSH]
>> at
>> INET[xxx.xxx.xxx.xxx:22] via TCP
>> [EDT May  6 15:29:07] debug    : HttpRequest error: HTTP/1.0 401 You are
>> not
>> authorized to access monit. Either you supplied the wrong credentials
>> (e.g.
>> bad password), or your browser doesn't understand how to supply the
>> credentials required
>> [EDT May  6 15:29:09] debug    : HttpRequest error: HTTP/1.0 404 There is
>> no
>> service by that name
>> [EDT May  6 15:29:13] debug    : HttpRequest error: HTTP/1.0 404 There is
>> no
>> service by that name
>> 
>> 
>> 
>> Martin Pala wrote:
>>> 
>>> Is the system a virtual machine of some type (VPS, etc.) or a
>>> real/physical machine? If it is virtual, it is possible that the access
>>> is rejected based on host OS restrictions. There can also be other access
>>> control restrictions, for example if you use SELinux.
>>> 
>>> The svn repository contains the development version of 5.2 in various
>>> development stages (some features may be incomplete), and the features
>>> may not have been tested yet. The exact codebase depends on when you
>>> updated the source code. The problems you have shouldn't be specific to
>>> 5.2-development anyway, as there were no changes which could cause
>>> behavior like this, but it would be good to verify the behavior with the
>>> official 5.1.1 version.
>>> 
>>> Please can you also run monit with debug enabled and provide the full
>>> output:
>>> 
>>> monit -vI
>>> 
>>> 
>>> 
>>> 
>>> On May 4, 2010, at 4:05 PM, zachlac wrote:
>>> 
>>>> 
>>>> sda2 cannot be monitored, while sda1 can:
>>>> 
>>>> # ls -l /dev/sda2
>>>> brw-r----- 1 root disk 8, 2 Feb 19 14:22 /dev/sda2
>>>> # ls -l /dev/sda1
>>>> brw-r----- 1 root disk 8, 2 Feb 19 14:22 /dev/sda1
>>>> 
>>>> I'm using the repository version of monit, which is 5.2.
>>>> 
>>>> Thank you.
>>>> 
>>>> 
>>>> Martin Pala wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> LVM shouldn't be a problem. Can you please provide the output of "ls -l
>>>>> /dev/sda2"? Which monit version do you use? There was a problem in monit
>>>>> <= 4.10.1 when the device was a symlink; support for device symlinks
>>>>> was added in Monit 5.0 (the current version is Monit 5.1.1).
>>>>> 
>>>>> Alternatively, you can use the mount point instead of the device.
>>>>> 
>>>>> Regards,
>>>>> Martin
>>>>> 
>>>>> 
>>>>> On May 3, 2010, at 6:43 PM, zachlac wrote:
>>>>> 
>>>>>> 
>>>>>> I have monit monitoring /dev/sdb1, /dev/sdb2, and /dev/sda1. 
>>>>>> However,
>>>>>> /dev/sda2 is a Linux LVM, and when I try to monitor it I get a "Data
>>>>>> access
>>>>>> error".  My output for fdisk is as follows:
>>>>>> ----------------------------------------------------------------------------------------------
>>>>>> Disk /dev/sda: 200.0 GB, 200049647616 bytes
>>>>>> 255 heads, 63 sectors/track, 24321 cylinders
>>>>>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>>>>> 
>>>>>> Device Boot      Start         End      Blocks   Id  System
>>>>>> /dev/sda1   *           1          13      104391   83  Linux
>>>>>> /dev/sda2              14       24321   195254010   8e  Linux LVM
>>>>>> 
>>>>>> Disk /dev/sdb: 200.0 GB, 200049647616 bytes
>>>>>> 255 heads, 63 sectors/track, 24321 cylinders
>>>>>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>>>>> 
>>>>>> Device Boot      Start         End      Blocks   Id  System
>>>>>> /dev/sdb1   *           1       12160    97675168+  83  Linux
>>>>>> /dev/sdb2           12161       24321    97683232+  83  Linux
>>>>>> --------------------------------------------------------------------------------------------
>>>>>> 
>>>>>> My monitrc contains the following important lines:
>>>>>> ---------------------------------------------------------------------------------------------
>>>>>> check filesystem boot_sda1 with path /dev/sda1
>>>>>>  start program  = "/bin/mount /data"
>>>>>>  stop program  = "/bin/umount /data"
>>>>>>  if failed permission 640 then unmonitor
>>>>>>  if failed uid root then unmonitor
>>>>>>  if failed gid disk then unmonitor
>>>>>>  if space usage > 80% for 5 times within 15 cycles then alert
>>>>>>  if space usage > 99% then stop
>>>>>> #    if inode usage > 30000 then alert
>>>>>> #    if inode usage > 250000 then alert
>>>>>>  if inode usage > 80% then alert
>>>>>>  if inode usage > 99% then stop
>>>>>>  group server
>>>>>> 
>>>>>> check filesystem datafs_sda2 with path /dev/sda2
>>>>>>  start program  = "/bin/mount /data"
>>>>>>  stop program  = "/bin/umount /data"
>>>>>>  if failed permission 640 then unmonitor
>>>>>>  if failed uid root then unmonitor
>>>>>>  if failed gid disk then unmonitor
>>>>>>  if space usage > 80% for 5 times within 15 cycles then alert
>>>>>>  if space usage > 99% then stop
>>>>>> #    if inode usage > 30000 then alert
>>>>>> #    if inode usage > 250000 then alert
>>>>>>  if inode usage > 80% then alert
>>>>>>  if inode usage > 99% then stop
>>>>>>  group server
>>>>>> ---------------------------------------------------------------------------------------------
>>>>>> 
>>>>>> Why can't I monitor the LVM?
>>>>>> 
>>>>>> Thank you.
>>>>>> -- 
>>>>>> View this message in context:
>>>>>> http://old.nabble.com/-monit--can%27t-monitor-one-of-my-filesystems-tp28437378p28437378.html
>>>>>> Sent from the monit-general mailing list archive at Nabble.com.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> To unsubscribe:
>>>>>> http://lists.nongnu.org/mailman/listinfo/monit-general

-- 
View this message in context: 
http://old.nabble.com/-monit--can%27t-monitor-one-of-my-filesystems-tp28437378p28536140.html
Sent from the monit-general mailing list archive at Nabble.com.



