monit-general

Re: uptime weirdness


From: Gareth Pye
Subject: Re: uptime weirdness
Date: Thu, 19 Aug 2010 10:56:20 +1000
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100713 Thunderbird/3.0.6

Sorry for not replying earlier; your response had cleared things up for me.

Until today, that is, when it struck me how serious a bug this is. If a system is power cycled (no normal shutdown procedure), the old pid files still exist. If some other random process happens to be running with the pid recorded in one of them, the task that monit is meant to be monitoring will never be started.

In the case I hit just a few minutes ago, the pid file ended up pointing to monit itself.

The simple workaround is to remove all pid files before starting monit (doing it at some point in the boot procedure, before monit has started the processes, seems most efficient). Wouldn't it make sense for monit to check that the pid files aren't older than the last system boot? A process can't have been running longer than the host system.
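
To make the idea concrete, here is a minimal sketch of that boot-time cleanup. It assumes a Linux host (it reads /proc/uptime) and a pid directory of your choosing; the function names are only illustrative and none of this is monit code.

```python
import os
import time


def boot_time(uptime_path="/proc/uptime"):
    """Return the boot time as a Unix timestamp (Linux only)."""
    with open(uptime_path) as f:
        # First field of /proc/uptime is seconds since boot.
        uptime_seconds = float(f.read().split()[0])
    return time.time() - uptime_seconds


def is_stale(pidfile_mtime, boot_timestamp):
    """A pid file last written before the boot cannot describe a live process."""
    return pidfile_mtime < boot_timestamp


def remove_stale_pidfiles(pid_dir, boot_timestamp):
    """Delete every *.pid file in pid_dir that predates the boot."""
    removed = []
    for name in sorted(os.listdir(pid_dir)):
        path = os.path.join(pid_dir, name)
        if not (name.endswith(".pid") and os.path.isfile(path)):
            continue
        if is_stale(os.path.getmtime(path), boot_timestamp):
            os.remove(path)
            removed.append(path)
    return removed
```

Calling remove_stale_pidfiles(pid_dir, boot_time()) from an init script that runs before monit would do the job. One caveat: the comparison relies on the wall clock, so on a box whose clock is wrong at boot (no RTC, for example) it could misjudge a file.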

Gareth Pye
Engineer
GPSat Systems Australia
address@hidden
Ph: 03 9455 0041
Fax: 03 9455 0042


On 12/08/10 20:56, Martin Pala wrote:
I'm not sure what the system uptime is in your case - the attached monit status output contains the following uptimes only:

1.) monit uptime: 49m  =>  monit was started 49 minutes ago (the system itself may be running much longer - this uptime is reset whenever monit itself is (re)started)
2.) process 'BoomDataToMODBUS' uptime: 45m
3.) process 'DataRouter' uptime: 21h9m

=>  if the system was started less than 21h9m before the monit status was taken, then the reported uptime of the DataRouter process is wrong. With monit-5.0.3 this can happen because the uptime is based on the pidfile's timestamp. The next monit release (5.2) fixes this problem. Monit-5.2 changelog excerpt:

--8<--
* Show real process uptime - formerly the presented uptime was based on
  create/modify timestamp of process' pidfile which provides invalid uptime
  if the pidfile is replaced and process keeps running with original PID
  (such as on apache reload).
  Thanks to Nima Chavooshi for report.
--8<--
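
For illustration only, here is one way a process's real uptime can be derived on Linux without looking at the pidfile at all. This is a sketch under the assumption of the Linux /proc/<pid>/stat layout, where field 22 is the process start time in clock ticks since boot; it is not necessarily how monit 5.2 does it, and the function name is made up.

```python
def process_uptime_seconds(stat_line, system_uptime, ticks_per_second):
    """Compute process uptime from one line of /proc/<pid>/stat.

    stat_line        -- contents of /proc/<pid>/stat
    system_uptime    -- first field of /proc/uptime, in seconds
    ticks_per_second -- e.g. os.sysconf("SC_CLK_TCK")
    """
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 22 (starttime) is index 19.
    start_ticks = int(after_comm[19])
    return system_uptime - start_ticks / ticks_per_second
```

Because the start time is counted from boot, the result can never exceed the system uptime, which is exactly the property the pidfile timestamp lacks.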

Regards,
Martin



On Aug 12, 2010, at 2:01 AM, Gareth Pye wrote:

I've just noticed that the uptime monit reports for one of my processes is greater than the system uptime. Is this plausible?

The Monit daemon 5.0.3 uptime: 49m

Process 'BoomDataToMODBUS'
  status                            running
  monitoring status                 monitored
  pid                               909
  parent pid                        1
  uptime                            45m
  children                          0
  memory kilobytes                  1880
  memory kilobytes total            1880
  memory percent                    1.4%
  memory percent total              1.4%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Wed Aug 11 16:40:27 2010

Process 'DataRouter'
  status                            running
  monitoring status                 monitored
  pid                               901
  parent pid                        886
  uptime                            21h 9m
  children                          0
  memory kilobytes                  3232
  memory kilobytes total            3232
  memory percent                    2.5%
  memory percent total              2.5%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Wed Aug 11 16:40:27 2010

File 'user.config'
  status                            accessible
  monitoring status                 monitored
  permission                        644
  uid                               0
  gid                               0
  timestamp                         Wed Aug 11 15:50:56 2010
  size                              1295 B
  checksum                          deeaffe3f625e93f00aeead0a0a3abd5(MD5)
  data collected                    Wed Aug 11 16:40:27 2010

Filesystem 'root'
  status                            accessible
  monitoring status                 monitored
  permission                        755
  uid                               0
  gid                               0
  filesystem flags                  0
  block size                        4096 B
  blocks total                      120169 [469.4 MB]
  blocks free for non superuser     14286 [55.8 MB] [11.9%]
  blocks free total                 14286 [55.8 MB] [11.9%]
  inodes total                      134976
  inodes free                       116759 [86.5%]
  data collected                    Wed Aug 11 16:40:27 2010

System 'Test-Base'
  status                            running
  monitoring status                 monitored
  load average                      [0.00] [0.00] [0.00]
  cpu                               0.0%us 0.1%sy 0.0%wa
  memory usage                      12892 kB [10.1%]
  data collected                    Wed Aug 11 16:40:27 2010

--
Gareth Pye
Engineer
GPSat Systems Australia
address@hidden
Ph: 03 9455 0041
Fax: 03 9455 0042


--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general
