monit-general

Re: Cannot get Monit to run more than 60 seconds


From: Sasha Yohananov
Subject: Re: Cannot get Monit to run more than 60 seconds
Date: Thu, 16 Dec 2010 15:02:26 +0200

Hi, 
I had a similar problem (maybe not exactly the same) on an Atmel AT91SAM9260-EK board with a 32-bit ARM CPU running embedded Linux.
In my case, a floating-point exception was killing monit silently.

As far as I could tell, monit ran for about 1 minute, until the first CPU statistics calculation (process/sysdep_LINUX.c).
After several attempts I got it running properly.
I changed every unsigned long long in this file to unsigned long, and every '%llu' to '%lu' (the latter just to keep gcc happy).

Take a look at your syslog files. In my case the entry looked like this (/var/log/messages):

Dec 16 12:47:49 192 user.debug kernel: NWFPE: monit[943] takes exception 00000001 at c080b8d0 from 401eb3d8
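
To check a log for entries like the one above, a small script may be easier than reading it by eye. The following is only a sketch: it assumes the syslog lines follow the exact NWFPE format shown above, and the function name find_nwfpe_exceptions is hypothetical, not part of monit or any library.

```python
import re

# Pattern modelled on the single NWFPE line quoted above; other kernels
# or syslog configurations may format the message differently.
NWFPE_RE = re.compile(
    r"NWFPE: (?P<name>[^\[\s]+)\[(?P<pid>\d+)\] takes exception "
    r"(?P<code>[0-9a-fA-F]+) at (?P<at>[0-9a-fA-F]+) from (?P<frm>[0-9a-fA-F]+)"
)


def find_nwfpe_exceptions(lines, process="monit"):
    """Return (pid, exception_code) pairs for NWFPE lines naming `process`."""
    hits = []
    for line in lines:
        match = NWFPE_RE.search(line)
        if match and match.group("name") == process:
            hits.append((int(match.group("pid")), match.group("code")))
    return hits


if __name__ == "__main__":
    # /var/log/messages is the path mentioned above; adjust for your system.
    with open("/var/log/messages", errors="replace") as log:
        for pid, code in find_nwfpe_exceptions(log):
            print(f"monit[{pid}] took floating-point exception {code}")
```

If this prints nothing, the floating-point emulator is probably not involved and the cause lies elsewhere.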

I first made this fix (hack) about 2 years ago. Since then I have upgraded to the latest monit version several times, repeating the fix each time. My board currently runs:
monit -V
This is Monit version 5.2.2
Copyright (C) 2000-2010 by Tildeslash Ltd. All Rights Reserved.

Everything works perfectly (many thanks to the monit team).



On Thu, Dec 16, 2010 at 9:57 AM, EzCom Keith <address@hidden> wrote:

Hi everyone..

I've about reached the end of my road here trying to get Monit to run, and at this point,
I'm simply going 'uncle' and posting for help. I have Googled, I have read documentation,
I have studied examples, all to no avail so far. The app runs for the specified 60 second
'wait' period in my monitrc, then goes away. No matter what I've tried, it's the exact
same result.

Let me begin by saying I followed this guide here:
http://www.howtoforge.com/server-monitoring-with-munin-and-monit-on-centos-5.2-p2

I went through the setup for a 64 bit box with CentOS 5 Final. Every step matched what was
documented to the 'T'. After doing the SSL certs, the website said "finally, we can start
Monit: /etc/init.d/monit start", which I did. It complained my mysqld wasn't in the right
path, nor my postfix. I just commented those entries out to come back to them later, and
restarted the daemon. It seemed to grab, as a ps aux | grep monit showed it running, and
/etc/init.d/monit status confirmed it. I opened a browser and pointed it to my box with
the proper port, but got nothing. Went back to the running processes and found Monit dead.

Going through the monit.log, I saw there was an id error, because the folder expected to
hold the id wasn't there. I created it, re-ran the daemon, and this time it reported that
it wrote a unique id file to the directory I created, and it was once again running. 60
seconds later, it was dead again. The monit.log revealed nothing out of the ordinary, here
is what a cycle of start -> dead looks like in the log:

[EST Dec 15 14:11:26] info     : monit: generated unique Monit id 99655fc9cc168e531b8d9734cab746b9 and stored to '/var/monit/id'
[EST Dec 15 14:11:26] info     : Starting monit daemon with http interface at [*:2812]
[EST Dec 15 14:11:26] info     : Monit start delay set -- pause for 60s
[EST Dec 15 14:12:26] info     : Starting monit HTTP server at [*:2812]

I then started running the daemon in the foreground with noise, and frankly, if the problem
is revealed in there, I don't see it. Here's that:

$ /usr/bin/monit -d 10 -c /etc/monit.d/monitrc -v -l /var/log/monit.log

monit: Debug: Adding net allow '{my_home_ip_here}'.
monit: Debug: Adding credentials for user 'admin'.
Runtime constants:
 Control file       = /etc/monit.d/monitrc
 Log file           = /var/log/monit.log
 Pid file           = /var/run/monit.pid
 Debug              = True
 Log                = True
 Use syslog         = False
 Is Daemon          = True
 Use process engine = True
 Poll time          = 10 seconds with start delay 0 seconds
 Expect buffer      = 256 bytes
 Mail from          = (not defined)
 Mail subject       = (not defined)
 Mail message       = (not defined)
 Start monit httpd  = True
 httpd bind address = Any/All
 httpd portnumber   = 2812
 httpd signature    = True
 Use ssl encryption = True
 PEM key/cert file  = /var/certs/monit.pem
 Client cert file   = None
 Allow self certs   = False
 httpd auth. style  = Basic Authentication and Host/Net allow list

The service list contains the following entries:

Process Name          = proftpd
 Pid file             = /var/run/proftpd.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/proftpd start' timeout 30 second(s)
 Stop program         = '/etc/init.d/proftpd stop' timeout 30 second(s)
 Existence            = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Port                 = if failed localhost:21 [FTP via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 5 times within 5 cycle(s) then unmonitor

Process Name          = sshd
 Pid file             = /var/run/sshd.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/sshd start' timeout 30 second(s)
 Stop program         = '/etc/init.d/sshd stop' timeout 30 second(s)
 Existence            = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Port                 = if failed localhost:22 [SSH via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 5 times within 5 cycle(s) then unmonitor

Process Name          = apache
 Group                = www
 Pid file             = /var/run/httpd.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/httpd start' timeout 30 second(s)
 Stop program         = '/etc/init.d/httpd stop' timeout 30 second(s)
 Existence            = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Port                 = if failed www.ezcommunities.com:80/monit/token [HTTP via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Load avg. (5min)     = if greater than 10.0 8 times within 8 cycle(s) then stop else if succeeded 1 times within 1 cycle(s) then alert
 Children             = if greater than 250 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 CPU usage limit      = if greater than 80.0% 5 times within 5 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 CPU usage limit      = if greater than 60.0% 2 times within 2 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 3 times within 5 cycle(s) then unmonitor

System Name           = system_{myexample.site.com}
 Monitoring mode      = active

-------------------------------------------------------------------------------
Starting monit daemon with http interface at [*:2812]

monit.log says:

[EST Dec 16 02:27:04] info     : Starting monit daemon with http interface at [*:2812]
[EST Dec 16 02:27:04] info     : Starting monit HTTP server at [*:2812]
[EST Dec 16 02:27:04] info     : monit HTTP server started
[EST Dec 16 02:27:04] info     : 'system_{myexample.site.com}' Monit started

/etc/init.d/monit status says:
monit dead but pid file exists
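
A "dead but pid file exists" status usually means the pidfile names a process that is no longer running. The sketch below shows one way to confirm that independently of the init script. It assumes a Unix-like system and a pidfile holding a single decimal PID; the function name pidfile_status is hypothetical.

```python
import os


def pidfile_status(path):
    """Classify a daemon pidfile as 'missing', 'invalid', 'stale' or 'running'."""
    try:
        with open(path) as handle:
            text = handle.read().strip()
    except FileNotFoundError:
        return "missing"
    if not text.isdigit() or int(text) <= 0:
        return "invalid"
    pid = int(text)
    try:
        # Signal 0 performs the existence and permission checks only;
        # nothing is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale"      # no process with this PID: the daemon has died
    except PermissionError:
        return "running"    # process exists but belongs to another user
    return "running"


if __name__ == "__main__":
    # Pid file path taken from the "Runtime constants" output above.
    print(pidfile_status("/var/run/monit.pid"))
```

A result of "stale" agrees with the init script: the daemon exited without removing its pidfile, which points to a crash rather than a clean shutdown.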

For completeness, here is monitrc:

set daemon  60 with start delay 60
set logfile /var/log/monit.log
# set mailserver localhost
# set mail-format { from: address@hidden} }
# set alert address@hidden
set httpd port 2812 and
     SSL ENABLE
     PEMFILE  /var/certs/monit.pem
     allow {my_home_ip_here}
     allow admin:test

check process proftpd with pidfile /var/run/proftpd.pid
   start program = "/etc/init.d/proftpd start"
   stop program  = "/etc/init.d/proftpd stop"
   if failed port 21 protocol ftp then restart
   if 5 restarts within 5 cycles then timeout

check process sshd with pidfile /var/run/sshd.pid
   start program  "/etc/init.d/sshd start"
   stop program  "/etc/init.d/sshd stop"
   if failed port 22 protocol ssh then restart
   if 5 restarts within 5 cycles then timeout

# check process mysql with pidfile /var/run/mysqld/mysqld.pid
   # group database
   # start program = "/usr/sbin/mysqld start"
   # stop program = "/usr/sbin/mysqld stop"
   # if failed host 127.0.0.1 port 3306 then restart
   # if 5 restarts within 5 cycles then timeout

check process apache with pidfile /var/run/httpd.pid
   group www
   start program = "/etc/init.d/httpd start"
   stop program  = "/etc/init.d/httpd stop"
   if failed host {myexample.site.com} port 80 protocol http
      and request "/monit/token" then restart
   if cpu is greater than 60% for 2 cycles then alert
   if cpu > 80% for 5 cycles then restart
   # if totalmem > 500 MB for 5 cycles then restart
   if children > 250 then restart
   if loadavg(5min) greater than 10 for 8 cycles then stop
   if 3 restarts within 5 cycles then timeout

# check process postfix with pidfile /var/spool/postfix/pid/master.pid
   # group mail
   # start program = "/etc/init.d/postfix start"
   # stop  program = "/etc/init.d/postfix stop"
   # if failed port 25 protocol smtp then restart
   # if 5 restarts within 5 cycles then timeout

As stated, I'm at a dead-end. I have no idea what to try next, as I've tried everything that
I could see from a variety of other trouble posts, but always end up with a dead service
after 60 seconds.

Help appreciated. = )

- Keith

