monit-general
Re: [monit] Monit starts a service and checks, then restarts it too fast


From: Martin Pala
Subject: Re: [monit] Monit starts a service and checks, then restarts it too fast (Possible bug?)
Date: Mon, 13 Apr 2009 13:42:02 +0200

The configuration from monit-4.x should be fully compatible with monit-5.0.

Martin


On Apr 13, 2009, at 1:12 PM, Vianney Lejeune wrote:

Thank you for your quick reply. I know that my monit config contains a lot of monit-4.x workarounds (such as a separate .process_restarted file check in order to run a command line). What can I delete or change in my config to fully comply with monit 5.0's functions?

Regards,
Mr Lejeune

On Apr 13, 2009, at 11:38 AM, Martin Pala wrote:

Hello,

This problem was fixed in monit-5.0. You can get the beta here:
http://www.mmonit.com/monit/dist/beta/

The final monit-5.0 release will be ready next week.


Martin


On Apr 13, 2009, at 10:36 AM, Vianney Lejeune wrote:

Hello!

I have a big problem after upgrading from monit 4.8.1 to monit 4.10.1 on Debian Etch. I use monit to start shared services from heartbeat. The first shared service to start is mysqld. For an unknown reason, monit starts the mysqld process, then checks it one or two seconds later (which is *really* too fast) and, finding it not yet running, starts the mysqld service again. In the end, I have two mysqld processes (one becomes a zombie process, preventing me from starting the mysqld service).

If I start mysql manually, I don't have any problem. What can I do?

Regards,
Mr Lejeune


This is a syslog summary (mysqld only):

Apr 13 10:16:18 Inet-Primaire monit[3870]: 'mysqld' start: /bin/bash
Apr 13 10:16:20 Inet-Primaire monit[3870]: 'mysqld' process is not running
Apr 13 10:16:20 Inet-Primaire mysqld_safe[4997]: started
Apr 13 10:16:21 Inet-Primaire monit[3870]: 'mysqld' trying to restart
Apr 13 10:16:21 Inet-Primaire monit[3870]: 'mysqld' start: /bin/bash
Apr 13 10:16:21 Inet-Primaire mysqld_safe[5086]: started
Apr 13 10:16:22 Inet-Primaire mysqld[5089]: 090413 10:16:22 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
Apr 13 10:16:22 Inet-Primaire mysqld[5089]: 090413 10:16:22 [ERROR] Do you already have another mysqld server running on port: 3306 ?
Apr 13 10:16:22 Inet-Primaire mysqld[5089]: 090413 10:16:22 [ERROR] Aborting
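
To put a number on "too fast", the timestamps in the summary above can be compared with the 15-second poll time reported in the configuration dump further down. The following is only a minimal sketch: the helper names (`parse_time`, `first_match`, `start_to_check_gap`) are hypothetical, the log lines are copied from this message, and the year 2009 is assumed from the mail date because syslog does not record it.

```python
from datetime import datetime

# Lines copied from the mysqld-only syslog summary in this message.
LOG = """\
Apr 13 10:16:18 Inet-Primaire monit[3870]: 'mysqld' start: /bin/bash
Apr 13 10:16:20 Inet-Primaire monit[3870]: 'mysqld' process is not running
Apr 13 10:16:20 Inet-Primaire mysqld_safe[4997]: started
Apr 13 10:16:21 Inet-Primaire monit[3870]: 'mysqld' trying to restart
Apr 13 10:16:21 Inet-Primaire monit[3870]: 'mysqld' start: /bin/bash
"""

# "Poll time" from the runtime constants in the monit debug dump.
POLL_SECONDS = 15


def parse_time(line):
    """Parse the 15-character syslog timestamp; the year 2009 is an assumption."""
    return datetime.strptime("2009 " + line[:15], "%Y %b %d %H:%M:%S")


def first_match(lines, needle):
    """Return the first line that contains needle."""
    for line in lines:
        if needle in line:
            return line
    raise ValueError(f"no log line contains {needle!r}")


def start_to_check_gap(log):
    """Seconds between monit's first start of mysqld and its first failed check."""
    lines = log.splitlines()
    started = parse_time(first_match(lines, "'mysqld' start:"))
    checked = parse_time(first_match(lines, "'mysqld' process is not running"))
    return (checked - started).total_seconds()


gap = start_to_check_gap(LOG)
print(f"monit checked mysqld {gap:.0f} s after starting it (poll time: {POLL_SECONDS} s)")
```

For the excerpt above this reports a gap of 2 seconds, well under one 15-second cycle, which matches the behaviour described in the report.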

Full syslog summary (monit only)

Apr 13 10:16:17 Inet-Primaire monit[3870]: start service 'vsftpd' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service '.vsftpd_restarted' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
>Apr 13 10:16:17 Inet-Primaire monit[3870]: start service 'mysqld' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
>Apr 13 10:16:17 Inet-Primaire monit[3870]: start service '.mysqld_restarted' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service 'apache2' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service '.apache2_restarted' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service 'dhcpd' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service '.dhcpd_restarted' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service 'eserver' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service '.eserver_restarted' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service 'squid_cache' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service '.squid_restarted' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service 'freeradius' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: start service '.freeradius_restarted' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:17 Inet-Primaire monit[3870]: restart service 'bind9' on user request
Apr 13 10:16:17 Inet-Primaire monit[3870]: monit daemon at 3870 awakened
Apr 13 10:16:18 Inet-Primaire monit[3870]: 'vsftpd' start: /bin/bash
>Apr 13 10:16:18 Inet-Primaire monit[3870]: 'mysqld' start: /bin/bash
Apr 13 10:16:18 Inet-Primaire monit[3870]: 'apache2' start: /bin/bash
Apr 13 10:16:18 Inet-Primaire monit[3870]: 'freeradius' start: /bin/bash
Apr 13 10:16:18 Inet-Primaire monit[3870]: 'dhcpd' start: /bin/bash
Apr 13 10:16:18 Inet-Primaire monit[3870]: 'eserver' start: /bin/bash
Apr 13 10:16:18 Inet-Primaire monit[3870]: 'squid_cache' start: /bin/bash
Apr 13 10:16:18 Inet-Primaire monit[3870]: Awakened by User defined signal 1
>Apr 13 10:16:20 Inet-Primaire monit[3870]: 'mysqld' process is not running
>Apr 13 10:16:20 Inet-Primaire mysqld_safe[4997]: started
>Apr 13 10:16:21 Inet-Primaire monit[3870]: 'mysqld' trying to restart
>Apr 13 10:16:21 Inet-Primaire monit[3870]: 'mysqld' start: /bin/bash
Apr 13 10:16:21 Inet-Primaire monit[3870]: 'apache2' start: /bin/bash
Apr 13 10:16:21 Inet-Primaire monit[3870]: 'freeradius' start: /bin/bash
>Apr 13 10:16:21 Inet-Primaire mysqld_safe[5086]: started
>Apr 13 10:16:22 Inet-Primaire mysqld[5089]: 090413 10:16:22 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
>Apr 13 10:16:22 Inet-Primaire mysqld[5089]: 090413 10:16:22 [ERROR] Do you already have another mysqld server running on port: 3306 ?
>Apr 13 10:16:22 Inet-Primaire mysqld[5089]: 090413 10:16:22 [ERROR] Aborting
>Apr 13 10:16:22 Inet-Primaire mysqld[5089]:
>Apr 13 10:16:22 Inet-Primaire mysqld[5089]: 090413 10:16:22 [Note] /usr/sbin/mysqld: Arrêt du serveur terminé
>Apr 13 10:16:22 Inet-Primaire mysqld[5089]:
>Apr 13 10:16:22 Inet-Primaire mysqld_safe[5110]: ended
>Apr 13 10:16:22 Inet-Primaire mysqld[5000]: 090413 10:16:22 [Note] /usr/sbin/mysqld: ready for connections.
>Apr 13 10:16:22 Inet-Primaire mysqld[5000]: Version: '5.0.32-Debian_7etch8-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Debian etch distribution
>Apr 13 10:16:22 Inet-Primaire /etc/mysql/debian-start[5130]: Upgrading MySQL tables if necessary.
>Apr 13 10:16:23 Inet-Primaire /etc/mysql/debian-start[5158]: Upgrading MySQL tables if necessary.

Monit configuration:

monit: Debug: Adding host allow 'localhost'
monit: Debug: Skipping redundant host 'localhost'
monit: Debug: Skipping redundant host 'localhost'
Runtime constants:
Control file       = /etc/monit//monitrc
Log file           = syslog
Pid file           = /var/run/monit.pid
Debug              = True
Log                = True
Use syslog         = True
Is Daemon          = True
Use process engine = True
Poll time          = 15 seconds
Event queue        = base directory /var/monit with 200 slots
Mail server(s)     = localhost:25
Mail from          = address@hidden
Mail subject       = monit alert --  $EVENT $SERVICE
Mail message       = $EVENT Service $SERV..(truncated)
Start monit httpd  = True
httpd bind address = localhost
httpd portnumber   = 3001
httpd signature    = True
Use ssl encryption = False
httpd auth. style  = Host/Net allow list
Alert mail to      = address@hidden
Alert on         = All events

The service list contains the following entries:

Process Name          = vsftpd
Group                = InetShared
Pid file             = /var/run/vsftpd/vsftpd.pid
Monitoring mode      = manual
Start program = '/bin/bash -c /etc/init.d/vsftpd start; touch /etc/monit/.vsftpd_restarted' timeout 1 cycle(s)
Stop program         = '/etc/init.d/vsftpd stop' timeout 1 cycle(s)
Depends on Service   = heartbeat
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:21 [FTP via TCP] with timeout 5 seconds 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s) else if passed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:21 [FTP via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert

File Name             = .vsftpd_restarted
Group                = InetShared
Path                 = /etc/monit/.vsftpd_restarted
Monitoring mode      = manual
Depends on Service   = vsftpd
Timestamp = if changed 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s)

Process Name          = sshd
Group                = local
Pid file             = /var/run/sshd.pid
Monitoring mode      = active
Start program        = '/etc/init.d/ssh start' timeout 1 cycle(s)
Stop program         = '/etc/init.d/ssh stop' timeout 1 cycle(s)
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed localhost:2145 [SSH via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert
Timeout              = If 2 restart within 2 cycles then unmonitor else if passed then alert

Process Name          = mysqld
Group                = InetShared
Pid file             = /var/run/mysqld/mysqld.pid
Monitoring mode      = manual
Start program = '/bin/bash -c /etc/init.d/mysql start; touch /etc/monit/.mysqld_restarted' timeout 1 cycle(s)
Stop program         = '/etc/init.d/mysql stop' timeout 1 cycle(s)
Depends on Service   = heartbeat
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed 127.0.0.1:3306 [DEFAULT via TCP] with timeout 5 seconds 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s) else if passed 1 times within 1 cycle(s) then alert
Port                 = if failed 127.0.0.1:3306 [DEFAULT via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert

File Name             = .mysqld_restarted
Group                = InetShared
Path                 = /etc/monit/.mysqld_restarted
Monitoring mode      = manual
Depends on Service   = mysqld
Timestamp = if changed 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s)

Process Name          = apache2
Group                = InetShared
Pid file             = /var/run/apache2.pid
Monitoring mode      = manual
Start program = '/bin/bash -c /etc/init.d/apache2 start; touch /etc/monit/.apache2_restarted' timeout 1 cycle(s)
Stop program         = '/etc/init.d/apache2 stop' timeout 1 cycle(s)
Depends on Service   = heartbeat
Depends on Service   = mysqld
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:80 [HTTP via TCP] with timeout 5 seconds 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s) else if passed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:80 [HTTP via TCP] with timeout 5 seconds 2 times within 2 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert

File Name             = .apache2_restarted
Group                = InetShared
Path                 = /etc/monit/.apache2_restarted
Monitoring mode      = manual
Depends on Service   = apache2
Timestamp = if changed 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s)

Process Name          = postfix
Group                = local
Pid file             = /var/spool/postfix/pid/master.pid
Monitoring mode      = active
Start program = '/etc/init.d/postfix start' timeout 1 cycle(s)
Stop program         = '/etc/init.d/postfix stop' timeout 1 cycle(s)
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed localhost:25 [SMTP via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert
Timeout              = If 4 restart within 4 cycles then unmonitor else if passed then alert

Process Name          = dhcpd
Group                = InetShared
Pid file             = /var/run/dhcpd.pid
Monitoring mode      = manual
Start program        = '/bin/bash -c /etc/init.d/dhcp3-server start; touch /etc/monit/.dhcpd_restarted' timeout 1 cycle(s)
Stop program         = '/etc/init.d/dhcp3-server stop' timeout 1 cycle(s)
Depends on Service   = heartbeat
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:67 [DEFAULT via UDP] with timeout 5 seconds 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s) else if passed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:67 [DEFAULT via UDP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert

File Name             = .dhcpd_restarted
Group                = InetShared
Path                 = /etc/monit/.dhcpd_restarted
Monitoring mode      = manual
Depends on Service   = dhcpd
Timestamp = if changed 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s)

Process Name          = bind9
Group                = local
Pid file             = /var/run/bind/run/named.pid
Monitoring mode      = active
Start program = '/bin/bash -c /etc/init.d/bind9 start; touch /etc/monit/.bind9_restarted' timeout 1 cycle(s)
Stop program         = '/etc/init.d/bind9 stop' timeout 1 cycle(s)
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed localhost:53 [DEFAULT via UDP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert

File Name             = .bind9_restarted
Group                = local
Path                 = /etc/monit/.bind9_restarted
Monitoring mode      = active
Depends on Service   = bind9
Timestamp = if changed 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s)

Process Name          = upsd
Group                = local
Pid file             = /var/run/nut/upsd.pid
Monitoring mode      = active
Start program        = '/etc/init.d/ups-monitor start' timeout 1 cycle(s)
Stop program         = '/etc/init.d/ups-monitor stop' timeout 1 cycle(s)
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Timeout              = If 4 restart within 4 cycles then unmonitor else if passed then alert

Process Name          = upsmon
Group                = local
Pid file             = /var/run/nut/upsmon.pid
Monitoring mode      = active
Start program        = '/etc/init.d/ups-monitor start' timeout 1 cycle(s)
Stop program         = '/etc/init.d/ups-monitor stop' timeout 1 cycle(s)
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Timeout              = If 4 restart within 4 cycles then unmonitor else if passed then alert

Process Name          = upsdriver
Group                = local
Pid file             = /var/run/nut/newhidups-auto.pid
Monitoring mode      = active
Start program        = '/etc/init.d/ups-monitor start' timeout 1 cycle(s)
Stop program         = '/etc/init.d/ups-monitor stop' timeout 1 cycle(s)
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Timeout              = If 4 restart within 4 cycles then unmonitor else if passed then alert

Process Name          = eserver
Group                = InetShared
Pid file             = /var/run/eserver.pid
Monitoring mode      = manual
Start program = '/bin/bash -c /etc/init.d/eserver start; touch /etc/monit/.eserver_restarted' timeout 1 cycle(s)
Stop program         = '/etc/init.d/eserver stop' timeout 1 cycle(s)
Depends on Service   = heartbeat
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:4661 [DEFAULT via TCP] with timeout 5 seconds 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s) else if passed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:4661 [DEFAULT via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert

File Name             = .eserver_restarted
Group                = InetShared
Path                 = /etc/monit/.eserver_restarted
Monitoring mode      = manual
Depends on Service   = eserver
Timestamp = if changed 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s)

Process Name          = squid_cache
Group                = InetShared
Pid file             = /var/run/squid.pid
Monitoring mode      = manual
Start program = '/bin/bash -c /etc/init.d/squid start; touch /etc/monit/.squid_restarted' timeout 1 cycle(s)
Stop program         = '/etc/init.d/squid stop' timeout 1 cycle(s)
Depends on Service   = heartbeat
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed localhost:3128 [DEFAULT via TCP] with timeout 5 seconds 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s) else if passed 1 times within 1 cycle(s) then alert
Port                 = if failed localhost:3128 [DEFAULT via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert

File Name             = .squid_restarted
Group                = InetShared
Path                 = /etc/monit/.squid_restarted
Monitoring mode      = manual
Depends on Service   = squid_cache
Timestamp = if changed 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s)

Process Name          = freeradius
Group                = InetShared
Pid file             = /var/run/freeradius/freeradius.pid
Monitoring mode      = manual
Start program        = '/bin/bash -c sleep 5; /etc/init.d/freeradius start; touch /etc/monit/.freeradius_restarted' timeout 1 cycle(s)
Stop program         = '/etc/init.d/freeradius stop' timeout 1 cycle(s)
Depends on Service   = heartbeat
Depends on Service   = mysqld
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:1812 [DEFAULT via UDP] with timeout 5 seconds 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s) else if passed 1 times within 1 cycle(s) then alert
Port                 = if failed 10.0.254.254:1812 [DEFAULT via UDP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert

File Name             = .freeradius_restarted
Group                = InetShared
Path                 = /etc/monit/.freeradius_restarted
Monitoring mode      = manual
Depends on Service   = freeradius
Timestamp = if changed 3 times within 3 cycle(s) then exec '/usr/lib/heartbeat/hb_standby' timeout 1 cycle(s)

Process Name          = heartbeat
Group                = local
Pid file             = /var/run/heartbeat.pid
Monitoring mode      = active
Start program        = '/etc/init.d/heartbeat start' timeout 1 cycle(s)
Stop program         = '/etc/init.d/heartbeat stop' timeout 1 cycle(s)
Pid                  = if changed 1 times within 1 cycle(s) then alert
Ppid                 = if changed 1 times within 1 cycle(s) then alert
Timeout              = If 4 restart within 4 cycles then unmonitor else if passed then alert

System Name           = Inet-Primaire
Monitoring mode      = active

-------------------------------------------------------------------------------
monit daemon at 3870 awakened


--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general


