Re: Monit shows "statistic error"
From: Lutz Mader
Subject: Re: Monit shows "statistic error"
Date: Sat, 21 Nov 2020 09:40:48 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:60.0) Gecko/20100101 Thunderbird/60.4.0
Hello Ani,
I checked some of my logs and found a similar problem whenever the
workload is very high (on an AIX system).
[MESZ May 8 05:29:14] error : 'D100SPUABC00' mem usage of 95.5% matches resource limit [mem usage > 95.0%]
[MESZ May 8 05:31:14] error : 'Manager' failed to get process data
>> I am running Monit 5.17.1 on Ubuntu 14.04, and on some rare occasions
>> I see the following error in the log:
>>
>> 2020-11-17 18:47:22.347 monit[2954]: system statistic error -- cannot read /proc/3560/stat
As long as this is only a workload problem, you can configure Monit to
delay a restart. With an additional "not exist" rule

  if not exist for 5 cycles then start

in the "check process" service, Monit will start/restart the service
only after 5 consecutive failed checks. If Monit fails to get the
process data just once, nothing will happen (see the sample appended
below).
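To make the "for 5 cycles" behavior concrete, here is a small Python model of the idea. It is only an illustration, not Monit's code: the class name ConsecutiveFailureRule is hypothetical, and the choice that the action fires once per run of failures (instead of on every further failed cycle) is an assumption.

```python
class ConsecutiveFailureRule:
    """Hypothetical model of 'if not exist for N cycles then start' (not Monit's code)."""

    def __init__(self, cycles: int) -> None:
        self.cycles = cycles
        self.failures = 0  # consecutive cycles in which the process was not found

    def check(self, process_found: bool) -> bool:
        """Return True when the 'start' action should fire in this cycle."""
        if process_found:
            # One successful check resets the streak, so a single
            # transient read error never triggers a start.
            self.failures = 0
            return False
        self.failures += 1
        # Assumption: fire exactly once, when the streak reaches N cycles.
        return self.failures == self.cycles


rule = ConsecutiveFailureRule(5)
for found in [False, True, False, False, False, False, False]:
    print(rule.check(found))
```

Here the single failure in the first cycle is forgotten after the next successful check, and only the fifth failure in a row requests a start.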
A suggestion only,
Lutz
Appendix:
A sample of one of the service definitions I use:
check process Serv_server1 with pidfile "/usr/local/var/wlp/servers/.pid/server1.pid"
  start program "/usr/local/etc/monit/scripts/wlpserv.sh start" with timeout 180 seconds
  stop program "/usr/local/etc/monit/scripts/wlpserv.sh stop" with timeout 120 seconds
  restart program "/usr/local/etc/monit/scripts/wlpserv.sh restart" with timeout 300 seconds
  # if failed host hostname.local port 8901 then alert
  # if failed host hostname.local port 9901 then alert
  if not exist for 5 cycles then start
  if 5 restarts within 50 cycles then unmonitor
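The "not exist" rule delays the start until the process has been missing
for five checks, and the "restarts" rule prevents endless recovery
attempts by unmonitoring the service after 5 (re)starts within 50 cycles.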
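The restart limit can be sketched the same way, as a sliding window over cycle numbers. Again this is a hypothetical Python model (RestartLimit is an illustrative name, not part of Monit), and it assumes every start or restart counts as one event in the window.

```python
from collections import deque


class RestartLimit:
    """Hypothetical model of 'if N restarts within M cycles then unmonitor'."""

    def __init__(self, restarts: int, cycles: int) -> None:
        self.restarts = restarts
        self.cycles = cycles
        self.history = deque()  # cycle numbers at which a (re)start happened

    def record_restart(self, cycle: int) -> bool:
        """Record a (re)start in `cycle`; return True if the service should be unmonitored."""
        self.history.append(cycle)
        # Keep only restarts inside the last `cycles` cycles, i.e. (cycle - cycles, cycle].
        while self.history and self.history[0] <= cycle - self.cycles:
            self.history.popleft()
        return len(self.history) >= self.restarts


limit = RestartLimit(restarts=5, cycles=50)
for c in [1, 10, 20, 30, 40]:
    print(c, limit.record_restart(c))
```

Five restarts packed into 50 cycles trip the limit, while restarts spread further apart drop out of the window and never accumulate.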