monit-general

Re: Monitoring the wrong thing?


From: Martin Pala
Subject: Re: Monitoring the wrong thing?
Date: Fri, 06 Aug 2010 11:44:00 +0200

The configuration is OK, but the limits should be adjusted. It seems that your system has CPU usage spikes which last for 2+ cycles, and these trigger the alert. The limits should be set so that you get an alert only when the state is abnormal or pathological. What counts as normal or abnormal is specific to each system: watch the load for a while and then set the limits accordingly. For example, raise the cpu (user) usage limit to 90% for 10 cycles.
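
As a rough sketch, your check with only the cpu (user) rule changed could look like the following. The 90% and 10 cycles are just the example figures from above, not values measured on your host, and the other limits are copied unchanged from your config. With your 2 minute interval, 10 cycles means the usage has to stay above the limit for about 20 minutes before an alert is sent.

   check system 1027mail
     if loadavg (1min) > 4 then alert
     if loadavg (5min) > 2 then alert
     if memory usage > 75% then alert
     # example values only; tune after watching the normal load
     if cpu usage (user) > 90% for 10 cycles then alert
     if cpu usage (system) > 30% for 2 cycles then alert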

Regards,
Martin


On 08/04/2010 09:30 PM, Marc Pinnell wrote:
Finally got Monit going this morning on my webserver (daemon, 2 min interval).
Since then I have been getting a couple of warnings an hour about high CPU
loads. Am I monitoring the wrong thing? (I don't totally understand the UNIX
terminology about loads.) Here is my config:

   check system 1027mail
     if loadavg (1min) > 4 then alert
     if loadavg (5min) > 2 then alert
     if memory usage > 75% then alert
     if cpu usage (user) > 70% for 2 cycles then alert
     if cpu usage (system) > 30% for 2 cycles then alert

and a warning I just received:


Begin forwarded message:

Resource limit matched Service 1027mail

        Date:        Wed, 04 Aug 2010 15:24:13 -0400
        Action:      alert
        Host:        1027mail
        Description: cpu user usage of 80.8% matches resource limit [cpu user usage>70.0%]


and then two minutes later:

Resource limit succeeded Service 1027mail

        Date:        Wed, 04 Aug 2010 15:26:19 -0400
        Action:      alert
        Host:        1027mail
        Description: '1027mail' cpu user usage check succeeded [current cpu user usage=0.0%]


This is the way they all go so far. It seems that if Monit happens to check at
the very moment a web request comes in (which obviously happens on a regular
basis!), it trips the warning.

Suggestions?

Marc



