monit-general

Re: total cpu process bug?


From: Martin Pala
Subject: Re: total cpu process bug?
Date: Wed, 11 Jan 2012 20:01:47 +0100

Hi Tom,

you're absolutely correct - there was a bug in the CPU usage computation which incorrectly capped a process's CPU usage to the fraction equivalent to a single CPU core. As you mentioned, the problem could occur when monitoring the CPU usage of multi-threaded processes on multi-core machines.

Thanks for the patch, it will be part of the next release.

Best regards,
Martin



--- monit/trunk/src/process.c (original)
+++ monit/trunk/src/process.c Wed Jan 11 19:55:27 2012
@@ -233,8 +233,8 @@
      /* The cpu_percent may be set already (for example by HPUX module) */
      if (pt[i].cpu_percent  == 0 && pt[i].cputime_prev != 0 && pt[i].cputime != 0 && pt[i].cputime > pt[i].cputime_prev) {
        pt[i].cpu_percent = (int)((1000 * (double)(pt[i].cputime - pt[i].cputime_prev) / (pt[i].time - pt[i].time_prev)) / systeminfo.cpus);
-        if (pt[i].cpu_percent > 1000 / systeminfo.cpus)
-          pt[i].cpu_percent = 1000 / systeminfo.cpus;
+        if (pt[i].cpu_percent > 1000)
+          pt[i].cpu_percent = 1000;
      }
    } else {
      pt[i].cputime_prev = 0;




On Jan 6, 2012, at 9:32 PM, Tom Pepper wrote:

Hi, Martin:

Can you clarify what exactly these two lines do in process.c's cpu percentage calculation?

        if (pt[i].cpu_percent > 1000 / systeminfo.cpus)
          pt[i].cpu_percent = 1000 / systeminfo.cpus;

They're causing total cpu to be misreported when processes use a large amount of CPU and many cores are present.  Shouldn't the "/ systeminfo.cpus" be dropped in both cases?  I assume it's meant to keep any strange math from causing process cpu percentage to ever exceed 100%.

For example, with a 120s query delay, a process I have on a 24 core box calculates with process.c's logic as:

cputime = 4809915 cputime_prev = 4803601 (delta 6314)
time = 13258814089.516930 time_prev = 13258812889.395201 (delta 1200)

(cputime - cputime_prev) / (time - time_prev) = 6314 / 1200 = 5.26
1000 * 5.26 / 24 cpus = 219 "pt[i].cpu_percent" (which appears to represent 21.9% in monitese), which is accurate.

1000 / num_cpus is 41.6 on my box.  Since 219 >> 41.6, it gets cut back to 41.6.

Thanks,
-t


On Jan 5, 2012, at 4:33 AM, Martin Pala wrote:

Yes, Wayne is correct and the usage is computed exactly as he described. Monit takes the summary of all CPU cores as 100%.

Regards,
Martin



On Jan 5, 2012, at 10:54 AM, Lawrence, Wayne wrote:

I may be wrong, and I am sure someone will correct me if I am, but it appears the way the CPU usage is worked out against the multiple cores is why you are getting this output.

The way I worked it out is the way I believe monit works it out, and the maths sort of makes sense:

24 cores: 24 x 100% = 2400

So if you divide 2400 by your usage from top:

2400 / 578 = 4.2

which would give you the percentage shown in monit.

Regards

Wayne


 
On 5 January 2012 08:13, Tom Pepper <address@hidden> wrote:
Hello:

I have a number of high-CPU processes that run on 24-core boxes configured e.g.:

check process emr-enc01-01 with pidfile /var/run/tada_liveenc_emr-enc01-01.pid
  start program = "/usr/local/tada/launch.sh -c emr-enc01-01"
  stop program = "/bin/bash -c 'kill -s SIGTERM `/bin/cat /var/run/tada_liveenc_emr-enc01-01.pid`'"
  if totalmem > 80% then alert
  if totalmem > 90% then restart
  if totalcpu < 10% for 10 cycles then alert

These processes create pidfiles which match correctly in top as:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                            
 1710 root      20   0 3064m 1.2g 7808 S  578 15.8  47:31.53 tada_liveenc                                                        
 1866 root      20   0 2954m 1.3g 7804 S  545 16.7  45:18.52 tada_liveenc     

However, monit sees these as a completely different total CPU usage:

Process 'emr-enc01-01'
  status                            Running
  monitoring status                 Monitored
  pid                               1710
  parent pid                        1
  uptime                            8m 
  children                          0
  memory kilobytes                  1372300
  memory kilobytes total            1372300
  memory percent                    16.7%
  memory percent total              16.7%
  cpu percent                       4.1%
  cpu percent total                 4.1%
  data collected                    Thu, 05 Jan 2012 00:05:49

Process 'emr-enc01-02'
  status                            Running
  monitoring status                 Monitored
  pid                               1866
  parent pid                        1
  uptime                            8m 
  children                          0
  memory kilobytes                  1362240
  memory kilobytes total            1362240
  memory percent                    16.6%
  memory percent total              16.6%
  cpu percent                       4.1%
  cpu percent total                 4.1%
  data collected                    Thu, 05 Jan 2012 00:05:49

Any thoughts on why this might be happening?  Hosts are Ubuntu Natty.  The master processes themselves spawn about 150 threads (not forks).

FYI:

662 address@hidden: $ uname -m
x86_64

663 address@hidden: $ file `which monit`
/usr/local/bin/monit: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped

664 address@hidden: $ monit -V
This is Monit version 5.3.2
Copyright (C) 2000-2011 Tildeslash Ltd. All Rights Reserved.

Thanks in advance,
-Tom

--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general


