monit-general
Re: Solaris 11 memory usage


From: Martin Pala
Subject: Re: Solaris 11 memory usage
Date: Thu, 23 Oct 2014 20:41:00 +0200

Thanks for the data.

We use kstat to get the freemem statistic, which covers both the freelist and 
the cachelist. In your case the mdb ::memstat output shows that memory usage 
was ca. 85% (7+6+10+1+61 = 85), which matches the monit test limit (> 80%). 
The memory usage is real; check the "sr" (page scanner activity) column in 
vmstat to see whether it is a problem for the system.
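
To make the arithmetic above easy to repeat, here is a minimal Python sketch 
that sums the page counts from a saved ::memstat dump and treats only the two 
"Free" rows as free memory. The function names (parse_memstat, used_percent) 
are made up for this illustration, and this is not how Monit itself collects 
the value (Monit reads freemem via kstat); it only mirrors the calculation by 
hand. Working from page counts rather than the rounded %Tot column gives a 
slightly lower figure than the 85% estimate.

```python
import re

# ::memstat rows copied from the alert-time dump in this thread.
MEMSTAT = """\
Kernel                     588196              2297    7%
ZFS File Data              480826              1878    6%
Anon                       820560              3205   10%
Exec and libs               49531               193    1%
Page cache                5125284             20020   61%
Free (cachelist)          1009189              3942   12%
Free (freelist)            314893              1230
"""

# Row name, then two or more spaces, then the page count and the MB column.
ROW = re.compile(r"^(?P<name>.+?)\s{2,}(?P<pages>\d+)\s+(?P<mb>\d+)")


def parse_memstat(text):
    """Return {row name: page count}, skipping headers and the Total row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match and match.group("name") != "Total":
            rows[match.group("name")] = int(match.group("pages"))
    return rows


def used_percent(rows):
    """Percentage of pages that are on neither the freelist nor the cachelist."""
    total = sum(rows.values())
    free = sum(pages for name, pages in rows.items() if name.startswith("Free"))
    return 100.0 * (total - free) / total


if __name__ == "__main__":
    # Prints a value of roughly 84%, above the 80% test limit.
    print(f"memory usage: {used_percent(parse_memstat(MEMSTAT)):.1f}%")
```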

If the high memory usage is normal, you can raise the test limit to suppress 
the alerts. You can also use the "for X cycles" option to alert only if the 
memory usage remains high for a long time, for example:

        if memory usage > 90% for 20 cycles then alert
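
To illustrate what such a rule filters out, here is a simplified Python model 
of a "for N cycles" condition: an alert is raised only once the limit has 
been exceeded in N consecutive samples. The function name alert_cycles and 
the sample values are invented for this example, and the logic is an 
assumption about the general idea, not Monit's actual implementation.

```python
def alert_cycles(samples, limit=90.0, cycles=20):
    """Return the sample indexes at which an alert would be raised.

    An alert is raised when `cycles` consecutive samples are above `limit`;
    any sample at or below the limit resets the count.
    """
    streak = 0
    alerts = []
    for index, usage in enumerate(samples):
        if usage > limit:
            streak += 1
            if streak == cycles:
                alerts.append(index)
        else:
            streak = 0
    return alerts


if __name__ == "__main__":
    # A short 5-cycle spike, one normal cycle, then 20 cycles of high usage.
    samples = [95.0] * 5 + [50.0] + [95.0] * 20
    print(alert_cycles(samples))
```

With these samples the short spike is ignored and only the sustained period 
triggers an alert.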

Regards,
Martin


> On 23 Oct 2014, at 15:36, Nestor Urquiza <address@hidden> wrote:
> 
> Thanks a lot for this Martin.
> 
> Here is what I got (using version 5.7)
> ___________ Thu Oct 23 01:15:10 EDT 2014 ___________
> 
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                     588196              2297    7%
> ZFS File Data              480826              1878    6%
> Anon                       820560              3205   10%
> Exec and libs               49531               193    1%
> Page cache                5125284             20020   61%
> Free (cachelist)          1009189              3942   12%
> Free (freelist)            314893              1230 
> 
> 
> 
> The main culprit is a process used by a vendor product:
> 
>    PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP
>  10592 geneva     37G   20G cpu5     20    0   0:10:32  11% newaga/1
> 
> 
> 
> This machine has 32GB RAM, so at first glance someone would say we should 
> either add memory or ask the vendor for guidance on how to limit memory 
> usage by that process. 
> 
> However I am wondering if "page cache" should really be alarming? According 
> to Oracle https://blogs.oracle.com/rmc/entry/the_vm_system_formally_known 
> "The cachelist operates as part of the freelist. When the freelist is 
> depleted, allocations are made from the oldest pages in the cachelist. This 
> allows the file system page cache to grow to consume all available memory and 
> to dynamically shrink as memory is required for other purposes."
> 
> In this case the newaga command is part of a replication script which copies 
> an in-memory database from a remote server to the local machine. This 
> in-memory database works with memory segments that are replicated on disk and 
> loaded as needed. This system can even work with 16GB RAM. We increased it 
> because we were getting too many alerts from monit. In Solaris 10 (with the 
> previous version of the same software) we used to get no memory alerts from 
> monit with 16GB RAM and the same database, or nearly the same, since of 
> course we changed both the OS and the version of the app.
> 
> Bottom line: I am now trying to understand whether monit should report memory 
> usage differently for Solaris 11, the vendor should use memory differently, 
> or Solaris should be tuned to avoid the alerts.
> 
> 
> 
> Under normal operation BTW this is what we get:
> 
> > ::memstat
> 
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                     585743              2288    7%
> ZFS File Data              861077              3363   10%
> Anon                       793486              3099    9%
> Exec and libs               45752               178    1%
> Page cache                 259302              1012    3%
> Free (cachelist)          4301112             16801   51%
> Free (freelist)           1542007              6023   18%
> 
> Total                     8388479             32767
> 
> 
> 
> Thanks again for your help with this!
> 
> - Nestor
> 
> 
> On Thu, Oct 23, 2014 at 5:24 AM, Martin Pala <address@hidden> wrote:
> You can use the prstat exec action too, just remove the "-s rss" option to 
> let it sort the output by CPU usage (default)
> 
> Regards,
> Martin
> 
> 
>> On 22 Oct 2014, at 18:58, Nestor Urquiza <address@hidden> wrote:
>> 
>> Thanks for this Martin,
>> 
>> I will keep you posted now that I installed 5.7 and put the command in 
>> monitrc as recommended.
>> 
>> We are also getting some alerts for CPU usage spikes. Do you have a 
>> recommendation for the command to run when getting those as well?
>> 
>> Thanks!
>> - Nestor
>> 
>> On Wed, Oct 22, 2014 at 3:33 AM, Martin Pala <address@hidden> wrote:
>> Hi Nestor,
>> 
>> you can use something like this to get the distribution (will record the 
>> memstat output + user space distribution ... processes by RSS):
>> 
>>         if memory usage > 80% then exec "/bin/sh -c 'exec >> 
>> /tmp/memstat.$$; echo ___________ `date` ___________; echo ::memstat | sudo 
>> mdb -k; prstat -c -s rss 1 10'"
>> 
>> 
>> There was a fix for the Solaris memory usage report in Monit 5.7 ... could 
>> you please upgrade to Monit 5.9? If the problem persists: is the system 
>> where Monit is running 32-bit or 64-bit? Is it a Solaris zone?
>> 
>> 
>> Regards,
>> Martin
>> 
>> 
>> > On 20 Oct 2014, at 22:04, Nestor Urquiza <address@hidden> wrote:
>> >
>> > Hi Martin,
>> >
>> > Is there a way to put monit in debug mode so we get more information about 
>> > the memory distribution at the moment of the alert?
>> >
>> > One thing we have noticed is that, regardless of how many cycles we wait 
>> > before alerting, the "succeeded" message arrives in the cycle right after 
>> > the alert, which is really weird.
>> >
>> > Thanks,
>> >
>> > - Nestor
>> >
>> > On Sun, Oct 19, 2014 at 12:32 PM, Nestor Urquiza <address@hidden> wrote:
>> > I am sorry about the examples but yes we do get memory utilization spikes:
>> >
>> > "mem usage of 82.6% matches resource limit [mem usage>80.0%],"
>> >
>> > It is difficult to get that information at the time of the alert though. 
>> > Is there a way to put monit in debug mode or something similar to get the 
>> > exact memory utilization distribution?
>> >
>> > Right now everything is alright:
>> >
>> > $ sudo monit status
>> > ...
>> > System 'server'
>> >   status                            Running
>> >   monitoring status                 Monitored
>> >   load average                      [0.13] [0.12] [0.11]
>> >   cpu                               0.3%us 1.4%sy 0.0%wa
>> >   memory usage                      11822268 kB [35.2%]
>> >   swap usage                        0 kB [0.0%]
>> >   data collected                    Sun, 19 Oct 2014 12:23:47
>> > ...
>> >
>> >
>> >
>> > $ echo ::memstat | sudo mdb -k
>> > Page Summary                Pages                MB  %Tot
>> > ------------     ----------------  ----------------  ----
>> > Kernel                     591587              2310    7%
>> > ZFS File Data             1089502              4255   13%
>> > Anon                       999345              3903   12%
>> > Exec and libs               50239               196    1%
>> > Page cache                 249081               972    3%
>> > Free (cachelist)          3821104             14926   46%
>> > Free (freelist)           1587621              6201   19%
>> >
>> > Total                     8388479             32767
>> >
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> > - Nestor
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Sat, Oct 18, 2014 at 4:22 PM, Martin Pala <address@hidden> wrote:
>> > Hi,
>> >
>> > the attached error message ("cpu system usage ...") comes from the CPU 
>> > test ... it is not related to memory usage. High "cpu system" usage may 
>> > be, for example, a sign of heavy disk I/O activity and/or swapping (memory 
>> > shortage) - check the vmstat output for details.
>> >
>> > If the memory usage report is the problem, could you please provide the 
>> > output of "echo ::memstat | mdb -k" and "monit status" (just the System 
>> > service part is sufficient).
>> >
>> >
>> > Regards,
>> > Martin
>> >
>> >
>> >
>> > > On 16 Oct 2014, at 16:41, Nestor Urquiza <address@hidden> wrote:
>> > >
>> > > Hi guys,
>> > >
>> > > Since we went from Solaris 10 to 11 we have seen an increase in monit 
>> > > alerts related to memory resource utilization. We used to get no alerts 
>> > > even when we set the memory threshold really low, for example:
>> > >
>> > > "...cpu system usage of 45.8% matches resource limit [cpu system 
>> > > usage>40.0%]"
>> > >
>> > >
>> > > We have raised the threshold to 90% but we still get alerts.
>> > >
>> > > Could it be that the way monit decides what is free memory in Solaris is 
>> > > incorrect when using ZFS? See 
>> > > http://serverfault.com/questions/378392/how-should-i-monitor-memory-usage-performance-in-sunos-solaris
>> > >
>> > > We are running monit version 5.5 BTW which has been working fine for 
>> > > ages.
>> > >
>> > > Perhaps version 5.9 has changed something in that regard? I read in the 
>> > > release notes ( http://mmonit.com/monit/changes/ ) that it allows 
>> > > monitoring generic device strings (not really related, but worth asking).
>> > >
>> > > Thanks!
>> > >
>> > > - Nestor
>> > >
>> > > --
>> > > To unsubscribe:
>> > > https://lists.nongnu.org/mailman/listinfo/monit-general



