monit-general

Re: Load alert failed during auditd backlog


From: Martin Pala
Subject: Re: Load alert failed during auditd backlog
Date: Mon, 31 Oct 2016 08:14:40 +0100

Hello TJ, thank you for the data.

The problem is that Monit updates the state file after each test cycle with a blocking write. When the filesystem froze, Monit was blocked in the state file update and could not continue monitoring until the filesystem became writable again. I have created an issue to track this problem and we'll fix it: https://bitbucket.org/tildeslash/monit/issues/493/state-file-refactor-to-non-blocking-read

A workaround is to place the state file on a filesystem which won't freeze (tmpfs should work). You can set the location using the "set statefile" statement:

set statefile /run/monit.state
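
To confirm that the chosen location really is on tmpfs, something like the following could be used. This is only a minimal sketch in Python, assuming a Linux host where /proc/mounts is available; the helper name fs_type_for is illustrative and not part of Monit.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts (Linux only).
    The longest matching mount point wins, because a more specific
    mount shadows its parent. Mount points containing spaces (escaped
    as \\040 in /proc/mounts) are not handled in this sketch.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fstype = fields[1], fields[2]
        # "/" becomes prefix "/", "/run" becomes prefix "/run/"
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" so that a later mount on the same point takes precedence
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type
```

On the monitored host, pass it the content of /proc/mounts, for example fs_type_for("/run/monit.state", open("/proc/mounts").read()). If the result is not "tmpfs", the state file is probably still on a filesystem that could freeze.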

Best regards,
Martin



On 27 Oct 2016, at 22:10, TJ Stroker <address@hidden> wrote:

Hello Martin,

Just emailed to you.

Thank you!

On Thu, Oct 27, 2016 at 7:09 AM, Martin Pala <address@hidden> wrote:
Hello,

could you please send your Monit configuration and Monit log to address@hidden?

The status messages were most probably sent to M/Monit, otherwise the chart would have a gap, for example:

<PastedGraphic-1.png>



Best regards,
Martin


On 26 Oct 2016, at 22:04, TJ Stroker <address@hidden> wrote:

Hello

I wanted to ask a question, or point out an issue, whichever fits.

Yesterday afternoon I noticed an odd issue with a server, which happened to be running Monit 5.19. The issue had actually been in effect for a couple of days. I use M/Monit, but never received any alerts about it. The issue is described in this Red Hat TID:

Message "audit: backlog limit exceeded" reported and possibly hung system due to a frozen filesystem



What I found when I ssh'd to my server was a system load of 299. However, Monit and M/Monit both showed a load of almost 0. I will attach an M/Monit weekly load graph.

This server is used only internally, so it didn't create any real problems for us. But it could have been something more important.

At this point (as I'm still digging into the auditd issue) I can only think that somehow, due to the freeze, Monit was unable to send its queued messages. Because of this, no error condition showed up in M/Monit.

So I wanted to point it out, but also ask whether you have any insight into how to catch this type of issue in the future.


Jim
<Screen Shot 2016-10-26 at 11.21.59 AM.png>

--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general



