From: Camm Maguire
Subject: Re: [Gcl-devel] recent armel/armhf kernels ENOMEM on mprotect, maxima/axiom FTBFS
Date: Thu, 28 Aug 2014 12:36:29 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Greetings, and thanks so much for your feedback!

Ian Campbell <address@hidden> writes:

> On Tue, 2014-08-19 at 18:50 -0400, Camm Maguire wrote:
>
>> mprotect failure: 0xd49000 305430528 : Cannot allocate memory
>> sgc disabled: Cannot allocate memory
>
> I've now repro'd this (on a system running 3.10-0.bpo.2-armmp). (It then
> dumps me to some gcl prompt which I don't seem to be able to exit ;-))
>
> Experimentally it seems like the benign change you referred to was the
> addition of __stack_chk_guard=random_ulong()? I think the enablement of
> stack protection is very far from benign in this context, since it plays
> various memory management tricks and adds guard pages etc.
>

This is only benign in the sense that it moved earlier an identical call
that is currently in the code several lines further down, where it works
fine.  When the call is made earlier, the sequence of allocations is
slightly different, and the brk/mprotect issue shows up.  As already
mentioned, running under gdb alters the memory layout enough to make
the problem go away.  So almost all cases work correctly.

I do not believe this has anything to do with the stack guard.  We set
our own __stack_chk_guard only because, as an external *variable*
reference into a shared library, it would trigger a COPY_ relocation in
ld, which would then be in the wrong place after unexec (as with emacs)
and therefore prevent prelinking.  All this was just to close a bug
report that programs compiled with gcl could not be prelinked, which
might make startup a little faster.  The stack protector flags are
part of Debian policy in any case.


> I've tried to write a simplified test case but my (rather too basic)
> attempts don't seem to be doing the trick. TBH I think you are going to
> need the assistance of someone who knows more about
> gcc/glibc/stack-protector than me; right now I don't think the kernel
> angle is the one most likely to produce fruit.

I can't really see why writing to __stack_chk_guard could possibly
affect a call to mprotect, and in any case, mprotect fails before
random_ulong() returns and __stack_chk_guard is modified.  I am not
familiar with the kernel sources, but it seems that mprotect, perhaps
only on arm, requires some internal table memory which might be
proportional to the size of the block, or depends on some other factor
that prevents it from satisfying the number of requests ostensibly
permitted by the /sys variable whose name escapes me at the moment
(nmaps?).
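For anyone wanting to rule the mapping-count angle in or out: on Linux the per-process limit is vm.max_map_count (/proc/sys/vm/max_map_count), and the mprotect(2) man page lists exceeding it as one cause of ENOMEM, alongside failure to allocate internal kernel structures. The following is a minimal sketch, with made-up helper names, that compares the current number of mappings (one line per mapping in /proc/self/maps) against that limit; it could be called just before the failing mprotect.

```c
#include <stdio.h>

/* Count this process's mappings: /proc/self/maps has one line per
   mapping.  Returns -1 if the file cannot be opened. */
long count_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/* Read the kernel's per-process mapping limit (vm.max_map_count).
   Returns -1 if it cannot be read. */
long max_map_count(void)
{
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    long v = -1;
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```

If the count is far below the limit when mprotect returns ENOMEM (as with the ~100 maps seen here), the limit itself is probably not the cause.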

If this is a corner case, writing a simplified test might be tricky,
unless perhaps one just has main make the exact sequence of brk and
mprotect calls reported by strace.
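A replay of that kind might look roughly like the sketch below. It is hypothetical and not part of gcl: it grows the break with sbrk(), then alternates read-only and read-write protection over page-aligned blocks, the same pattern of calls the sgc startup makes. The block size and count are placeholders, to be replaced by the values from the strace output.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical replay helper: grow the heap with sbrk(), then alternate
   PROT_READ and PROT_READ|PROT_WRITE over `nblocks` page-aligned blocks
   of `block` bytes.  Returns 0 on success, otherwise the errno of the
   first failing call. */
int replay_brk_mprotect(size_t block, size_t nblocks)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    block = (block + pg - 1) / pg * pg;            /* whole pages only */

    char *old_brk = sbrk(0);
    uintptr_t start = ((uintptr_t)old_brk + pg - 1) & ~(uintptr_t)(pg - 1);
    size_t total = (size_t)(start - (uintptr_t)old_brk) + block * nblocks;
    if (sbrk((intptr_t)total) == (void *)-1)
        return errno;

    int err = 0;
    size_t failed = 0;
    for (size_t i = 0; i < nblocks; i++) {
        int prot = (i % 2) ? PROT_READ | PROT_WRITE : PROT_READ;
        if (mprotect((char *)start + i * block, block, prot) != 0) {
            err = errno;
            failed = i;
            break;
        }
    }

    /* Undo the protections and return the memory before printing, so
       that stdio cannot move the break underneath us. */
    (void)mprotect((char *)start, block * nblocks, PROT_READ | PROT_WRITE);
    (void)brk(old_brk);

    if (err)
        fprintf(stderr, "mprotect failure: %#lx %zu : %s\n",
                (unsigned long)(start + failed * block), block, strerror(err));
    return err;
}
```

Calling, e.g., replay_brk_mprotect(4096, 16) from main() on an unaffected kernel should return 0; on an affected armel/armhf kernel the traced sizes would be the interesting case.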


>
>> and when run under strace -f will show the brk calls and mprotect calls
>> which fail.
>
> They seem to be quite large (hundreds of MB) compared to the other such
> calls -- is that expected?


I think if you grep for brk and mprotect, it is not too large.  But yes,
the full dump is huge.

Another factor of note is that gcl *probes* brk at startup to determine
the maximum amount of heap available, and then resets the break to what
is actually needed at the moment.  This is because the allocation
algorithm, which attempts to handle out-of-memory situations gracefully,
depends on this quantity.  (Of course with OOM killers, this is really
just a heuristic, as a successful brk offers no *guarantee* that you
actually have the memory :-).)
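The probe can be pictured as a binary search on brk(). The sketch below is an illustration under that assumption, not gcl's actual alloc.c code; probe_brk, its upper bound and its granularity are made-up parameters.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical brk probe: binary-search for the largest increment (up to
   `limit` bytes, in steps of `granule`) by which the break can be raised,
   then put the break back where it was.  A successful brk() only reserves
   address space; it does not guarantee the pages can be backed later. */
size_t probe_brk(size_t limit, size_t granule)
{
    char *base = sbrk(0);
    size_t lo = 0, hi = limit / granule;           /* in units of granule */

    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;       /* round up: always progress */
        if (brk(base + mid * granule) == 0) {
            lo = mid;                              /* this much fits */
            (void)brk(base);                       /* reset before next try */
        } else {
            hi = mid - 1;                          /* too large */
        }
    }
    (void)brk(base);
    return lo * granule;
}
```

The search assumes success is monotonic in the requested size; after it, the break is back at its starting value and only the probed maximum is remembered.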

>
>>   Oddly enough, when run under gdb, something is done to the
>> runtime environment which prevents the failure from occurring, a mystery
>> to me.
>
> FWIW I get a SEGV.
>

This is correct.  The command has created an image that uses a 'stratified
garbage collector', which marks pages read-only, traps subsequent writes,
and re-marks the page read-write, in an effort to shrink the number of
pages which must be processed in garbage collection.  When running under
gdb, 'handle SIGSEGV nostop noprint' followed by 'b error' (to trap real
segfaults) is usually required.
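The write-trapping mechanism described above can be sketched in a few lines. This is a simplified illustration with hypothetical names, not the code in sgbc.c: pages are made read-only, the first write to each page raises SIGSEGV, the handler records the page as dirty and makes it writable, and the faulting store is retried when the handler returns.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 8                      /* hypothetical heap size in pages */
static char *heap;
static long pagesize;
static volatile sig_atomic_t dirty[NPAGES];

static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    (void)ctx;
    char *addr = si->si_addr;
    if (addr < heap || addr >= heap + NPAGES * pagesize) {
        /* A real segfault: restore the default action; returning retries
           the access, which then terminates the process as usual. */
        signal(sig, SIG_DFL);
        return;
    }
    size_t page = (size_t)(addr - heap) / pagesize;
    dirty[page] = 1;
    /* mprotect() is not on POSIX's async-signal-safe list, but on Linux it
       is a plain system call and GCs commonly use it from this handler. */
    mprotect(heap + page * pagesize, pagesize, PROT_READ | PROT_WRITE);
}

/* Map the heap, install the handler, and write-protect every page. */
int sgc_start(void)
{
    pagesize = sysconf(_SC_PAGESIZE);
    heap = mmap(NULL, NPAGES * pagesize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap == MAP_FAILED)
        return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return mprotect(heap, NPAGES * pagesize, PROT_READ);
}

/* Number of pages written since sgc_start(): only these need scanning. */
int sgc_dirty_count(void)
{
    int n = 0;
    for (int i = 0; i < NPAGES; i++)
        n += dirty[i];
    return n;
}
```

Under gdb these intentional faults would stop the program at every first write, which is why SIGSEGV has to be passed through silently there.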

The startup sequence in such an image is 1) first check that faults can
be trapped on the running cpu/kernel, and then 2) go through the heap
dividing the pages into read-only and read-write blocks that were
determined before image save.  If memory serves, this process succeeds,
but when a random number is generated later for the stack protector, a
sequence of brk/mprotect calls is triggered that shows the failure.
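Step 2 can be sketched as walking a per-page table saved before the image was dumped and issuing one mprotect per run of equally-protected pages. This is an assumption about how such a pass might be written, not gcl's implementation, and apply_page_protections is a hypothetical name. Coalescing runs matters here because every change of protection splits a mapping, so the number of mprotect calls and resulting mappings grows with the number of runs rather than the number of pages.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical re-application pass: readonly[i] != 0 means page i of the
   region starting at `base` was read-only when the image was saved.
   Issues one mprotect() per run of equal flags.  Returns the number of
   calls made, or -1 on the first failure (errno set by mprotect). */
long apply_page_protections(char *base, const unsigned char *readonly,
                            size_t npages)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    long calls = 0;
    size_t i = 0;
    while (i < npages) {
        size_t j = i;
        while (j < npages && (readonly[j] != 0) == (readonly[i] != 0))
            j++;                                   /* extend the current run */
        int prot = readonly[i] ? PROT_READ : PROT_READ | PROT_WRITE;
        if (mprotect(base + i * pagesize, (j - i) * pagesize, prot) != 0)
            return -1;
        calls++;
        i = j;
    }
    return calls;
}
```

For a table like {1,1,1,0,0,1,0,0} this makes four calls instead of eight.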

> (gdb) bt
> #0  0x0007f348 in memprotect_test () at sgbc.c:957
> #1  0x00084258 in do_memprotect_test () at sgbc.c:1000
> #2  memprotect_test_reset () at sgbc.c:1019
> #3  0x000b271e in gcl_init_alloc (cs_start=<optimized out>) at alloc.c:1090
> #4  0x00021914 in main (argc=1, argv=0xbefffd14, envp=0xbefffd1c) at main.c:357
>
> The call to gcl_init_alloc is before "__stack_chk_guard=random_ulong()",
> but these functions seem to relate to memprotect. Not sure what is going
> on there, but perhaps the stack trace gives you a clue.
>

Thanks again so much.  I'm not really sure what the priority on this is,
or whether it's even a 'bug', since mprotect is allowed to fail with
ENOMEM.  It was just surprising that with only ~100 maps, it would fail
on memory successfully allocated by brk.  So this report is really just
offered to the 'powers that be' to decide whether this behavior is
intended by their implementation, as gcl is moving forward with a
workaround.  Needless to say, I'm happy to help if anyone wants to
pursue this further.

Take care,


> Ian.

-- 
Camm Maguire                                        address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah


