emacs-devel

Re: GC bug investigation


From: Daniel Colascione
Subject: Re: GC bug investigation
Date: Sun, 23 Mar 2014 08:22:22 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 03/23/2014 07:57 AM, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> 
>     Details of the objects on the path might be useful.
> 
> I don't understand "on the path".
> 
>     mark_object(A)
>     mark_vectorlike(B)
>     mark_object(B)
>     mark_object(clear-transient-map)
> 
> Right.
> 
>     B here is clear-transient-map's function cell, right? You're saying you
>     saw that it's a pseudovector that safe_debug_print reports as
>     INVALID_LISP_OBJECT, probably because live_vector_p returns 0.
> 
> Yes.
> 
>      That
>     we're reaching B at all indicates that it shouldn't be dead.
> 
> I guess so.  This is the mysterious part.
> 
>     B must have been made dead *before* being assigned to
>     clear-transient-map's function cell. Looking at the bytecode in
>     set-transient-map, though, I don't see how that's possible.
> 
> I don't think that's what happened.  If it were that, we would
> see crashes when that code tries to _use_ the value legitimately.

...unless we're GCing before the value is used.  Keep in mind that we
won't try to use the value until just before the next command runs. It
sounds far-fetched, but I don't have a better idea.

> 
>     clear-transient-map isn't dead either,
> 
> It has not been freed, it seems, but it may be garbage.
> 
> It is being marked through a spurious pointer randomly hanging around
> in a stack slot for something else.  We don't know that there is any
> real pointer to it.

Conservative GC is designed to cope with occasional stray pointers into
the GC heap. That we're somehow finding a pointer to the symbol is not
the problem. mark_maybe_pointer marks an object at an address only if
mem_find() and live_XXX_p() indicate that the address holds a live object.
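
To make that check concrete, here is a toy model of it. This is only a
sketch, not the code in alloc.c: the fixed-size heap, struct toy_cell,
toy_find and toy_mark_maybe_pointer are all hypothetical names, and the
rule that a word must point at the exact start of a cell is an
assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy heap: a fixed array of cells.  None of these names are Emacs's.  */
enum { TOY_CELLS = 8 };

struct toy_cell
{
  bool live;    /* allocated and not yet freed */
  bool marked;  /* set by the marker during a collection */
  int payload;
};

static struct toy_cell toy_heap[TOY_CELLS];

/* Stand-in for mem_find: return the cell that WORD points at, or NULL
   if WORD is outside the heap or is not the start of a cell.  */
static struct toy_cell *
toy_find (uintptr_t word)
{
  uintptr_t base = (uintptr_t) toy_heap;
  uintptr_t end = base + sizeof toy_heap;

  if (word < base || word >= end)
    return NULL;
  if ((word - base) % sizeof (struct toy_cell) != 0)
    return NULL;
  return &toy_heap[(word - base) / sizeof (struct toy_cell)];
}

/* Stand-in for mark_maybe_pointer: mark only if the word names a cell
   in the heap AND that cell is live.  Return true if something was
   marked.  A stray word that fails either test is simply ignored.  */
static bool
toy_mark_maybe_pointer (uintptr_t word)
{
  struct toy_cell *cell = toy_find (word);

  if (cell == NULL || !cell->live)
    return false;
  cell->marked = true;
  return true;
}
```

The point of the two-step test is that a stale stack word pointing at a
freed cell gets skipped instead of resurrecting the cell.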

Now, it's conceivable that there might be a bug in the liveness
detection, but if there were, I'd expect to see it manifest much more
frequently and on many more platforms. Collecting garbage is pretty much
the main thing Emacs does. :-)

Besides: looking at the commits during the range you gave, I don't see
anything that might suggest that we broke the GC itself.

That's why I'm curious about Ffset: if there's a window between the time
the function object is created and the time it's assigned to the
symbol's function cell, and the function value isn't reachable from a GC
root during that window, then it's possible that we're occasionally
GCing in that window, freeing the function object, and then storing the
freed object in the symbol's function cell. The only place I can imagine
that happening is inside Ffset. The GC code *should* be spilling all
non-volatile registers onto the stack for examination, but I imagine the
MIPS version of this code is lightly tested. Maybe unrelated code
changes triggered some kind of code rearrangement that made it more
likely to encounter this condition.
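
Here is the hypothesis as a toy model. It is a sketch of the suspected
sequence only and says nothing about how Ffset is actually written: the
pool, toy_alloc, toy_gc and toy_fset are hypothetical names, the
symbol's function cell is the only root, and the gc_in_window flag
simply forces a collection while the new value is invisible to the
collector (as it would be if it lived in a register that never got
spilled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_OBJECTS = 4 };

struct toy_object
{
  bool live;    /* allocated and not yet freed */
  bool marked;  /* reached from the root during this collection */
};

static struct toy_object toy_pool[TOY_OBJECTS];

/* The only GC root in this model: one symbol's function cell.  */
static struct toy_object *toy_function_cell;

/* Hand out the first free object, or NULL if the pool is full.  */
static struct toy_object *
toy_alloc (void)
{
  for (int i = 0; i < TOY_OBJECTS; i++)
    if (!toy_pool[i].live)
      {
        toy_pool[i].live = true;
        toy_pool[i].marked = false;
        return &toy_pool[i];
      }
  return NULL;
}

/* Mark from the root, then free every live object that was not
   marked.  Return false if the root pointed at an already-freed
   object, which is the situation the crash report describes.  */
static bool
toy_gc (void)
{
  bool ok = true;

  if (toy_function_cell != NULL)
    {
      if (toy_function_cell->live)
        toy_function_cell->marked = true;
      else
        ok = false;
    }

  for (int i = 0; i < TOY_OBJECTS; i++)
    {
      if (toy_pool[i].live && !toy_pool[i].marked)
        toy_pool[i].live = false;
      toy_pool[i].marked = false;
    }
  return ok;
}

/* Store VALUE in the function cell.  If GC_IN_WINDOW, collect first,
   while VALUE is not yet reachable from the root.  */
static void
toy_fset (struct toy_object *value, bool gc_in_window)
{
  if (gc_in_window)
    toy_gc ();
  toy_function_cell = value;
}
```

With gc_in_window false the stored object survives later collections.
With it true, the cell ends up holding a freed object and nothing
notices until the next collection walks the cell.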

Anyway, if, when we crash, we're able to see the stack captured at the
last time that vector was freed, we should have a much better idea of
what's going on. I can work on adding that instrumentation.
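
Roughly the shape I have in mind, as a standalone sketch. It assumes a
libc that provides backtrace and backtrace_symbols_fd in <execinfo.h>
(glibc does). The watch_* names are hypothetical and nothing here is
wired into alloc.c; where the real hooks would go is still to be
decided.

```c
#include <assert.h>
#include <execinfo.h>
#include <stddef.h>
#include <unistd.h>

enum { WATCH_MAX_FRAMES = 32 };

static const void *watch_address;              /* object we care about */
static void *watch_frames[WATCH_MAX_FRAMES];   /* stack at its last free */
static int watch_depth;                        /* frames captured */
static unsigned long watch_free_count;         /* times it was freed */

/* Start watching ADDRESS and forget anything recorded earlier.  */
static void
watch_set (const void *address)
{
  watch_address = address;
  watch_depth = 0;
  watch_free_count = 0;
}

/* Call from the sweep code each time an object is freed.  Frees of
   other objects cost one comparison; a free of the watched object
   overwrites the saved stack, so only the most recent one is kept.  */
static void
watch_note_free (const void *address)
{
  if (address == NULL || address != watch_address)
    return;
  watch_depth = backtrace (watch_frames, WATCH_MAX_FRAMES);
  watch_free_count++;
}

/* Call from the crash path: dump the stack saved at the last free of
   the watched object, if there was one.  */
static void
watch_report (void)
{
  if (watch_free_count > 0)
    backtrace_symbols_fd (watch_frames, watch_depth, STDERR_FILENO);
}
```

Capturing raw frame addresses at free time and symbolizing only at
report time keeps the cost of each free low, which matters if the
object is freed and reallocated many times before the crash.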

> 
>     I don't think that writing code that aborts or breaks when a particular
>     vector is freed will be very helpful; we'll hit that code in normal
>     operation too. Instead, it'll probably be more useful to print a
>     backtrace (using emacs_backtrace) each time we see that vectorlike
>     freed, then look at the last backtrace before the GC crash.
> 
> Maybe you are right.
> 
>     Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?
> 
> I can, but it would be a big pain.  It takes many hours to recompile
> Emacs on this machine.

Cross-compile?

> What would it tell us?  It would confirm that the vectorlike was freed,
> perhaps, but do we doubt that?

I doubt everything here.

> If that hassle is likely to solve the problem, I'll do it,
> but I'd rather not go to that hassle just to confirm what we know.

If we can combine that recompilation with some other debugging
instrumentation, the hassle will be worthwhile.

Attachment: signature.asc
Description: OpenPGP digital signature

