bug#38748: 28.0.50; crash on MacOS 10.15.2

bug-gnu-emacs

From:	Pip Cet
Subject:	bug#38748: 28.0.50; crash on MacOS 10.15.2
Date:	Fri, 10 Jan 2020 09:22:30 +0000

On Fri, Jan 10, 2020 at 8:27 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > I can. I think we're looking at two bugs: the first is the simple
> > use-after-free of XFRAME (frame)->output_data.ns where `frame' is a
> > dead frame. I've confirmed on GNU/Linux that mark_frame is called for
> > a frame for which x_free_frame_resources has already been called, if
> > there's a global variable still referencing the frame. I think the
> > same thing happens on macOS.
>
> This one doesn't depend on the 'ok's initialization in
> face_inherited_attr in any way, does it?

It doesn't, no.
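
To make the first bug concrete, here is a minimal sketch of the pattern as I understand it: a frame whose resources have been freed is still reachable from a global, so GC marking visits it again. All names below (struct frame, output_data, free_frame_resources, mark_frame_sketch) are hypothetical stand-ins, not the real Emacs definitions. The guard shown is only one plausible way to avoid the use-after-free, not necessarily what the nsterm.m patch does.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Emacs's frame structures; these are
   NOT the real definitions from frame.h.  */
struct output_data { int font_refs; };

struct frame
{
  struct output_data *output_data;  /* NULL once resources are freed */
  bool marked;
};

/* Allocate a live frame with its output data.  */
static struct frame *
make_frame (void)
{
  struct frame *f = calloc (1, sizeof *f);
  if (f)
    f->output_data = calloc (1, sizeof *f->output_data);
  return f;
}

/* Models freeing a frame's window-system resources.  Clearing the
   pointer afterwards is what lets later code detect a dead frame;
   without it, the pointer dangles.  */
static void
free_frame_resources (struct frame *f)
{
  free (f->output_data);
  f->output_data = NULL;
}

static bool
frame_live_p (const struct frame *f)
{
  return f->output_data != NULL;
}

/* Models GC marking.  A dead frame that is still reachable from a
   global variable gets marked, but its freed output data is never
   dereferenced.  Returns true if the output data was visited.  */
static bool
mark_frame_sketch (struct frame *f)
{
  f->marked = true;
  if (!frame_live_p (f))
    return false;
  f->output_data->font_refs++;  /* safe: the frame is live */
  return true;
}
```

Calling mark_frame_sketch on a frame after free_frame_resources marks the frame itself but skips the freed data, which is the behaviour we would want from the real marking code.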

> What do you mean by "secondary thread"?

It's my impression that macOS forces us to run in several threads,
even though we don't really want to do so. For example, changeFont in
nsterm.m appears not to assume it's run on the main thread, but calls
build_string, which sounds dangerous to me.
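
As an illustration of the kind of check I have in mind, here is a minimal sketch using plain pthreads. It records the main thread at startup and lets code that is about to allocate Lisp data ask whether it is still on that thread. The names record_main_thread, on_main_thread and on_main_thread_from_new_thread are made up for this example; nothing like them is claimed to exist in nsterm.m.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helpers; not part of Emacs.  */
static pthread_t main_thread_id;

/* Call once, early, from the main thread.  */
static void
record_main_thread (void)
{
  main_thread_id = pthread_self ();
}

/* True if the caller is running on the recorded main thread.  */
static bool
on_main_thread (void)
{
  return pthread_equal (pthread_self (), main_thread_id) != 0;
}

/* Thread body: store the result of the check for the caller.  */
static void *
probe_thread (void *result)
{
  *(bool *) result = on_main_thread ();
  return NULL;
}

/* Run the check on a freshly created thread and return what it saw.
   If the thread cannot be created, report false.  */
static bool
on_main_thread_from_new_thread (void)
{
  pthread_t tid;
  bool result = false;
  if (pthread_create (&tid, NULL, probe_thread, &result) != 0)
    return false;
  pthread_join (tid, NULL);
  return result;
}
```

A callback that may run on a secondary thread could test on_main_thread () before touching the Lisp heap and defer the work otherwise; how the deferral would be done is outside this sketch.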

> And how can GC modify Lisp
> data structures? That'd be a terrible bug.

Yes, it would be, but if bug#2 is real it's going to be terrible in
one way or another (I hope it's not GC-related, but "just" a stack
overflow).

> In any case, the full backtrace shows no trace of face_inherited_attr
> call anywhere in the callstack, so if there is indeed infinite
> recursion in that function, it was somehow exited long ago by the time
> GC runs.

I don't think the full backtrace shows bug#2; it shows bug#1.

> As for the tail-recursion part: do you see any sign of that in the
> disassembly posted by Robert?

No, just in the backtrace which shows execution at xfaces.c:2226, with
the PC not saved in the stack frame.

> I didn't, but maybe I missed
> something.  And such subtleties should only rear their ugly heads in
> optimized code, whereas we already know that an unoptimized build
> crashes in the same way.

Do we, though? We know that an unoptimized build crashes, but we don't
know it's the (hypothetical, as I said) bug#2.

> I still think the shortest way to finding the culprit here is to
> patiently and painfully go over the last_marked array, deciphering
> the Lisp object we marked, until we succeed in identifying the Lisp
> data structure which got corrupted.  Once we succeed in identifying
> that data structure, it should be relatively easy to find who and
> where corrupts it.  This may mean a lot of inconvenient drudgery,
> exacerbated by the fact that having a functional GDB on macOS is not
> easy, but I don't think we have a better way at this point.

I disagree. The patch to nsterm.m is obviously harmless, and appears
to fix the one bug we have clear evidence of, in a way that seems
logical and necessary to me.

If there is a second bug, and the backtrace we saw wasn't just a
fluke, it's going to show up when people run Emacs on macOS under GDB
in all-stop mode. The problem, I think, is that this hardly ever
happens, and I don't have access to a macOS machine.
