bug-gnu-emacs

bug#72165: 31.0.50; Intermittent crashing with recent emacs build


From: Eli Zaretskii
Subject: bug#72165: 31.0.50; Intermittent crashing with recent emacs build
Date: Thu, 18 Jul 2024 07:58:54 +0300

> From: Dima Kogan <dima@secretsauce.net>
> Date: Wed, 17 Jul 2024 13:56:27 -0700
> 
> I'm running a bleeding-edge build of emacs. Using packages from:
> 
>   https://emacs.secretsauce.net/
> 
> Debian GNU/Linux. GTK+. Currently using a build from git as of
> 2024/07/09 (8e46f44ea0e). It is crashing periodically, with an unclear
> cause.
> 
> This isn't a brand-new problem; I observed a similar crash with an earlier
> build: 2024/04/30 (d24981d27ce). After that crash I upgraded, and I see
> crashes still.
> 
> Anecdotally, the 2024/04/30 build has been very stable. Today I started
> to debug a different issue: something about mu4e modeline updating is
> signalling args-out-of-range. To debug this I'm tweaking functions like
> (truncate-string-to-width), and re-evaluating them. This debugging isn't
> very interesting, but something about it is causing emacs to crash, with
> both builds.

So when you say that "anecdotally, the 2024/04/30 build has been very
stable", what exactly do you mean?  It sounds like both that build and
the one from 2024/07/09 crash in the same way, so why do you consider
the April one "very stable"?

> I just made a core. I cannot xbacktrace because (I think) I'm looking at
> a core, and not at a live process. If that would be helpful, I can
> probably get that. And I see the crash every 20min maybe, while
> debugging the mu4e modeline problem. Below is the backtrace. Hopefully
> this speaks to somebody. Thanks!

Thanks, but please always try to supply the information that explains
the crash, not just the backtrace.  (In this case, it's a deliberate
abort, not a crash, but still.)  That means looking at the source code
where GDB says the problem happened and printing the values of the
variables involved.  In this case:

>   (gdb) bt full
>   #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
>           tid = <optimized out>
>           ret = 0
>           pd = <optimized out>
>           old_mask = {
>             __val = {0}
>           }
>           ret = <optimized out>
>   #1  0x00007fc68a4a6b7f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
>   #2  0x00007fc68a4584e2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
>           ret = <optimized out>
>   #3  0x0000561d3dcb9798 in terminate_due_to_signal (sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=40) at ./debian/build-x/src/emacs.c:469
>   #4  0x0000561d3dcb9d4e in emacs_abort () at ./debian/build-x/src/sysdep.c:2391
>   #5  0x0000561d3dcb6c34 in redisplay_window (window=<optimized out>, just_this_one_p=just_this_one_p@entry=false) at ./debian/build-x/src/xdisp.c:20086

The call to emacs_abort seems to be here:

  /* Some sanity checks.  */
  CHECK_WINDOW_END (w);
  if (Z == Z_BYTE && CHARPOS (opoint) != BYTEPOS (opoint))
    emacs_abort ();  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Now, your "bt full" doesn't help us understand what went wrong, because
GDB is unable to find the values of many variables:

>           w = 0x561d6bcb2bc8
>           f = <optimized out>
>           buffer = <optimized out>
>           old = <optimized out>
>           lpoint = {
>             charpos = <optimized out>,
>             bytepos = <optimized out>
>           }
>           opoint = {
>             charpos = <optimized out>,
>             bytepos = <optimized out>
>           }

Still, at least Z and Z_BYTE should be available; what are their
values?

And regarding opoint, look a short way back in the code, to where it
was set:

  SET_TEXT_POS (opoint, PT, PT_BYTE);

If you look up the definition of SET_TEXT_POS, you will see:

  /* Set character position of POS to CHARPOS, byte position to BYTEPOS.  */

  #define SET_TEXT_POS(POS, CHARPOS, BYTEPOS) \
       ((POS).charpos = (CHARPOS), (POS).bytepos = BYTEPOS)

which means opoint takes its character position from PT and its byte
position from PT_BYTE.  So if you print the values of PT and PT_BYTE,
we will know the ("optimized-out") values of opoint.charpos and
opoint.bytepos, and will probably be able to understand why we
aborted.  IOW:

  (gdb) frame 5
  (gdb) print Z
  (gdb) print Z_BYTE
  (gdb) print PT
  (gdb) print PT_BYTE

(The "frame 5" command selects the call-stack frame where we call
emacs_abort, shown as #5 at the left edge of the backtrace line.)

If GDB says it doesn't know about these upper-case names, like Z and
PT_BYTE, it means your Emacs was built without macro information (the
-g3 compiler option), and you will need to type the macro definitions
instead.  For example (from buffer.h):

  #define PT (current_buffer->pt + 0)

So instead of "print PT" you will need to say "print current_buffer->pt".
And similarly for the other macros above.

Next question is: what buffer did Emacs try to display?  To answer
that, print the name of the buffer that is current at this point in
the code:

  (gdb) print current_buffer->name_
  (gdb) xstring

If GDB says it doesn't know what "xstring" is, type:

  (gdb) source /path/to/emacs/src/.gdbinit

and then repeat the above 2 commands.

Once you know which buffer was being displayed, try to describe the
text that was in it, if you can.  (If you cannot, I can give
instructions how to find it out using GDB commands.)

Thanks.
