bug#72165: 31.0.50; Intermittent crashing with recent emacs build
From: Eli Zaretskii
Subject: bug#72165: 31.0.50; Intermittent crashing with recent emacs build
Date: Thu, 18 Jul 2024 12:52:28 +0300
> From: Dima Kogan <dima@secretsauce.net>
> Cc: 72165@debbugs.gnu.org
> Date: Thu, 18 Jul 2024 00:25:14 -0700
>
> Here's what I see in the core dump:
>
> (gdb) p current_thread->m_current_buffer->text->z
> $22 = 32192
>
> (gdb) p current_thread->m_current_buffer->text->z_byte
> $23 = 32178
>
> (gdb) p current_thread->m_current_buffer->pt
> $24 = 32192
>
> (gdb) p current_thread->m_current_buffer->pt_byte
> $25 = 32178
>
> So that tells me that the failing condition isn't the one gdb flagged,
> but the one immediately after:
>
>   if (BYTEPOS (opoint) < CHARPOS (opoint))
>     emacs_abort ();
Yes.
> The compiler optimizations could be responsible for the discrepancy.
Yes, this happens frequently in optimized code.
> Am I understanding correctly that this check makes sure that BYTEPOS >=
> CHARPOS, which must always be true because sizeof(emacs character) is
> always >= 1byte?
Yes.
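To illustrate the invariant outside of Emacs, here is a minimal
stand-alone sketch.  It is not Emacs code: the function names are made
up for this example, and it assumes plain UTF-8 text, where every
character occupies at least one byte, so the byte count can never be
smaller than the character count.

```c
#include <stddef.h>
#include <string.h>

/* Count the characters in a NUL-terminated UTF-8 string by counting
   every byte that is not a continuation byte (10xxxxxx).
   Hypothetical helper, not part of Emacs.  */
size_t
utf8_char_count (const char *s)
{
  size_t n = 0;
  for (; *s; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}

/* Return nonzero if the byte length of S is at least its character
   length.  This is the same relation that the aborting check
   enforces between BYTEPOS and CHARPOS.  */
int
bytes_cover_chars (const char *s)
{
  return strlen (s) >= utf8_char_count (s);
}
```

For pure-ASCII text the two counts are equal; any multibyte character
makes the byte count strictly larger, never smaller.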
> The buffer name:
>
> (gdb) p current_thread->m_current_buffer->name_
> $26 = XIL(0x7fc685b24c1c)
>
> (gdb) xstring
> $27 = (struct Lisp_String *) 0x7fc685b24c18
> "*Messages*"
And the *Messages* buffer was displayed in some window when this
happened?
> The full structure:
>
> (gdb) p current_thread->m_current_buffer->own_text
> $45 = {
> beg = 0x561d7100f800 ...
> z = 32192,
> z_byte = 32178,
> gpt = 32191,
> gpt_byte = 32177,
That's the bug: in these two pairs, the character and byte values
should be identical.
The question is: which code modified Z and GPT without updating the
corresponding _BYTE variables, or the other way around?
> Let's look just at the last little bit, to count the bytes:
>
> (gdb) printf "%.200s\n", &current_thread->m_current_buffer->text->beg[32000]
> mail>" 1 9 (face mu4e-context-face help-echo "mu4e context: fastmail")) 10)
> Error during redisplay: (eval (mu4e--modeline-string) t) signaled
> (args-out-of-range "" 0) [5 times]
>
> I asked for at most 200 bytes (up to byte 32200). I got exactly 176
> bytes, so the text ends where the gap supposedly begins. That makes
> sense.
This means Z_BYTE and GPT_BYTE are correct, but the corresponding Z
and GPT values are incorrect.
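The arithmetic behind that conclusion can be spelled out as a small
sketch.  It assumes that byte positions are 1-based while the index
into beg[] is 0-based, and that printf stopped at the start of the
gap; the function name is invented for this example.

```c
/* Byte position (1-based, like buffer positions) just past a run of
   LEN bytes that starts at 0-based offset START_OFFSET from `beg'.
   Hypothetical helper, not part of Emacs.  */
long
bytepos_after (long start_offset, long len)
{
  return start_offset + 1 + len;
}
```

With the numbers from the core dump, bytepos_after (32000, 176) is
32177, which matches the reported gpt_byte.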
My suggestion is to run Emacs under GDB with a watchpoint on Z_BYTE,
conditioned on the situation that Z_BYTE and Z are not equal.
This watchpoint needs to be defined when the current buffer is the
*Messages* buffer. One way of doing that is as follows:
$ gdb ./emacs
...
(gdb) break Frecenter
(gdb) run
After Emacs starts, type "C-x b *Messages* RET" to display *Messages*
in a window, then type C-l to trigger the Frecenter breakpoint, and
when GDB kicks in, type at the GDB prompt as follows:
(gdb) n
(gdb) n
(gdb) p buf
(gdb) watch $1->text->z_byte if $1->text->z_byte != $1->text->z
This relies on the fact that our code always changes Z_BYTE _after_
the corresponding change to Z.  The only exception to this rule that I
found is in insdel.c:del_range_2, where we do it in the opposite
order.  So for the above to work, you need to edit that function and
transpose the line of code which modifies Z_BYTE with the one which
modifies Z.  Then rebuild Emacs and use the resulting binary to debug
this with the above watchpoint.
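Why the order of the two updates matters for the watchpoint can be
seen in a toy model.  This is only a sketch: struct toy_text and the
functions below are invented for illustration and are not the real
struct buffer_text or the code in insdel.c, and the model assumes a
pure-ASCII buffer, where the two counters should always agree.

```c
#include <stdbool.h>

/* Toy stand-in for the two counters of interest.  */
struct toy_text
{
  long z;       /* end-of-text position, in characters */
  long z_byte;  /* end-of-text position, in bytes */
};

/* What the conditional watchpoint evaluates each time z_byte is
   written: stop only if the two counters disagree at that moment.  */
bool
watch_fires (const struct toy_text *t)
{
  return t->z_byte != t->z;
}

/* Delete N ASCII characters, updating Z first and Z_BYTE second.
   Returns whether the watchpoint would have stopped.  */
bool
delete_z_first (struct toy_text *t, long n)
{
  t->z -= n;
  t->z_byte -= n;               /* watched write; Z is already updated */
  return watch_fires (t);
}

/* The same deletion with Z_BYTE written first.  */
bool
delete_z_byte_first (struct toy_text *t, long n)
{
  t->z_byte -= n;               /* watched write; Z is still stale */
  bool fired = watch_fires (t);
  t->z -= n;
  return fired;
}
```

In this model, a correct deletion that writes Z first never trips the
condition, while one that writes Z_BYTE first trips it on every
deletion even though the final state is consistent.  That spurious
stop is what transposing the two lines avoids.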
> Theory: there's a race condition between error handling that ends up
> writing to *Messages* and the logic that aggregates duplicated messages
> into things like [5 times].
I don't see how this could happen, for two reasons:
. Emacs is a single-threaded program, so how can two pieces of code
that run in the same thread produce a race condition?
. in this particular case, both writing to *Messages* and aggregation
of identical messages happen in the same function, one after the
other; see xdisp.c:message_dolog.
> I saw the crashing once every 20min maybe, so reproducing it is probably
> possible, but not very quick and easy. Does it make sense to try to fix
> the (condition-case) problem first, since that's easily reproducible?
I don't see how fixing that problem could help. It might even
interfere, if that problem somehow triggers this one. Or did I miss
something?