bug-gnu-emacs

bug#69611: 30.0.50; Long bidi line with control characters freezes Emacs


From: Stephen Berman
Subject: bug#69611: 30.0.50; Long bidi line with control characters freezes Emacs
Date: Thu, 07 Mar 2024 14:42:37 +0100
User-agent: Gnus/5.13 (Gnus v5.13)

This report is spun off from bug#69385 at the request of Eli Zaretskii,
because it concerns a problem that seems to be independent of that bug
report, though like it involves long lines of bidirectional text.

When I visited a certain elisp file generated by a program of mine and
typed `M-v', it took some time (see below for details) for the display to
scroll to 4% from the top (according to the mode line) and then there
was no further change and Emacs froze, using 100% of a CPU core.  I
found no way to unfreeze it within Emacs and after about 15 minutes
terminated the emacs process from the shell.  This is reliably
reproducible with this file.

The file in question is only about 50k bytes long, but it contains one
line of more than 37k characters, consisting of a mix of ASCII and
non-ASCII characters, including properly shaped Arabic script.  The file
itself has base paragraph direction LTR.

Most of the Arabic words in this file are enclosed in the bidirectional
control characters POP DIRECTIONAL FORMATTING (#x202c) and RIGHT-TO-LEFT
EMBEDDING (#x202b).  I did not add these characters, but I had
copy-&-pasted most of the Arabic from a PDF file I did not create.  I
don't know if PDFs of Arabic text normally contain these control
characters, but the consequences for Emacs were dramatic.  When I simply
visited this file in Emacs (started with -Q) there was an immediate
slowdown, and in top I could see Emacs using 100% of a CPU thread.  I
ran `M-: (benchmark-run nil (end-of-buffer))' on this file, and the
result was:

(27.962602113 2 0.0226042269999999977)

This timing is from a build from master including the patch Eli posted
in bug#69385 (see
https://lists.gnu.org/archive/html/bug-gnu-emacs/2024-03/msg00101.html).
On a build without that patch, the benchmark timing is very much longer.

The display of the benchmark result only appeared in the echo area after
more than a minute (I timed it with a stopwatch).  At that point the
mode line showed the buffer at 4% from the top, and the display remained
frozen afterwards.  After several minutes during which Emacs consumed
100% CPU, and I had switched the focus away from the Emacs frame, the
CPU consumption stopped, but as soon as I switched focus back to that
frame, it went back to 100%.  The display never changed from showing the
buffer at 4%; Emacs was apparently in some kind of infinite loop.  After
about 15 minutes I started gdb, attached it to the Emacs process and produced
a backtrace, which I've attached, in the hope it helps to diagnose the
problem.

The problem certainly seems to be related to the bidirectional control
characters, because I made a copy of the file and removed all
occurrences of these control characters from it, and then ran the
end-of-buffer benchmark, getting this result (with Eli's patch):

(0.716104165 4 0.04223660400000001)

And the display updated normally and CPU consumption was normal.
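
For anyone who wants to repeat this comparison, the stripping step can
be sketched outside Emacs as below.  This is only an illustration under
stated assumptions, not the method used to produce the copy timed above:
the function names are hypothetical, the file is assumed to be UTF-8,
and only the two control characters named in this report are removed.

```python
# Hypothetical helper names; nothing here comes from Emacs itself.
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def strip_bidi_controls(text, controls=(RLE, PDF)):
    """Return (stripped_text, number_of_characters_removed)."""
    # Mapping a code point to None makes str.translate delete it.
    table = {ord(c): None for c in controls}
    stripped = text.translate(table)
    return stripped, len(text) - len(stripped)


def strip_file(src, dst, encoding="utf-8"):
    """Write a copy of SRC to DST without the control characters.

    Assumes the file is UTF-8; newline="" keeps line endings unchanged.
    Returns the number of characters removed.
    """
    with open(src, encoding=encoding, newline="") as f:
        text = f.read()
    stripped, removed = strip_bidi_controls(text)
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(stripped)
    return removed
```

The returned count gives a quick check of how many control characters
the original contained before the copy is benchmarked in Emacs.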

Nevertheless, there seems to be something else besides the control
characters involved in this issue, because as a further test, I created
a buffer consisting of more than 1000 copies of the test string
concatenating the Arabic example in etc/HELLO and "Hello" (see bug#69385
for more on such test buffers), and manually enclosed each Arabic word
in the above control characters, but the benchmark result in this buffer
was not significantly different from the result without the control
characters (and similar to the above result for the copy of the
problematic file without the control characters), and the display did
not freeze.
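
A test text of this kind could be generated along the following lines.
This is a rough sketch, not the procedure actually used (the control
characters were added by hand in the buffer described above).  It
assumes that each Arabic word is preceded by RLE and followed by PDF,
that the copies are joined by spaces on a single line, and that the
Arabic sample is supplied by the caller; the function names are
hypothetical.

```python
# Hypothetical helper names; the wrapping order (RLE word PDF) is an assumption.
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def wrap_arabic_words(text):
    """Wrap each space-separated word containing an Arabic-block character."""
    def is_arabic(word):
        # U+0600..U+06FF is the basic Arabic block (letters and marks).
        return any("\u0600" <= ch <= "\u06ff" for ch in word)

    return " ".join(
        RLE + word + PDF if is_arabic(word) else word
        for word in text.split(" ")
    )


def build_test_line(arabic, copies=1000, latin="Hello"):
    """Return one long line: COPIES repetitions of wrapped ARABIC plus LATIN."""
    unit = wrap_arabic_words(arabic) + " " + latin
    return " ".join([unit] * copies)
```

The resulting string can be written to a file and visited with
`emacs -Q' to compare timings with and without the control characters.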

(I have emailed a copy of the problematic file to Eli, at his request.
I do not want to post it publicly, because it contains hundreds of text
snippets from a PDF of a copyrighted book.  Each snippet is certainly
within the bounds of fair use for distribution, but in sum probably
not.)


In GNU Emacs 30.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version
 3.24.38, cairo version 1.18.0) of 2024-03-04 built on strobelfs2
Repository revision: b3eb49a4661e31306555e82bdf24db6c36d67ad2
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101009
System Description: Linux From Scratch r12.0-112

Configured using:
 'configure -C --with-xwidgets 'CFLAGS=-Og -g3'
 PKG_CONFIG_PATH=/opt/qt5/lib/pkgconfig'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP X11 XDBE XIM XINPUT2 XPM XWIDGETS GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Attachment: txttOryfPG0ke.txt
Description: gdb backtrace

