bug-gnu-emacs

bug#70000: 29.2; Grapheme handling incorrect


From: Eli Zaretskii
Subject: bug#70000: 29.2; Grapheme handling incorrect
Date: Wed, 27 Mar 2024 19:17:39 +0200

> From: Phillip Susi <phill@thesusis.net>
> Cc: 70000@debbugs.gnu.org
> Date: Wed, 27 Mar 2024 10:11:30 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Querying the cursor position won't help in this case because it is
> > Emacs that moves the cursor when you type C-f, not the terminal.
> 
> I'm not talking about C-f, but simply displaying the characters on the
> screen.  Emacs assumes the width is 4 when it prints this character, and
> so it thinks that the cursor moved over 4 places.  When the terminal
> actually only moves the cursor over 2 spaces, emacs gets out of sync
> with the terminal, and massive breakage occurs.

I understand what you are saying, but this is not how Emacs display
code works.  It needs to know the width of every character displayed
on the screen, and it needs to be able to determine that even without
actually displaying the character.

When Emacs is about to redraw some portion of the screen, it moves the
cursor to that place.  To be able to move the cursor there, it needs
to be able to compute the screen coordinates of every character that
is currently shown, so it can construct the command for the terminal
driver to move the cursor to that place.  If Emacs were to rely on
displaying characters for that, it would need to constantly redraw
large portions of the screen, and that would both be much slower and
cause unpleasant flickering of the display, due to redrawing of
screen portions that don't actually change.

So this technique is out of the question for Emacs.
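
To illustrate the kind of calculation described above, here is a
minimal Python sketch (not the actual Emacs display code).  It assumes
a simplified width model taken from Python's unicodedata module; the
names char_width, column_of and move_cursor_command are hypothetical.
The cursor-positioning sequence is the standard CUP control sequence,
ESC [ row ; col H.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Columns one code point is assumed to occupy (simplified model)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0          # combining marks and format controls such as ZWJ
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2          # wide and fullwidth characters
    return 1

def column_of(line: str, index: int) -> int:
    """0-based column where line[index] starts, computed without drawing."""
    return sum(char_width(ch) for ch in line[:index])

def move_cursor_command(row: int, col: int) -> str:
    """CUP escape sequence; rows and columns are 1-based on the wire."""
    return f"\x1b[{row + 1};{col + 1}H"
```

The point of the sketch is that column_of never writes anything to the
terminal: the target column comes entirely from the width table.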

> By reading back the cursor position from the terminal after displaying a
> grapheme cluster, it would learn how the terminal displayed it and
> update its idea of where the cursor is correctly.

I understand.  But Emacs needs this information also long after the
characters were already drawn.  For example, imagine that Emacs
displays these characters on the screen, and then leaves most of the
screen intact and periodically redraws some small portion of the
screen, like updating the current time in the lower-right corner of the
screen when Emacs is otherwise idle.  To do that, Emacs needs to move
the cursor from its current position somewhere on the screen to the
lower-right corner, redraw the time there, then move the cursor back
to where it was.  These cursor moves are based on the ability to
calculate the geometry of each character on display without actually
writing the characters to the screen.

In addition, if Emacs had to query the cursor position after each
written character, its redisplay would be much slower than it is now.
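
For reference, the read-back being discussed is normally done with the
Device Status Report query (ESC [ 6 n), to which the terminal replies
ESC [ row ; col R.  The following Python sketch only shows the shape
of that exchange; the function names are hypothetical, and it does not
talk to a real terminal.

```python
import re
from typing import Tuple

DSR_CURSOR_QUERY = "\x1b[6n"   # Device Status Report: "where is the cursor?"
_CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")

def parse_cursor_report(reply: str) -> Tuple[int, int]:
    """Return 0-based (row, col) from a reply of the form ESC [ row ; col R."""
    m = _CPR_REPLY.fullmatch(reply)
    if m is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(m.group(1)) - 1, int(m.group(2)) - 1

def output_with_readback(text: str) -> str:
    """What would be sent if the cursor were queried after every code point."""
    return "".join(ch + DSR_CURSOR_QUERY for ch in text)
```

Each query in output_with_readback implies waiting for the terminal's
reply before continuing, which is where the slowdown would come from.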

> I originally ran into this problem not with a ZWJ, but with an emoji
> followed by variation selector 16 that someone used in a subject line of
> an email, and when browsing my inbox with notmuch, the terminal went
> FUBAR.

Yes, that's a known issue with some of the terminal emulators that
compose Emoji and other similar character sequences into grapheme
clusters, while ignoring the width that is expected from the result.
I'm not aware of any good solution, unfortunately.  Sometimes,
disabling auto-composition-mode helps, but even that cannot solve all
the problems, especially when each of the characters composed by the
terminal into a single grapheme cluster has non-zero width according
to the Unicode tables.  (If only the first character in the composed
sequence has non-zero width and the rest are zero-width, disabling
auto-composition-mode might produce a correct display.)
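
The distinction can be seen by listing per-codepoint widths.  This is
a Python sketch under a simplified width model based on the
unicodedata module, not the tables Emacs actually uses; the function
name codepoint_widths is hypothetical.

```python
import unicodedata

def codepoint_widths(text: str) -> list:
    """Assumed width of each code point, taken one at a time."""
    widths = []
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            widths.append(0)      # marks, variation selectors, ZWJ
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            widths.append(2)      # wide characters, including most emoji
        else:
            widths.append(1)
    return widths

zwj_sequence = "\U0001F468\u200D\U0001F469"   # MAN, ZERO WIDTH JOINER, WOMAN
vs16_sequence = "\u2764\uFE0F"                # HEAVY BLACK HEART, VARIATION SELECTOR-16

print(codepoint_widths(zwj_sequence))   # two wide characters around a zero-width joiner
print(codepoint_widths(vs16_sequence))  # the selector itself contributes 0
```

In the ZWJ sequence two code points have non-zero width, so summing
them gives 4 columns, while a terminal that composes the sequence into
one cluster may advance the cursor by only 2.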

The bottom line is what I said at the beginning: we need some protocol
by which a terminal emulator could be queried about whether it
supports character composition, and if so, what is the screen width of
a given sequence of codepoints that will be composed, without actually
displaying them.  Better yet, some standard table of such widths could
be accepted by complying terminal emulators, and then Emacs could use
such a table to know the width in advance (similarly to how it knows
that from the Unicode data files).
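
A width table of that kind might be consulted roughly as in the
Python sketch below.  No such standard table exists today:
SEQUENCE_WIDTHS, its two entries and their widths are invented for
illustration only, and the per-codepoint fallback is a simplified
model based on the unicodedata module.

```python
import unicodedata

# HYPOTHETICAL table of composed-sequence widths (illustrative values).
SEQUENCE_WIDTHS = {
    "\U0001F468\u200D\U0001F469": 2,   # MAN, ZWJ, WOMAN
    "\u2764\uFE0F": 2,                 # HEAVY BLACK HEART, VS16
}
MAX_SEQ_LEN = max(len(seq) for seq in SEQUENCE_WIDTHS)

def fallback_width(ch: str) -> int:
    """Per-codepoint width used when no table entry matches."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def display_width(text: str) -> int:
    """Width of text, preferring the longest matching table entry."""
    total, i = 0, 0
    while i < len(text):
        # Try the longest candidate sequence first, down to 2 code points.
        for n in range(min(MAX_SEQ_LEN, len(text) - i), 1, -1):
            width = SEQUENCE_WIDTHS.get(text[i:i + n])
            if width is not None:
                total += width
                i += n
                break
        else:
            total += fallback_width(text[i])
            i += 1
    return total
```

With such a table, the width of a composed sequence is known in
advance, without writing it to the screen.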

Until such protocols or tables exist, Emacs will be unable to produce
correct display on these terminal emulators.
