bug-gnu-emacs

bug#70000: 29.2; Grapheme handling incorrect


From: Eli Zaretskii
Subject: bug#70000: 29.2; Grapheme handling incorrect
Date: Mon, 25 Mar 2024 21:35:24 +0200

tags 70000 notabug
thanks

> From: Phillip Susi <phill@thesusis.net>
> Date: Mon, 25 Mar 2024 14:45:48 -0400
> 
> I had some terminal breakage the other day when browsing email with
> notmuch.  Now a ways down the rabbit hole, it seems this is because
> emacs does not correctly handle graphemes.  I found this article here:
> 
> https://mitchellh.com/writing/grapheme-clusters-in-terminals
> 
> If I paste that grapheme into GUI emacs, it is displayed as two separate
> characters, each two columns wide, instead of the correct way: as a
> single double wide character.

First, the above blog talks about text-mode terminals (a.k.a. "TTYs"),
so it is not relevant to a GUI Emacs session.

And second, how that particular sequence of codepoints is displayed on
GUI frames depends on how your Emacs was built.  According to the list
of features included in your report, viz.:

  Configured features:
  ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM JPEG LCMS2 LIBSYSTEMD
  MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PGTK PNG RSVG SECCOMP SOUND
  THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER XIM GTK3 ZLIB

your Emacs is built without HarfBuzz, which I think explains why your
Emacs displays the above sequences as 2 separate characters.
Furthermore, the appearance depends on the fonts you have installed;
specifically, Emoji sequences need a font that has good support for
the Emoji Unicode blocks.  In my Emacs, which does use HarfBuzz, I see
a single grapheme cluster.

> C-f and C-b move over the character as if
> it were one, however, backspace deletes only the second, leaving both
> the first and the zero width joiner.  If C-f and C-b treat it as one,
> then so should backspace.

That Backspace deletes a single codepoint is a feature: it allows
easier editing of composable character sequences, such as Emoji.
E.g., imagine you want to make a slight change to the Emoji by
modifying just the second of the two characters composed into a
grapheme cluster.  Emacs supports deletion of the entire grapheme
cluster with the command delete-forward-char, by default bound to the
<Delete> function key.
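
To illustrate the difference between the two kinds of deletion, here
is a minimal sketch in Python.  It is a toy model, not how Emacs
implements either command: the sample sequence (MAN + ZERO WIDTH
JOINER + EAR OF RICE) and both function names are assumptions made
for the example, and the cluster detection only understands
ZWJ-joined pairs, not full Unicode grapheme segmentation.

```python
# Assumed sample: MAN + ZERO WIDTH JOINER + EAR OF RICE, a ZWJ Emoji sequence.
ZWJ = "\u200d"
cluster = "\U0001F468" + ZWJ + "\U0001F33E"


def delete_backward_codepoint(text, pos):
    """Remove the single codepoint before pos (the Backspace behavior described above)."""
    if pos <= 0:
        return text
    return text[:pos - 1] + text[pos:]


def delete_cluster_forward(text, pos):
    """Remove the codepoint at pos plus everything joined to it by ZWJ.

    Simplified stand-in for whole-cluster deletion: it follows only
    ZWJ links and ignores other composition rules.
    """
    if pos >= len(text):
        return text
    end = pos + 1
    # Each "ZWJ + next codepoint" pair extends the cluster by two codepoints.
    while end + 1 < len(text) and text[end] == ZWJ:
        end += 2
    return text[:pos] + text[end:]


after_backspace = delete_backward_codepoint(cluster, len(cluster))
after_delete = delete_cluster_forward("a" + cluster + "b", 1)
```

In this model, after_backspace still holds the first character and
the joiner, which is the state described in the report, while
after_delete holds just "ab".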

> Under recent versions of the foot terminal emulator, this character is
> displayed as a single, double wide character, but emacs assumes it still
> is 4 columns wide, leading to terminal breakage.

Emacs cannot know what the terminal does with these characters,
because there's no widely-accepted protocol for accessing that
information.  Different terminal emulators behave differently, and
some even have options to modify their behavior via the various
settings.
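
For reference, the figure of 4 columns is what one gets by summing
per-codepoint widths.  The sketch below is a rough approximation of
that kind of calculation using Python's unicodedata module; it is
not the code Emacs or wcwidth() actually uses, and the sample
sequence and function names are assumptions made for the example.

```python
import unicodedata

# Assumed sample: MAN + ZERO WIDTH JOINER + EAR OF RICE, a ZWJ Emoji sequence.
cluster = "\U0001F468\u200d\U0001F33E"


def codepoint_width(ch):
    """Approximate column width of one codepoint, in the spirit of wcwidth()."""
    # Combining marks and format characters (such as ZWJ) take no columns.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def summed_width(text):
    """Width obtained by adding up codepoints one at a time."""
    return sum(codepoint_width(ch) for ch in text)


widths = [codepoint_width(ch) for ch in cluster]
total = summed_width(cluster)
print(widths, total)
```

The per-codepoint widths come out as 2, 0 and 2, for a total of 4,
whereas a terminal that composes the sequence into one grapheme
cluster occupies only 2 columns.  Nothing in the text itself tells
the application which of the two a given terminal will do.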

> Emacs needs to not assume the width of graphemes is what wcwidth()
> reports, but instead needs to query the cursor position after
> printing one to find out how wide the terminal actually displayed it
> as.

Querying the cursor position won't help in this case because it is
Emacs that moves the cursor when you type C-f, not the terminal.

I see no Emacs bug here.  Until we have standard ways of querying
text-mode terminals about their processing of composable character
sequences into grapheme clusters, there's no way for Emacs to behave
correctly with all such terminal emulators.  Sorry.




