emacs-devel
Re: emojis and other multi-character glyphs


From: Evgeny Zajcev
Subject: Re: emojis and other multi-character glyphs
Date: Sun, 26 Dec 2021 13:41:21 +0300



On Sun, 26 Dec 2021 at 13:15, Eli Zaretskii <eliz@gnu.org> wrote:
> From: Evgeny Zajcev <lg.zevlg@gmail.com>
> Date: Sun, 26 Dec 2021 12:43:34 +0300
>
> There is some inconsistency in naming and behaviour in Emacs master.
> We have the `forward-char', `backward-char', `delete-char' and `backward-delete-char' commands.  All of
> them use "char" in their names; however, `forward-char' and `backward-char' treat "char" differently than
> `delete-char' and `backward-delete-char'.
>
> Let me explain.  Emacs supports composed characters: multiple characters displayed as
> a single glyph.  Almost the same is done for multi-character emojis such as 🇷🇺 or 👨‍👩‍👧‍👦: multiple
> Unicode chars are composed into a single glyph representing some emoji.  Now, if you put point on a
> composed character or emoji and run `forward-char' or `backward-char', point moves over the whole glyph.
> However, if you run `delete-char' (when point is on the composed char) or `backward-delete-char' (when
> point is just after the glyph), it deletes only a single character of the multi-character representation, so
> pressing `C-d' on 🇷🇺 will magically turn the Russian flag into 🇺.  This is very misleading behaviour,
> especially when invisible characters are used in the emojis.

Emacs had in the past a feature whereby the user could move and delete
by single codepoints in composed character sequences.  This feature
was somehow lost.  I have been trying for some time to determine how
and why it was lost, and how to restore it.  So this issue is known and
is in the works, albeit slowly.

Ah, I see, nice.  I'll try to debug this as well, to help you.
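
To make the 🇷🇺 case concrete outside Emacs, here is a minimal sketch in Python (standard library only) of what removing a single code point does to the flag.  The `describe' helper is a hypothetical name used only for this illustration; it is not an Emacs or Python API.

```python
import unicodedata

# The Russian flag is a pair of regional-indicator code points (R, then U).
flag = "\U0001F1F7\U0001F1FA"


def describe(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


# Removing only the first code point, which is what a per-code-point
# delete does, leaves the second regional indicator on its own.
after_delete = flag[1:]

print(len(flag), describe(flag))
print(len(after_delete), describe(after_delete))
```

The remaining code point is the lone regional indicator U, which is why the flag appears to turn into 🇺.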


> Maybe introduce a "glyph" term, meaning the graphical representation of a sequence of chars that is displayed
> in the buffer and operated on as a whole?

There's no need for that, because we can provide dwim-ish operation
for existing commands without any new terminology or new commands.

Yeah, if "char" consistency is restored, then there is no need to introduce "glyph".  I just thought it was a new feature that chars and glyphs are treated differently.

> It would also be possible to write something like `string-glyph-length' that returns 1 for "👨‍👩‍👧‍👦" instead of 7,
> as `length' returns now.

Why would that be useful?

Sometimes it is useful to know the real length of a string before acting on it.  In my case, I use a service that limits the number of chars it can act on, and it counts each emoji as a single char.  Anyway, having something like an `emoji' text property (analogous to the `composition' text property for composed chars) would be very useful for various use cases.
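
As a rough illustration of the kind of count I mean, here is a sketch in Python (standard library only).  `approx_glyph_length' is a hypothetical helper, and it is a deliberate simplification: it is not an implementation of the Unicode grapheme-cluster rules (UAX #29), and it says nothing about how Emacs composes glyphs.  It only assumes that a ZWJ glues its neighbours together, that regional indicators pair up, and that combining marks, variation selectors and skin-tone modifiers attach to the preceding character.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def _is_regional_indicator(ch):
    return "\U0001F1E6" <= ch <= "\U0001F1FF"


def _extends_previous(ch):
    # Combining marks and variation selectors have a category starting
    # with "M"; U+1F3FB..U+1F3FF are the emoji skin-tone modifiers.
    return (unicodedata.category(ch).startswith("M")
            or "\U0001F3FB" <= ch <= "\U0001F3FF")


def approx_glyph_length(text):
    """Approximate the number of displayed glyphs in text (simplified)."""
    count = 0
    joined = False   # previous code point was a ZWJ inside a cluster
    ri_open = False  # current cluster is a lone regional indicator
    for ch in text:
        if ch == ZWJ:
            joined = count > 0
            ri_open = False
        elif count > 0 and _extends_previous(ch):
            ri_open = False
        elif joined:
            joined = False
            ri_open = False
        elif _is_regional_indicator(ch) and ri_open:
            ri_open = False  # second half of a flag
        else:
            ri_open = _is_regional_indicator(ch)
            count += 1
    return count


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"
print(len(family), approx_glyph_length(family))
```

For the family emoji this prints 7 and then 1, matching the two numbers discussed above.  A service with a length limit would presumably want the second number, though the exact counting rule depends on the service.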

--
lg
