emacs-devel

From: Eli Zaretskii
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Sat, 23 May 2020 16:04:54 +0300

> Date: Sat, 23 May 2020 13:24:12 +0200
> From: Vasilij Schneidermann <address@hidden>
> Cc: address@hidden, address@hidden, address@hidden,
>       address@hidden
> 
> Out of curiosity, is this the same reason why font fallback is
> handled on a per-script basis for most cases and with carefully
> chosen ranges for emoji?  I see a similar problem there, with
> updates being necessary for every Unicode release.

No, our font selection machinery is completely separate from text
shaping, and is also agnostic to character compositions.  Basically,
we have a char-table (the one set-fontset-font manipulates) which
provides the various fonts to try for every given character, and some
very convoluted code (see fontset.c) that implements the logic of how
to try the fonts and which fonts to prefer for a character.  IOW, the
font selection is basically per-character and not per-script.
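
To make the per-character lookup concrete, here is a toy model in
Python.  It is a sketch only, not the logic in fontset.c: the ranges,
the font names, and the coverage sets are all invented for
illustration, and the real preference rules are far more involved.

```python
# Toy model of per-character font selection; NOT the fontset.c algorithm.
# All ranges, font names, and coverage sets below are hypothetical.

# Ordered (first, last, candidate fonts) entries: a stand-in for the char-table.
FONT_TABLE = [
    (0x0000, 0x007F, ["MonoLatin"]),
    (0x0600, 0x06FF, ["ArabicA", "ArabicB"]),
    (0x1F300, 0x1FAFF, ["EmojiColor"]),
]

# Hypothetical coverage: the code points each font has glyphs for.
FONT_COVERAGE = {
    "MonoLatin": set(range(0x20, 0x7F)),
    "ArabicA": {0x0627, 0x0644},
    "ArabicB": set(range(0x0600, 0x0700)),
    "EmojiColor": set(range(0x1F300, 0x1FB00)),
}


def select_font(ch):
    """Return the first candidate font that covers ch, or None."""
    cp = ord(ch)
    for first, last, candidates in FONT_TABLE:
        if first <= cp <= last:
            for font in candidates:
                if cp in FONT_COVERAGE[font]:
                    return font
    return None


def fonts_for(text):
    """Select a font for each character, ignoring its neighbours."""
    return [select_font(ch) for ch in text]
```

Note that select_font looks at one character only; neither the script
nor the adjacent characters enter the decision.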

The relation to emoji is that emoji _sequences_ need character
composition, and Emacs currently cannot compose characters that aren't
supported by the same font.  This _is_ related to ligatures etc., as
it indeed touches on one of the basic premises of the display engine's
iteration through buffer text: we stop wherever the 'face' property of
characters changes (and the font is one attribute of the face), then
continue after loading and realizing the new face.  This is why you
see strange artifacts when you press and hold Shift, and then move
with arrow keys across the Arabic line in etc/HELLO: the shaping of
adjacent characters breaks because we pass only part of the text to
the shaper.  This is another bug that cannot be fixed cleanly while
keeping the current design of the display engine and its low-level
method of iteration through text and of producing glyphs.
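
The effect of stopping at face changes can be sketched with a toy
model in Python.  This is not Emacs code: the face labels are
arbitrary, with "region" standing for the highlighted part of a
selection, and each resulting run stands for what a shaper would be
handed in one call.

```python
# Toy model of iteration that stops at every face change; not Emacs code.
# Face names are arbitrary labels used only for this illustration.
from itertools import groupby


def runs_by_face(text, faces):
    """Split text into maximal runs of characters sharing one face.

    `faces` holds one face label per character of `text`.
    """
    if len(text) != len(faces):
        raise ValueError("need exactly one face per character")
    runs = []
    pos = 0
    for face, group in groupby(faces):
        length = len(list(group))
        runs.append((face, text[pos:pos + length]))
        pos += length
    return runs


word = "\u0633\u0644\u0627\u0645"  # four Arabic letters that should join
whole = runs_by_face(word, ["default"] * 4)
split = runs_by_face(word, ["region", "region", "default", "default"])
```

With a single face the word forms one run; once the selection covers
its first two letters, it is cut into two runs, and neither run
carries the context needed to join letters across the boundary.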

> Given your previous explanation, a regex-based approach heuristic is the best
> we can hope for then.  From what I understand the display engine uses a
> rectangular grid, not unlike what terminal emulators do.

It uses a rectangular array of glyphs, not a rectangular grid.  The
difference is that glyphs can have variable metrics, which breaks the
grid concept.  IOW, the glyph at coordinates (i, j) in the array and
the glyph at (i, j+1) are not necessarily one above the other on
display.
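
A small Python sketch of why array indices do not map to a grid.  The
widths below are made-up pixel values, and the list-of-rows
representation is an assumption for illustration, not the actual glyph
matrix structure.

```python
# Toy glyph array with variable advance widths; not Emacs's glyph matrix.
# All widths are invented pixel values.


def x_offsets(widths):
    """Return the starting x pixel of each glyph in one row."""
    offsets = []
    x = 0
    for w in widths:
        offsets.append(x)
        x += w
    return offsets


rows = [
    [8, 8, 8, 8],   # row 0: glyphs of equal width
    [8, 20, 8, 8],  # row 1: the second glyph is much wider
]
offsets = [x_offsets(row) for row in rows]
```

Here the third glyph starts at pixel 16 in row 0 but at pixel 28 in
row 1, so equal indices in adjacent rows do not line up vertically.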

> Are there any tricks
> to steal from existing terminal emulators?  For example there is an open pull
> request [1] for alacritty using Harfbuzz and FreeType for ligature support.

I cannot claim I understood well enough what this attempts to do, but
I don't think this is our problem in Emacs.  It is not a problem of
layout per se -- Emacs is well equipped to deal with layout of glyphs
and grapheme clusters that have wildly different metrics (recall that
we are able to lay out images of more-or-less arbitrary dimensions on
the same line as simple text).  The problem is that we make the layout
decisions as soon as we have the glyph metrics, on the fly, for each
"thing" we need to display.  HarfBuzz people would like us to send
them the entire paragraph of text, then get it back as a series of
glyphs, then make the layout decisions based on that.  This would need
entirely different algorithms, if not also different data structures;
for starters, we'd need to know how to find the paragraph(s) that will
end up on display without first trying to display them.  And all our
redisplay shortcuts and optimizations implicitly also assume the
current basic iteration, one character at a time, which can be started
at any arbitrary buffer position.

> The greatest challenge I see with redesigning the display engine is supporting
> textual terminals.

Really?  Why do you think this to be the greatest challenge?  For any
model of the display we will come up with, TTY frames will always be a
proper subset, no?


