emacs-devel

From: Pip Cet
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Wed, 27 May 2020 09:36:52 +0000

On Tue, May 26, 2020 at 7:46 PM Eli Zaretskii <address@hidden> wrote:
> > From: Pip Cet <address@hidden>
> > Date: Tue, 26 May 2020 18:13:55 +0000
> > Cc: address@hidden, address@hidden, address@hidden
> >
> > > Assuming that the alternative for selecting the "context" is found,
> > > and composite.c is augmented to apply it instead of the regexps, why
> > > not use the rest of the automatic composition code to produce the
> > > glyphs and display them?
> >
> > I chose not to do that for a patch which I have stated repeatedly was
> > not in any way a finalized design, and I don't see any good reason to
> > do it for a real patch, either, so far.
>
> Why not?

Which part are you asking about? I don't see any good reason because
I've read the composite.c code (I'm not ignoring it), with an eye to
reusing what's reusable, and come up empty.

But you've convinced me I need to do a careful rereading.

> > > The code which does that exists and works,
> >
> > (I suspect: slowly)
>
> Any measurements to back that up?

Yes. With a regexp of "....", the composite.c code takes 175 billion
cycles to display every line of composite.c. My code takes 144 billion
cycles, with lookahead and lookbehind each set to 128 characters but
limited as described.

> E.g., is scrolling through
> etc/HELLO especially slow, once all the fonts were loaded (i.e. each
> character in the file was displayed at least once)?

> (And why are you using Emacs 26 and not
> Emacs 27, where we support HarfBuzz and made several improvements and
> bugfixes in the character composition area?)

Because I was trying to test your implication that all this was usable
years ago. It wasn't. I'm not using Emacs 26 :-)

> > > It already solves the problems of look-ahead,
> >
> > If it does so efficiently, I'll certainly try reusing that code. But I
> > strongly suspect it doesn't.
>
> Why suspect? why not try and see what does and doesn't work, what is
> and isn't efficient?

I have now, and the measurement above confirms my suspicion.

> > > and others, including (but not limited to) the dreaded bidi thing.
> >
> > Looking for "bidi" in composite.c, the only relevant thing I see is a FIXME.
>
> That's because you look in the wrong place.

What's the right place? I'm using all the code in bidi.c, of course,
so as far as I can tell what I'm not doing is using composite.c...

> Once again, try looking
> at etc/HELLO, there are portions of it that need both bidi and
> compositions.  I can explain how it works (the code is spread over
> several files), but please believe me that it does, it passed the
> HarfBuzz developers' eyes most of whom are native Arabic and Farsi
> speakers, and wouldn't allow us to display Arabic script incorrectly.
>
> The whole point of using the existing code is that you don't _need_ to
> understand how exactly we handle the bidi reordering when character
> compositions are required.

But that's true without using the existing code!

> It just works, for all you care.  It did
> take several iterations to get right at the time; why would you want
> to repeat all that, when the code is there to use and extend?

> > second, precisely because it works well for the purposes of others,
> > and I'd like to have as little impact as possible on existing use
> > cases. They should just continue working, and so far they do.
>
> You are thinking of breaking those other cases by your changes?

No! If I break them, that's a severe bug in my code!

> But
> we haven't yet established that changes are needed,

"Entering" ligature glyphs is definitely something we need to do
before any user can reasonably use variable-pitch fonts with ligatures
for displaying English text. Kerning is another such thing. Neither
works with the current code.

> Because the features you are talking about should "just work" in
> Emacs.

> Not only for some use cases and some scripts -- that is not
> how we develop features.  Features that work only for some cases are
> broken and will draw bug reports.  They make Emacs look unclean and
> unprofessional.

Not as much as the current lack of support does.

> And there's no need to add such half-broken features because code that
> supports much broader class of use cases already exists, you just need
> to use it and maybe extend and augment it a bit.

I don't think I agree with the "a bit".

> > The code shouldn't break horribly for RTL text (it doesn't).
>
> It _will_ break for RTL text, you just didn't yet see it because you
> only tested it in simple use cases.  UAX#9 defines a lot of optional
> features, including multi-level directional overrides and embeddings,
> it isn't just right-to-left vs left-to-right.

I assume bidi.c handles that, as it does for composite.c?

> > > What's more, we already have the code which implements all
> > > that, so I don't understand why you want to bypass it.
> >
> > We have something that superficially results in a similar screen
> > layout to what I want, but that actually represents display elements
> > in a way that makes them unusable for my purposes.
>
> Then please describe what doesn't fit your purpose, and let's focus on
> extending the existing code to do what's missing.

The three main things are:
 - "entering" glyphs, instead of treating them as atomic
 - providing context automatically rather than by providing specific
regexps for it in advance
 - kerning, which requires context for every character

Secondary concerns:
 - ligatures that come partly from a display property and partly from
the buffer (composite.c doesn't allow for those, as far as I can tell)
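
To make the first point concrete, here is a minimal Python sketch of
what "entering" a ligature glyph could mean. It is not code from the
patch or from composite.c: the function name `caret_offsets` is
hypothetical, and it assumes the shaper returned one glyph of a known
pixel advance covering several buffer characters. It simply splits the
advance evenly to get a caret position between each pair of
characters. A real implementation would presumably prefer caret
positions supplied by the font (OpenType fonts can carry them in the
GDEF table) when they are available.

```python
def caret_offsets(advance, n_chars):
    """Return the x offsets (in pixels) of the n_chars + 1 caret
    positions inside a single ligature glyph.

    Assumption for illustration only: the glyph's advance is divided
    evenly among the characters it covers.
    """
    if n_chars <= 0:
        raise ValueError("a ligature must cover at least one character")
    if advance < 0:
        raise ValueError("advance must be non-negative")
    # Integer arithmetic keeps the offsets on whole pixels; the last
    # offset always equals the full advance.
    return [(advance * i) // n_chars for i in range(n_chars + 1)]


# Example: an "ffi" ligature that is 30 pixels wide.
ffi_carets = caret_offsets(30, 3)
```

With an atomic glyph, only the first and last of these offsets are
reachable by point; entering the glyph makes the inner ones reachable
too.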

> Please note: I'm not talking about the regexp part -- that part you
> anyway will need to decide how to extend or augment.  I'm telling you
> right here and now that blindly taking a fixed amount of surrounding
> text will not be acceptable.  You can either come up with some smarter
> regexp (and you are wrong: the regexps in composition-function-table
> do NOT have to match only fixed strings, you can see that they don't
> in the part of the table we set up for the Arabic script);

Again, I think the limits are fixed: 4 characters of history and 500
characters of look-ahead. What am I missing?

> or you can
> decide on something more complex, like a function.  Either way, the
> amount of text that this will pick up and pass to the shaper should be
> reasonable and should be determined by some understandable rules.  And
> those rules must be controllable from Lisp.

That last part isn't true for the composite.c code, which imposes a
limit of 4 characters of history and 500 characters of look-ahead, as
far as I can tell. But, sure, if that's a requirement, I'll keep it in
mind.
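
For reference, the fixed-window behaviour I mean can be sketched in a
few lines of Python. This is only an illustration of my reading, not
the actual composite.c logic: the function name `context_window` and
its parameters are hypothetical, and the defaults of 4 and 500 are the
limits as far as I can tell from the code. The same function with both
limits set to 128 describes the window used for the measurement above,
before any further limiting.

```python
def context_window(pos, buf_start, buf_end, lookbehind=4, lookahead=500):
    """Return the half-open range (start, end) of buffer positions
    handed to the shaper as context around POS.

    The window extends LOOKBEHIND characters back and LOOKAHEAD
    characters forward, clamped to the buffer boundaries.
    """
    if not buf_start <= pos <= buf_end:
        raise ValueError("pos must lie within the buffer")
    start = max(buf_start, pos - lookbehind)
    end = min(buf_end, pos + lookahead)
    return start, end


# Fixed limits as I read them in composite.c (an assumption).
fixed = context_window(10, 0, 1000)
# Symmetric limits of 128, as in my measurement.
symmetric = context_window(200, 0, 250, lookbehind=128, lookahead=128)
```

Making the two limits ordinary parameters is what would let Lisp
control them, if that is the requirement.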

> But that is a separate part of the problem that you will need to
> solve, and you will need to solve it whether or not you use character
> compositions.  What I _am_ saying is that the rest of the machinery
> that implements automatic compositions does exactly what you need: it
> calls the shaper, handling LTR and RTL text as needed, then lays out
> the glyphs the shaper returns in a way that handles all the usual
> stuff our users expect, such as line wrapping and truncation.
> It is silly to disregard that code, so please don't.

You've convinced me that it's worth reading it again, more carefully,
but I'm not optimistic I'll come to a different conclusion this time
around.


