
Re: [Groff] groff_char(7): Combination of characters vs. single unicode


From: Werner LEMBERG
Subject: Re: [Groff] groff_char(7): Combination of characters vs. single unicode character
Date: Tue, 16 Dec 2014 04:33:55 +0100 (CET)

> Do I understand correctly that the Info manual calls u2260 invalid
> as a glyph name, but that, all the same, \[u2260] produces the
> desired output?
>
> And that groff contains a table to decompose u2260 into u003D_0338,
> but that, all the same, \[u003D_0338] will give you U+2260 in the
> output stream?  If so, what's the point in decomposing?
>
> If that is correct so far: Given that groff does not produce
> normalization form D in its output stream, why did you choose to use
> it for the documentation?  Wouldn't it be easier to understand if
> the normalization form used in the documentation matched the
> normalization form actually produced in the output stream?

As in TeX, the distinction between characters, entities, and glyph
names is unfortunately blurry.

Here is the algorithm for converting an entity E (that is, the value
in the \[...] construct) to a groff glyph name G.

  1. Compare E with the GGL (Groff Glyph List).  The GGL data is
     defined in `src/libs/libgroff/glyphuni.cpp' and listed in the
     `Input' column of `groff_char.man'.

       if (have_GGL_mapping)
         E1 = GGL(E)
       else
         E1 = E

  2. Decompose E1 to get Unicode normalization form D.  The
     decomposition data is defined in `src/libs/libgroff/uniuni.cpp'
     and listed in the `Unicode' columns of `groff_char.man'.

       if (have_decomposition)
         G = decomposition(E1)
       else
         G = E1
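
A minimal Python sketch of steps 1 and 2, assuming each table can be
modelled as a plain dictionary lookup.  The names `GGL',
`DECOMPOSITION', and `entity_to_glyph_name' are hypothetical; the
dictionaries hold only the entries needed for the `!=' example,
whereas the real data are C++ tables in the files named above.  The
sketch also shows why \[!=], \[u2260], and \[u003D_0338] all end up
as the same glyph name.

```python
# Hypothetical one-entry excerpts of the two tables; the real data
# live in src/libs/libgroff/glyphuni.cpp and uniuni.cpp.
GGL = {"!=": "u2260"}
DECOMPOSITION = {"u2260": "u003D_0338"}


def entity_to_glyph_name(entity: str) -> str:
    """Map the contents of a \\[...] escape to a groff glyph name."""
    # Step 1: use the GGL mapping if there is one.
    e1 = GGL.get(entity, entity)
    # Step 2: decompose to Unicode normalization form D if possible.
    return DECOMPOSITION.get(e1, e1)


for entity in ("!=", "u2260", "u003D_0338"):
    print(f"\\[{entity}] -> {entity_to_glyph_name(entity)}")
```

All three spellings print `u003D_0338': the first goes through both
steps, the second only through step 2, and the third is already in
normalization form D, so neither table changes it.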

And here is the algorithm by which groff converts a groff glyph name G
to an output device's glyph name D (or glyph or character index,
depending on the device); the result is listed in the `Output' column
of `groff_char.man'.

  a. Check whether G is present in the font.  Use it if available.

  b. Otherwise, try to map G to a `classical' groff glyph name.  This
     mapping is defined in `src/libs/libgroff/uniglyph.cpp'.

       if (have_classical_groff_glyph_name)
         D = classical_glyph_name(G)
       else
         D = G
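
Steps a and b can be sketched the same way, again under the
assumption that a font is just a dictionary from groff glyph name to
whatever the device receives.  `CLASSICAL', `FONTS', and
`glyph_name_to_device' are hypothetical names, the entries are
reduced to the `!=' example, and the string "U+2260" merely stands in
for the character code a utf8 font would list.

```python
from typing import Optional

# Hypothetical excerpt of the uniglyph.cpp data: decomposed
# Unicode glyph name -> `classical' groff glyph name.
CLASSICAL = {"u003D_0338": "!="}

# Hypothetical, heavily reduced fonts: groff glyph name -> what the
# device receives.  One dict per device stands in for the current
# font plus any special fonts (such as font/devps/S) that groff
# would also search.
FONTS = {
    "utf8": {"u003D_0338": "U+2260"},
    "ps": {"!=": "notequal"},
}


def glyph_name_to_device(glyph: str, device: str) -> Optional[str]:
    """Map groff glyph name G to the device's glyph, or None if absent."""
    font = FONTS[device]
    # Step a: use G itself if the font provides it.
    if glyph in font:
        return font[glyph]
    # Step b: otherwise fall back to the classical groff name, if any.
    classical = CLASSICAL.get(glyph, glyph)
    return font.get(classical)


print(glyph_name_to_device("u003D_0338", "utf8"))
print(glyph_name_to_device("u003D_0338", "ps"))
```

Step b is only consulted when step a fails, so a device whose fonts
already carry the decomposed name never sees the classical one.  The
`None' result is a placeholder for whatever groff does with a glyph
it cannot find.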

So if you enter \[!=], groff converts `!=' to `u2260' (step 1), then
to `u003D_0338' (step 2).

For the `utf8' output device, `u003D_0338' is found in
`font/devutf8/R' (step a), returning character code U+2260 as the
final output.

For the `ps' output device, `u003D_0338' is not found, so it gets
converted back to `!=' (step b), which is eventually found in the
file `font/devps/S', yielding the PostScript glyph name `notequal'.
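
The walk-through above can be replayed end to end with a small Python
sketch that records every intermediate name.  All tables are
hypothetical one-entry stand-ins for the files mentioned in this
message, and `trace' is an illustrative helper, not a groff function.

```python
from typing import List

# Hypothetical one-entry excerpts of glyphuni.cpp, uniuni.cpp,
# uniglyph.cpp, and the two device fonts.
GGL = {"!=": "u2260"}
DECOMPOSITION = {"u2260": "u003D_0338"}
CLASSICAL = {"u003D_0338": "!="}
FONTS = {
    "utf8": {"u003D_0338": "U+2260"},  # stands in for font/devutf8/R
    "ps": {"!=": "notequal"},          # stands in for font/devps/S
}


def trace(entity: str, device: str) -> List[str]:
    """Return every name the entity passes through on its way out."""
    path = [entity]
    e1 = GGL.get(entity, entity)            # step 1
    if e1 != entity:
        path.append(e1)
    g = DECOMPOSITION.get(e1, e1)           # step 2
    if g != e1:
        path.append(g)
    font = FONTS[device]
    if g in font:                           # step a
        path.append(font[g])
        return path
    classical = CLASSICAL.get(g, g)         # step b
    if classical != g:
        path.append(classical)
    path.append(font.get(classical, "<missing>"))
    return path


for device in ("utf8", "ps"):
    print(device, " -> ".join(trace("!=", device)))
```

This prints `utf8 != -> u2260 -> u003D_0338 -> U+2260' and
`ps != -> u2260 -> u003D_0338 -> != -> notequal', matching the two
paths described in the text.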


I hope this helps.  Patches to improve the docs are really welcome :-)


    Werner


