Re: [Groff] groff_char(7): Combination of characters vs. single unicode


From: Werner LEMBERG
Subject: Re: [Groff] groff_char(7): Combination of characters vs. single unicode character
Date: Tue, 16 Dec 2014 01:19:06 +0100 (CET)

>> when there is a unicode character for e.g. "not equal" (U+2260)
>> why there is a combination of characters in groff_char(7)
>> instead of unicode?  Is it intended for ASCII output?
> 
>  3. In case you are talking about the third column "Unicode"
>     in said table, which contains "u003D_0338" even though
>     groff actually produces U+2260:
>     That looks like a documentation bug to me.  I'm not
>     sending a patch because there are many such composite
>     Unicode names in that column, so i suspect this is not
>     the only one mismatching reality.

It's rather a documentation bug.  From groff's Info manual, section
`Using Symbols':

   * A glyph representing more than a single input character is named

          'u' COMPONENT1 '_' COMPONENT2 '_' COMPONENT3 ...

     Example: 'u0045_0302_0301'.

     For simplicity, all Unicode characters that are composites must
     be decomposed maximally (this is normalization form D in the
     Unicode standard); for example, 'u00CA_0301' is not a valid glyph
     name since U+00CA (LATIN CAPITAL LETTER E WITH CIRCUMFLEX) can be
     further decomposed into U+0045 (LATIN CAPITAL LETTER E) and
     U+0302 (COMBINING CIRCUMFLEX ACCENT).  'u0045_0302_0301' is thus
     the glyph name for U+1EBE, LATIN CAPITAL LETTER E WITH CIRCUMFLEX
     AND ACUTE.

   * groff maintains a table to decompose all algorithmically derived
     glyph names that are composites itself.  For example, 'u0100'
     (LATIN CAPITAL LETTER A WITH MACRON) is automatically decomposed into
     'u0041_0304'.  Additionally, a glyph name of the GGL is preferred
     to an algorithmically derived glyph name; groff also
     automatically does the mapping.  Example: The glyph 'u0045_0302'
     is mapped to '^E'.
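
To illustrate the naming rule quoted above, here is a minimal Python
sketch.  It is not groff's own code: it assumes that Python's
`unicodedata' NFD normalization yields the same decomposition as
groff's internal table for the characters shown, and the function name
`groff_glyph_name' is made up for this example.

```python
import unicodedata


def groff_glyph_name(char: str) -> str:
    """Build a 'uXXXX[_XXXX...]' glyph name from the NFD form of char.

    Hypothetical helper for illustration; groff uses its own table.
    """
    # Decompose maximally (Unicode normalization form D).
    decomposed = unicodedata.normalize("NFD", char)
    # Join the four-digit uppercase hex code points with underscores.
    return "u" + "_".join(f"{ord(c):04X}" for c in decomposed)


print(groff_glyph_name("\u1EBE"))  # u0045_0302_0301
print(groff_glyph_name("\u0100"))  # u0041_0304
print(groff_glyph_name("\u2260"))  # u003D_0338
```

The last line shows why the table lists `u003D_0338' for "not equal":
U+2260 has a canonical decomposition into U+003D and U+0338.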

From `groff_char.man', section REFERENCE, which explains the table
fields:

  Unicode
     is the glyph name used in composite glyph names.  The names in
     the Unicode column look like u0021 or u0041_0300.  In groff, the
     corresponding Unicode characters can be constructed by adding a
     backslash and a pair of square brackets, for example \[u0021] or
     \[u0041_0300].

The important bit is *glyph name*.  I've decided to always use
Unicode normalization form D for glyph names, except when a groff
entity name is available, like \[!=] in this particular case, which
is then preferred.
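
The preference order can be sketched in Python as follows.  This is
only an illustration under assumptions, not how groff implements it:
the two-entry `ENTITY_NAMES' table is a tiny excerpt taken from the
examples in this thread, `preferred_escape' is a made-up name, and
Python's NFD normalization stands in for groff's decomposition table.

```python
import unicodedata

# Illustrative excerpt only: maps decomposed glyph names to groff
# entity names.  Both entries come from the examples in this thread.
ENTITY_NAMES = {
    "u003D_0338": "!=",
    "u0045_0302": "^E",
}


def preferred_escape(char: str) -> str:
    """Return the \\[...] escape for char, preferring an entity name.

    Hypothetical helper; falls back to the NFD-based 'uXXXX_XXXX' name.
    """
    decomposed = unicodedata.normalize("NFD", char)
    name = "u" + "_".join(f"{ord(c):04X}" for c in decomposed)
    return "\\[" + ENTITY_NAMES.get(name, name) + "]"


print(preferred_escape("\u2260"))  # \[!=]
print(preferred_escape("\u00CA"))  # \[^E]
print(preferred_escape("\u1EBE"))  # \[u0045_0302_0301]
```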

Patches are welcome to make this easier to understand in both
`groff.info' and `groff_char.man'.


    Werner
