Re: [Groff] groff_char(7): Combination of characters vs. single unicode
From: Werner LEMBERG
Subject: Re: [Groff] groff_char(7): Combination of characters vs. single unicode character
Date: Tue, 16 Dec 2014 01:19:06 +0100 (CET)
>> When there is a Unicode character for, e.g., "not equal" (U+2260),
>> why is there a combination of characters in groff_char(7)
>> instead of the Unicode character? Is it intended for ASCII output?
>
> 3. In case you are talking about the third column "Unicode"
> in said table, which contains "u003D_0338" even though
> groff actually produces U+2260:
> That looks like a documentation bug to me. I'm not
> sending a patch because there are many such composite
> Unicode names in that column, so I suspect this is not
> the only one mismatching reality.
It's rather a documentation bug. From groff's Info manual, section
`Using Symbols':
* A glyph representing more than a single input character is named
'u' COMPONENT1 '_' COMPONENT2 '_' COMPONENT3 ...
Example: 'u0045_0302_0301'.
For simplicity, all Unicode characters that are composites must
be decomposed maximally (this is normalization form D in the
Unicode standard); for example, 'u00CA_0301' is not a valid glyph
name since U+00CA (LATIN CAPITAL LETTER E WITH CIRCUMFLEX) can be
further decomposed into U+0045 (LATIN CAPITAL LETTER E) and
U+0302 (COMBINING CIRCUMFLEX ACCENT). 'u0045_0302_0301' is thus
the glyph name for U+1EBE, LATIN CAPITAL LETTER E WITH CIRCUMFLEX
AND ACUTE.
* groff itself maintains a table to decompose all algorithmically
derived glyph names that are composites. For example, 'u0100'
(LATIN CAPITAL LETTER A WITH MACRON) is automatically decomposed into
'u0041_0304'. Additionally, a glyph name of the GGL is preferred
to an algorithmically derived glyph name; groff also
automatically does the mapping. Example: The glyph 'u0045_0302'
is mapped to '^E'.
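As a rough illustration of the naming rule above, Python's standard unicodedata module can produce the same NFD-based names. This is a minimal sketch, not groff's implementation: the function name groff_glyph_name is hypothetical, and groff's own decomposition table may not agree with Python's Unicode database for every character.

```python
import unicodedata


def groff_glyph_name(ch: str) -> str:
    """Build a groff-style glyph name for a single character (hypothetical helper).

    The character is decomposed maximally (Unicode normalization form D);
    each resulting code point is written as uppercase hex with at least
    four digits, and the parts are joined by '_' after a leading 'u'.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # NFD splits a precomposed character into its base plus combining marks.
    components = unicodedata.normalize("NFD", ch)
    return "u" + "_".join(f"{ord(c):04X}" for c in components)


if __name__ == "__main__":
    for ch in ["\u1EBE", "\u00CA", "\u0100", "\u2260"]:
        print(f"U+{ord(ch):04X} -> {groff_glyph_name(ch)}")
```

Note that U+2260 (NOT EQUAL TO) has a canonical decomposition into U+003D plus U+0338, so under this rule its glyph name comes out as u003D_0338, the very entry the original question asked about.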
From `groff_char.man', section REFERENCE, which explains the table
fields:
Unicode
is the glyph name used in composite glyph names. The names in
the Unicode column look like u0021 or u0041_0300. In groff, the
corresponding Unicode characters can be constructed by adding a
backslash and a pair of square brackets, for example \[u0021] or
\[u0041_0300].
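Going the other way, an entry from the Unicode column can be decoded into its code points and recombined with NFC to see which precomposed character (if any) it stands for. This is a hedged sketch using Python's unicodedata; the helper names decode_glyph_name, precomposed and to_escape are hypothetical, and only the \[...] wrapping comes from the man page text above.

```python
import unicodedata


def decode_glyph_name(name: str) -> str:
    """Turn a name like 'u0041_0300' back into its code point sequence."""
    if not name.startswith("u"):
        raise ValueError("not a Unicode-style glyph name: " + name)
    return "".join(chr(int(part, 16)) for part in name[1:].split("_"))


def to_escape(name: str) -> str:
    """Wrap a glyph name in groff's \\[...] escape syntax."""
    return "\\[" + name + "]"


def precomposed(name: str) -> str:
    """Recompose with NFC to find the single character the name denotes, if any."""
    return unicodedata.normalize("NFC", decode_glyph_name(name))


if __name__ == "__main__":
    for name in ["u0021", "u0041_0300", "u003D_0338"]:
        ch = precomposed(name)
        print(to_escape(name), "->", " ".join(f"U+{ord(c):04X}" for c in ch))
```

Running it maps \[u0041_0300] to U+00C0 and \[u003D_0338] to U+2260, which shows that the composite entry in the table and the single Unicode character denote the same glyph.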
The important bit is *glyph name*. I've decided to always use
Unicode normalization form D for glyph names, except where a groff
entity name is available, like \[!=] in this particular case; such
a name is preferred.
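The preference rule can be sketched as a lookup that falls back to the NFD-based name. This is only an illustration under stated assumptions: PREFERRED is a tiny hypothetical excerpt containing just the two mappings mentioned in this thread (\[!=] and \[^E]), not groff's actual glyph list, and the helper names are likewise hypothetical.

```python
import unicodedata

# Hypothetical two-entry excerpt of a preferred-name table, taken only from
# the examples in this thread; groff's real glyph list is much larger.
PREFERRED = {
    "u003D_0338": "!=",  # U+2260 NOT EQUAL TO
    "u0045_0302": "^E",  # U+00CA LATIN CAPITAL LETTER E WITH CIRCUMFLEX
}


def nfd_glyph_name(ch: str) -> str:
    """NFD-based glyph name, e.g. U+0100 -> 'u0041_0304'."""
    return "u" + "_".join(
        f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch))


def preferred_escape(ch: str) -> str:
    """Prefer a groff entity name; otherwise fall back to the NFD name."""
    name = nfd_glyph_name(ch)
    return "\\[" + PREFERRED.get(name, name) + "]"


if __name__ == "__main__":
    for ch in ["\u2260", "\u00CA", "\u0100"]:
        print(f"U+{ord(ch):04X} -> {preferred_escape(ch)}")
```

With this table, U+2260 yields \[!=] and U+00CA yields \[^E], while U+0100, which has no entry, falls back to \[u0041_0304].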
Patches are welcome to make this easier to understand in both
`groff.info' and `groff_char.man'.
Werner