
[Groff] Some thoughts on glyphs


From: Werner LEMBERG
Subject: [Groff] Some thoughts on glyphs
Date: Fri, 19 Apr 2002 08:55:44 +0200 (CEST)

Dear friends,


the integration of the URW fonts into groff turned out to be
non-trivial.  Well, I could do a simple job by substituting just the
font names, but I want to do it better.

All URW fonts contain about 88 more glyphs than the corresponding
standard PS fonts.  I want to make them accessible too.  But how can
that be done easily?  Or rather, how can it be done correctly?  Having
thought about this, I believe we need a generalization of the glyph
access mechanism.

It doesn't make sense to invent more and more glyph names.  Following
the AGL (Adobe Glyph List) algorithm is probably the best we can do;
that is, using glyph names like `uni2317.ugly', where `uni2317' is
Unicode character U+2317 and the suffix after the period names the
glyph variant.  The exact syntax is rather irrelevant since we won't
parse the glyph names, but it will help experienced users prepare font
definition files, and it can easily be integrated into the afm2dit
tool.  Such glyph names should be long enough to avoid clashes with
user-defined entities.
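To make the naming scheme concrete, here is a minimal Python sketch of how such AGL-style names could be built from code points plus an optional variant suffix.  The function name `agl_glyph_name' is hypothetical and not part of groff or afm2dit; the sketch only handles code points in the Basic Multilingual Plane, which the AGL writes as `uni' followed by groups of four uppercase hex digits.

```python
def agl_glyph_name(codepoints, variant=None):
    """Build an AGL-style glyph name such as 'uni2317.ugly' (hypothetical helper).

    Each BMP code point becomes four uppercase hex digits after a single
    'uni' prefix; an optional variant suffix follows a period.
    """
    for cp in codepoints:
        if not 0 <= cp <= 0xFFFF:
            raise ValueError(f"U+{cp:X} is outside the BMP")
        # The AGL does not allow surrogate code points in uniXXXX names.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is a surrogate")
    name = "uni" + "".join(f"{cp:04X}" for cp in codepoints)
    if variant:
        name += "." + variant
    return name


print(agl_glyph_name([0x2317], "ugly"))   # uni2317.ugly
print(agl_glyph_name([0x0041, 0x02DB]))   # uni004102DB
```

The same function also yields multi-code-point names for composite glyphs, which becomes relevant for the lookup scheme described further below.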

Anyway, we need a mechanism to map the input character codes to
glyphs in a simple and efficient way.

My main concern in this mail is not Unicode (which causes a bunch of
different problems) but the current situation with the various
256-character (8-bit) encodings.  Let us analyze latin-2.  The first
few entries of the range 0xA1..0xFF look like this (the second column
is the Unicode value):

  0xA1    0x0104  #       LATIN CAPITAL LETTER A WITH OGONEK
  0xA2    0x02D8  #       BREVE
  0xA3    0x0141  #       LATIN CAPITAL LETTER L WITH STROKE
  0xA4    0x00A4  #       CURRENCY SIGN
  0xA5    0x013D  #       LATIN CAPITAL LETTER L WITH CARON
  0xA6    0x015A  #       LATIN CAPITAL LETTER S WITH ACUTE
  ...
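This table does not have to be typed by hand.  Assuming Python's built-in `iso8859_2' codec (its implementation of ISO 8859-2, i.e., latin-2) and the standard `unicodedata' module, a short sketch can regenerate it; the helper name `charset_table' is made up for this example.

```python
import unicodedata


def charset_table(encoding, first, last):
    """Yield (byte, code point, Unicode name) for a single-byte encoding."""
    for byte in range(first, last + 1):
        char = bytes([byte]).decode(encoding)
        yield byte, ord(char), unicodedata.name(char)


for byte, cp, name in charset_table("iso8859_2", 0xA1, 0xA6):
    print(f"  0x{byte:02X}    0x{cp:04X}  #       {name}")
```

Every byte in 0xA0..0xFF is assigned in ISO 8859-2, so decoding cannot fail for the latin-2 range; other encodings with unassigned bytes would raise a `UnicodeDecodeError' there.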

As mentioned earlier, I've completely removed the hard-coded
dependency on latin-1 in groff.  It should now be possible, similar to
latin1.tmac, to write a latin2.tmac file:

  .de latin2-tr
  .  trin \\$1\\$1
  .  if c\\$2 .if !c\\$1 .trin \\$1\\$2
  ..
  .
  .latin2-tr \[char161] \[uni0104]
  .latin2-tr \[char162] \[ab]
  .latin2-tr \[char163] \[/L]
  .latin2-tr \[char164] \[Cs]
  .latin2-tr \[char165] \[uni013D]
  .latin2-tr \[char166] \[uni015A]
  ...

It is not too difficult to define glyph `uni0104' by composing it from
`A' and an ogonek accent (using the .char request) in case the font
doesn't contain this glyph.  For latin-2 encoded files it is also easy
to access this glyph by writing the character code 0xA1, but how can
that glyph be input independently of the encoding, using only ASCII?

In LaTeX, the natural answer would be `\k{A}' -- \k is a macro which
takes `A' as a parameter, converting it directly to an Aogonek glyph
if available, putting an ogonek accent below the A glyph otherwise.
In troff, I don't see an immediate answer.  The latter solution is
available, i.e., putting the ogonek accent below the `A' (using \z or
something similar in case both glyphs are spacing), but the former
isn't possible, except by defining a string for each possible
ogonek-baseglyph combination.

I therefore suggest extending the \[...] escape:

  \[<base glyph> <accent glyph 1> <accent glyph 2> ...]

(I still have to check how the Unicode standard handles the order of
accents.)  For our ogonek example, this looks as follows:

  \[A ho]

The entity names within \[...] are taken from the font definition
files.

Now, gtroff tries to synthesize a composite glyph name as follows.

  1. It checks whether there is a predefined gtroff glyph name for the
     (A,ho) pair (none in our example).  It also checks all possible
     aliases from the font file.  For example, if \[S ~] is given for
     -Tps, it also looks up \[S a~] since there are two entity names
     for the tilde glyph in the font definition files.

  2. If that fails, it converts the glyph sequence to the glyph name
     `uni004102DB', which is an algorithmically derived name for a
     composite glyph, following the AGL guidelines.  A modified
     version of the afm2dit script would have generated exactly this
     name for the `Aogonek' glyph.

  3. If that also fails, there is apparently no Aogonek glyph
     available.  gtroff then emits `\z\[ho]A', forcing all accents
     to have zero width.

The weakest point is step 3, but I currently don't see a better
alternative without heavily modifying the font format (the depth field
in gtroff font files should never be negative -- this changes the
original bounding box information, making it impossible to properly
stack accents on base glyphs).  Maybe I should add an option to
control whether step 3 is executed at all?
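The three lookup steps, together with the suggested switch for step 3, can be sketched in Python as follows.  Everything here is hypothetical: the tables stand in for data that would really come from the gtroff font definition files, `resolve_composite' is not an actual gtroff function, and the `allow_overstrike' parameter models the option proposed above.  The accent code points use the spacing forms (U+02DB OGONEK, U+00B4 ACUTE ACCENT, U+02C7 CARON), matching the `uni004102DB' example.

```python
from itertools import product

# Hypothetical tables for illustration only; real data would come from
# the gtroff font definition files.
PREDEFINED = {("A", "aa"): "'A", ("n", "a~"): "~n"}   # composite entity names
ALIASES = {"~": ["a~"], "a~": ["~"]}                   # two names, one tilde glyph
ACCENT_UNICODE = {"ho": 0x02DB, "aa": 0x00B4, "ah": 0x02C7}  # spacing accents


def resolve_composite(base, accents, font_glyphs, allow_overstrike=True):
    """Resolve a proposed \\[base accent ...] escape (sketch of steps 1-3)."""
    # Step 1: a predefined composite name, trying every alias of each accent.
    alternatives = [[a] + ALIASES.get(a, []) for a in accents]
    for combo in product(*alternatives):
        name = PREDEFINED.get((base, *combo))
        if name is not None and name in font_glyphs:
            return name

    # Step 2: an AGL-style 'uniXXXXYYYY' name, if every part has a known
    # code point (this sketch only handles single-character base names).
    if len(base) == 1 and all(a in ACCENT_UNICODE for a in accents):
        cps = [ord(base)] + [ACCENT_UNICODE[a] for a in accents]
        name = "uni" + "".join(f"{cp:04X}" for cp in cps)
        if name in font_glyphs:
            return name

    # Step 3 (optional): overstrike, giving every accent zero width via \z.
    if not allow_overstrike:
        return None
    base_esc = base if len(base) == 1 else f"\\[{base}]"
    return "".join(f"\\z\\[{a}]" for a in accents) + base_esc
```

With a font that provides `~n' and `uni004C02C7' but no Aogonek glyph, \[n ~] resolves through the tilde alias in step 1, \[L ah] resolves in step 2 to `uni004C02C7', and \[A ho] falls through to `\z\[ho]A' (or to None when overstriking is disabled).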

Now latin2.tmac would look like this:

  .de latin2-tr
  .  trin \\$1\\$1
  .  if c\\$2 .if !c\\$1 .trin \\$1\\$2
  ..
  .
  .latin2-tr \[char161] \[A ho]
  .latin2-tr \[char162] \[ab]
  .latin2-tr \[char163] \[/L]
  .latin2-tr \[char164] \[Cs]
  .latin2-tr \[char165] \[L ah]
  .latin2-tr \[char166] \[S aa]
  ...
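Most of the composite entries above can be derived mechanically from Unicode's canonical decompositions.  The following Python sketch assumes a small, hypothetical mapping from combining diacritics (U+0328, U+030C, U+0301) to the groff accent names `ho', `ah', and `aa'; characters without a usable decomposition, such as U+0141 or the spacing BREVE, fall back to a `uniXXXX' name, so entries like \[ab], \[/L], and \[Cs] would still have to be chosen by hand.  Only one level of decomposition is followed.

```python
import unicodedata

# Hypothetical map from combining diacritics to groff accent entity names.
COMBINING_TO_GROFF = {0x0328: "ho", 0x030C: "ah", 0x0301: "aa"}


def composite_entity(char):
    """Return 'base accent ...' for a canonically decomposable char, else None."""
    decomp = unicodedata.decomposition(char)
    if not decomp or decomp.startswith("<"):
        return None  # no decomposition, or only a compatibility one
    cps = [int(h, 16) for h in decomp.split()]
    base, marks = chr(cps[0]), cps[1:]
    if not all(m in COMBINING_TO_GROFF for m in marks):
        return None
    return " ".join([base] + [COMBINING_TO_GROFF[m] for m in marks])


# Emit latin2.tmac-style lines for the first few latin-2 code positions.
for byte in range(0xA1, 0xA7):
    char = bytes([byte]).decode("iso8859_2")
    entity = composite_entity(char)
    target = f"\\[{entity}]" if entity else f"\\[uni{ord(char):04X}]"
    print(f".latin2-tr \\[char{byte}] {target}")
```

For 0xA1, 0xA5, and 0xA6 this produces \[A ho], \[L ah], and \[S aa], the same entries as in the hand-written file.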


Comments, please.


    Werner
