
[Groff] Groff, Grohtml and Encodings


From: Anton Shepelev
Subject: [Groff] Groff, Grohtml and Encodings
Date: Thu, 14 Oct 2010 19:59:09 +0400

Hello all,

I thought I had solved all my encoding problems until
I tried to export my documents to HTML. It seems that
my understanding of how groff maps input characters to
its internal characters and then to output glyphs is
incomplete. Below I describe what I did and what
results I got.

I have a KOI8-R-encoded file that contains the
following letters, given here in hexadecimal:

   F0, C5, D2, D7, D9, CA

I  am  using  the koi8-r.tmac file, which maps these
letters as follows:

         ----------------------------------
         Char hex   Char dec   Mapped char
         ----------------------------------
         F0         240        \[u041F]
         C5         197        \[u0435]
         D2         210        \[u0440]
         D7         215        \[u0432]
         D9         217        \[u044B]
         CA         202        \[u0439]
         ----------------------------------
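
The table can be cross-checked outside groff. The
sketch below is only an illustration: it uses Python's
built-in koi8_r codec, not koi8-r.tmac itself, and the
variable names are made up for the example.

```python
# Decode the six KOI8-R bytes from the post and list the
# Unicode code point that each byte corresponds to.
raw = bytes.fromhex("F0 C5 D2 D7 D9 CA")
text = raw.decode("koi8_r")

# One row per byte: hex value, decimal value, code point.
for byte, char in zip(raw, text):
    print(f"{byte:02X}  {byte:3d}  u{ord(char):04X}")

# The code points in groff's \[uXXXX] spelling, without the escape.
codes = [f"u{ord(c):04X}" for c in text]
print(codes)
# → ['u041F', 'u0435', 'u0440', 'u0432', 'u044B', 'u0439']
```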

The values in the third column match the Unicode code
points of the corresponding letters of the Russian
alphabet. When I process this file using the following
MSDOS batch script

   type %1 | groff -mkoi8-r -t -Thtml > %2

groff outputs six warning messages, one per character,
of the form:

   stdin:1: warning: can't find special character '<SYMBOL>',

where <SYMBOL> takes, in order, the following values:

   u041F, u0435, u0440,
   u0432, u044B, u0438_0306,

which  is exactly what the corresponding input char-
acters map to except for the last one, which  turned
into a composite code for a reason unknown to me.
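
It may be relevant that u0438_0306 coincides with the
canonical Unicode decomposition of U+0439 (U+0438
followed by the combining breve U+0306). The sketch
below only demonstrates that coincidence with Python's
unicodedata module; whether groff decomposes characters
in this way is an assumption, not something established
here.

```python
import unicodedata

composed = "\u0439"  # CYRILLIC SMALL LETTER SHORT I

# Canonical decomposition (NFD) splits the letter into a base
# character plus a combining mark.
decomposed = unicodedata.normalize("NFD", composed)

# Spell the result the way the warning does: hex code points
# joined by underscores, prefixed with "u".
name = "u" + "_".join(f"{ord(c):04X}" for c in decomposed)
print(name)  # → u0438_0306
```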

The resulting html file looks quite correct and con-
tains the following:

   <p>&#1055;&#1077;&#1088;&#1074;&#1099;&#1081;</p>

These decimal values correspond to the values of the
internal characters in the table above.
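
The correspondence can be confirmed by converting the
decimal references back to hexadecimal. This is a small
check using Python's html module; the variable names
are illustrative.

```python
import html

entities = "&#1055;&#1077;&#1088;&#1074;&#1099;&#1081;"

# Resolve the numeric character references to real characters.
text = html.unescape(entities)

# Express each character as a hexadecimal code point.
code_points = [f"u{ord(c):04X}" for c in text]
print(code_points)
# → ['u041F', 'u0435', 'u0440', 'u0432', 'u044B', 'u0439']
```

Note that the last reference is the precomposed U+0439,
not the composite form named in the warning.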

The -mkoi8-r option does take effect, as I verified by
removing it.

Here's what I do not understand and I would appreci-
ate your help with:

  1.  I  tried  to  define glyphs for the characters
      reported in the  abovementioned  warnings,  in
      the ...\font\devhtml\r file like this:

         u041F 24 0 0x041F,

      but this affected neither the output nor the
      warning messages. Aren't these warnings about
      missing glyphs in the font file? If they are,
      then why didn't defining the glyphs for those
      characters work?

  2.  Why did the last warning mention the composite
      character u0438_0306 instead of  the  original
      u0439,   to   which   it   is  mapped  by  the
      koi8-r.tmac file?

  3.  I   saw   the   line    "unicode"    in    the
      ...\font\devhtml\desc  file,  but the descrip-
      tion of the DESC format does not  mention  the
      possibility of such a line. What does it do?

  4.  How can I set up groff to accept KOI8-R-encoded
      files and output HTML pages

        a.  in the same encoding,
        b.  in the UTF-8 encoding?
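
As a stopgap outside groff itself, the numeric
references in the generated page could be turned into
literal characters by a post-processing step and then
written out in either encoding. The sketch below
assumes the output looks like the sample shown earlier;
inline_numeric_refs is a hypothetical helper, not part
of groff, and any charset declaration in the page would
still have to be adjusted to match.

```python
import re

def inline_numeric_refs(markup: str) -> str:
    """Replace decimal references above ASCII with literal characters.

    References to ASCII characters (such as &#60;) are left alone so
    that the markup itself is not altered.
    """
    def repl(match: re.Match) -> str:
        code = int(match.group(1))
        return chr(code) if code > 127 else match.group(0)
    return re.sub(r"&#(\d+);", repl, markup)

page = "<p>&#1055;&#1077;&#1088;&#1074;&#1099;&#1081;</p>"
literal = inline_numeric_refs(page)

koi8_bytes = literal.encode("koi8_r")  # case a: same encoding as the input
utf8_bytes = literal.encode("utf-8")   # case b: UTF-8
```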

Thank you in advance,
Anton


