Re: [Groff] Groff, Grohtml and Encodings


From: Werner LEMBERG
Subject: Re: [Groff] Groff, Grohtml and Encodings
Date: Sun, 17 Oct 2010 16:16:32 +0200 (CEST)

> [...]  When I process this file using the following MSDOS batch
> script
> 
>    type %1 | groff -mkoi8-r -t -Thtml > %2
> 
> groff outputs six (one per each symbol) warning messages of the
> form:
> 
>    stdin:1: warning: can't find special character '<SYMBOL>',

These warning messages are harmless.  The reason is that grohtml
processes the input twice: once with -Thtml for text, and a second
time with -Tps for everything that grohtml can't handle itself.  It is
this second run that causes the warning messages.

Untested: If you set up your system with Cyrillic PS fonts, you
shouldn't get these warnings.  You might use the `.fam' request within
your document (which grohtml ignores) to select the proper PS fonts.

Admittedly, this is badly documented, if at all.  I would be glad if
you could provide patches to improve that.

>   1.  I tried to define glyphs for the characters reported in the
>       abovementioned warnings, in the ...\font\devhtml\r file like
>       this:
> 
>          u041F 24 0 0x041F,
> 
>       but this did not affect either the output or the warning
>       messages.

You would have to do this in /font/devps/... instead, since the
warnings come from the -Tps run.  However, this is probably a bad
idea.

>   2.  Why did the last warning mention the composite character
>       u0438_0306 instead of the original u0439, to which it is
>       mapped by the koi8-r.tmac file?

The list of Unicode composites is hard-coded into groff (in file
src/libs/libgroff/uniuni.cpp); composites are always decomposed, which
is why the warning names u0438_0306 rather than u0439.  This is
documented in the groff info manual.
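
As an illustration only (this is not groff code), the same canonical
decomposition can be reproduced with Python's unicodedata module.  The
helper name groff_glyph_name is made up for this sketch, and groff's
own hard-coded table may differ in details:

```python
import unicodedata

def groff_glyph_name(ch):
    """Hypothetical helper: build a groff-style uXXXX[_XXXX...] name.

    Assumes groff's decomposition matches Unicode canonical
    decomposition (NFD) for the character in question.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return "u" + "_".join(f"{ord(c):04X}" for c in decomposed)

# CYRILLIC SMALL LETTER SHORT I decomposes into U+0438 plus U+0306.
print(groff_glyph_name("\u0439"))   # u0438_0306
# CYRILLIC CAPITAL LETTER PE has no decomposition and stays as it is.
print(groff_glyph_name("\u041F"))   # u041F
```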

>   3.  I saw the line "unicode" in the ...\font\devhtml\desc file,
>       but the description of the DESC format does not mention the
>       possibility of such a line.  What does it do?

It is documented both in groff_font(5) and the groff info manual:

       unicode
              Indicate that the output device supports the complete
              Unicode repertoire.  Useful only for devices which
              produce character entities instead of glyphs.

              If unicode is present, no charset section is required in
              the font description files since the Unicode handling
              built into groff is used.  However, if there are entries
              in a charset section, they either override the default
              mappings for those particular characters or add new
              mappings (normally for composite characters).

              This is used for -Tutf8, -Thtml, and -Txhtml.
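
To make "character entities instead of glyphs" concrete, here is a
small Python sketch that turns a groff uXXXX glyph name into HTML
numeric character references.  The function name entity_for is
hypothetical, and the decimal &#NNNN; form is an assumption about what
such a device might emit, not a statement about grohtml's exact output:

```python
def entity_for(glyph_name):
    """Hypothetical helper: map 'uXXXX' or 'uXXXX_YYYY' to HTML entities.

    Each underscore-separated hex component becomes one decimal
    numeric character reference.
    """
    if not glyph_name.startswith("u"):
        raise ValueError("expected a name of the form uXXXX[_XXXX...]")
    parts = glyph_name[1:].split("_")
    return "".join(f"&#{int(p, 16)};" for p in parts)

print(entity_for("u041F"))        # &#1055;
print(entity_for("u0438_0306"))   # &#1080;&#774;
```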

>   4.  How to set up groff to accept koi8-r-encoded files and output
>       html pages
> 
>         a. with the same encoding,
>         b. with the UTF8 encoding?

This is not possible.  grohtml always emits Unicode-encoded data.


    Werner


