Re: [Groff] Re: man page encoding


From: Andries Brouwer
Subject: Re: [Groff] Re: man page encoding
Date: Thu, 7 Jul 2005 23:20:16 +0200
User-agent: Mutt/1.4i

On Thu, Jul 07, 2005 at 10:55:35PM +0200, Bruno Haible wrote:
> Andries,
> 
> > It is not at all an unimportant detail whether it changes to utf-8 or
> > ascii with escape sequences. My own preprocessors halfway that pipeline
> > do not know about utf-8, and do not know about these escape sequences
> > either. Still I am told that compatibility mode should work.
> 
> ?? What is the relation between groff's compatibility mode and the
> possibility that your own preprocessors cannot handle non-ASCII troff input
> in either form?

Like this. The very long pipeline contains invocations of
refer, ideal, pic, tbl, eqn, ditroff
but also lots of preprocessors of my own. If the groff version of refer
or tbl decides to turn my Latin-1 into UTF-8, then my own preprocessors
later on in the pipeline will no longer be able to handle the input.
On the other hand, if they turn stuff into \[...] or \N[...] escape sequences,
then again my preprocessors are confused since this syntax is not traditional
troff syntax, and unexpected in the input.
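To make the two failure modes concrete, here is a minimal sketch in Python. It is not groff code: the two recoding functions and the "legacy preprocessor" are hypothetical stand-ins, and the \[uXXXX] form is used only as one example of an escape-sequence spelling. It shows what a byte-oriented tool further down the pipeline would see after each kind of recoding.

```python
# Illustration only: these helpers are hypothetical, not part of groff.

# "café" as a Latin-1 file would store it: one byte per character.
latin1_input = "caf\u00e9".encode("latin-1")   # b'caf\xe9'


def recode_to_utf8(data: bytes) -> bytes:
    """What an early pipeline stage might do if it emits UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def recode_to_escapes(data: bytes) -> bytes:
    """What an early stage might do if it emits ASCII plus escapes.

    Latin-1 byte values equal Unicode code points, so byte 0xE9
    becomes the escape \\[u00E9].
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append("\\[u%04X]" % b)
    return "".join(out).encode("ascii")


def legacy_char_count(data: bytes) -> int:
    """Hypothetical old preprocessor: assumes one byte is one character."""
    return len(data)


print(legacy_char_count(latin1_input))                     # Latin-1 input
print(legacy_char_count(recode_to_utf8(latin1_input)))     # after UTF-8 recoding
print(legacy_char_count(recode_to_escapes(latin1_input)))  # after escape recoding
```

In both recoded forms the downstream tool miscounts the word, which is the kind of confusion described above: either it meets multi-byte sequences it does not understand, or it meets escape syntax that traditional troff input never contained.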

Now you say "tough luck", and I don't mind, but if the idea is that groff
has a compatibility mode that allows one to handle old books, like my old
512-page monograph, then things do not improve with such recoding.

> > Is it not far simpler to document that groff must be called with a file
> > coded in ASCII or Latin-1 or UTF-8?
> 
> If we did this,
> 
>   1) We would need a heuristic to decide whether a given file is Latin1
>      or UTF-8. And "heuristic" is equivalent to "bug by design". Better
>      avoid it.

Ha - no heuristics! no guessing! completely agreed.
Fortunately no heuristics are needed. Since the current behaviour (I think)
is to assume Latin-1, it suffices to add an option to designate that
the input is UTF-8: an "input encoding" option with three possible values.
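A sketch of what such an explicit option could look like, written in Python purely for illustration. The option name --input-encoding, the program name, and the helper functions are assumptions, not an existing groff interface. The point is only that the encoding is stated by the caller, defaults to Latin-1, and is never guessed from the file contents.

```python
import argparse

# The three values proposed: ASCII, Latin-1, or UTF-8.
ENCODINGS = ("ascii", "latin-1", "utf-8")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wrapper; the option name is an assumption.
    parser = argparse.ArgumentParser(prog="troff-wrapper")
    parser.add_argument(
        "--input-encoding",
        choices=ENCODINGS,
        default="latin-1",   # assumed current behaviour: treat input as Latin-1
        help="encoding of the input file; never guessed from its contents",
    )
    return parser


def decode_input(data: bytes, encoding: str) -> str:
    """Decode strictly with the declared encoding; fail loudly on mismatch."""
    return data.decode(encoding)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_encoding)
```

Because the decode is strict, input that does not match the declared encoding raises an error instead of being silently reinterpreted, so no heuristic is involved at any point.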

>   2) We would have low acceptance from the people who produce man pages in
>      EUC-JP, with the consequence that these "-Tnippon" hacks in groff
>      (or equivalent hacks in "man" in some distributions) would need to stay
>      forever.

You talk as if you are forced to change groff in ugly ways because
man is set in stone. But it is very easy to change man.

Andries
