[Groff] Re: man page encoding


From: Bruno Haible
Subject: [Groff] Re: man page encoding
Date: Wed, 6 Jul 2005 13:50:31 +0200
User-agent: KMail/1.5

Hi Andries,

Thanks for the details.

> (2) You say: `The goal is that "groff -T... -mandoc" on any man page works,
> without need to specify the encoding as an argument to groff'.
>
> (2A) This will work in simple cases, where input encoding and output
> encoding and system character set are equal.
> ...
>       /usr/bin/groff -Tnippon -mandocj

The input encoding and the output encoding are often different. For
example, a user in a ja_JP.UTF-8 locale may view a man page that is in
EUC-JP encoding; the output device is -Tutf8 in this case.
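
To illustrate that case, here is a minimal sketch (not groff's actual
mechanism): the page bytes are decoded from the input encoding and
re-encoded for the user's locale. The helper name recode_page is
hypothetical.

```python
def recode_page(data: bytes, input_encoding: str, output_encoding: str) -> bytes:
    """Decode man page bytes from input_encoding and re-encode them."""
    text = data.decode(input_encoding)
    return text.encode(output_encoding)


# A page stored in EUC-JP, viewed by a user in a UTF-8 locale.
page = "日本語".encode("EUC-JP")
utf8_page = recode_page(page, "EUC-JP", "UTF-8")
```

The point is only that the two encodings are independent parameters:
the first belongs to the file, the second to the user's locale.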

The problem with "-Tnippon" is that it forces a particular output
device merely in order to cope with input in EUC-JP.

> (3A) man.conf contains the default invocation, like
>       /usr/bin/nroff -Tlatin1 -mandoc

This is bad: The encoding of the output should be determined by the
user's current locale, not hardcoded in a configuration file.
Get rid of this line in man.conf!
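
A rough sketch of what "determined by the user's current locale" could
look like: map the locale's character set name to a groff output device
instead of reading it from man.conf. The function name and the mapping
table are assumptions for illustration; utf8, latin1 and ascii are
existing groff terminal devices, and anything else falls back to ascii
here.

```python
import codecs

# Illustrative table: normalized Python codec name -> groff output device.
_DEVICE_BY_CODEC = {
    "utf-8": "utf8",
    "iso8859-1": "latin1",
    "ascii": "ascii",
}


def groff_device_for_codeset(codeset: str) -> str:
    """Return a groff -T device name for a locale character set name."""
    try:
        # codecs.lookup() normalizes aliases such as "UTF8" or "latin1".
        name = codecs.lookup(codeset).name
    except LookupError:
        return "ascii"  # unknown charset: assume the most conservative device
    return _DEVICE_BY_CODEC.get(name, "ascii")
```

A caller might pass in the result of locale.getpreferredencoding(False)
after calling locale.setlocale(locale.LC_CTYPE, "").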

> (2B) Maybe this does not have to work - the requirement is that "man ls"
> works, not that "groff [options] ls.1" works.

No, the goal is really that "groff [options] ls.1" works. When a
translator or man page author wants to view a man page, s/he should
be able to do so without installing the file in particular directories.

> (3C) The iconv hack mentioned earlier today used a charset file
> in the directory to indicate the character set of all man pages in that
> directory.

That's bad, because the meaning of the file changes depending on which
directory it sits in. "groff [options] ls.1" needs to work without
referring to other files in the same directory.

> (4) Yes, character set information in a man page would be desirable.
> But it is bad to require it.

Why? HTML requires it. XML requires it. We require it in PO files, and there
it's a life saver. Emacs requires it in many files, in order to display the
file correctly.

> Putting the info on the first line of the file is a bad idea.
> Many things want to be on the first line.
> (The .so directive, the 't and 'e directives, etc.)

When there's a .so directive, you don't need to specify the encoding.
When there are 't and 'e directives, the comment with -*- coding -*-
can come after them, without disturbing groff's determination of the
preprocessors to be run.
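
A minimal sketch of how a tool could read such a tag from a comment
line, with any preprocessor letters in front of it. The function name
and the regular expression are assumptions made for illustration, not
code from groff or man.

```python
import re

# Matches an Emacs-style "-*- ... coding: NAME ... -*-" tag.
_CODING_TAG = re.compile(r"-\*-.*?\bcoding:\s*([A-Za-z0-9._-]+).*?-\*-")

# roff comment lines start with  .\"  or  '\"
_COMMENT_PREFIXES = ('.\\"', "'\\\"")


def declared_encoding(line: bytes):
    """Return the encoding named in a roff comment line, or None."""
    # The tag itself is plain ASCII, so undecodable bytes can be replaced.
    text = line.decode("ascii", errors="replace")
    if not text.startswith(_COMMENT_PREFIXES):
        return None
    match = _CODING_TAG.search(text)
    return match.group(1) if match else None
```

Because the tag is searched for anywhere in the comment, letters such
as "t" or "e" before it do not get in the way.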

> (-) In short: the system-wide convention (you would choose UTF-8
> but I know people who would choose KOI-8) we have already, it is (3A).

Sorry, this needs to go away. Hardcoding output encodings in a configuration
file is a no-no.

> The man program (and/or groff) can react to the user's locale settings.

Yes, that's the way to go.

> Since almost all translations are produced by national translation teams
> working via the Montreal translation robot, the rules are rather uniform,
> and it will not be very difficult to introduce new rules.

Thanks, then let's go for the proposed
   .\" t  -*- coding: EUC-JP -*-
syntax.

Bruno




