[Groff] Re: man page encoding


From: Andries Brouwer
Subject: [Groff] Re: man page encoding
Date: Tue, 5 Jul 2005 23:43:56 +0200
User-agent: Mutt/1.4i

On Tue, Jul 05, 2005 at 07:41:13PM +0200, Bruno Haible wrote:

> Andries,
> 
> Currently on a Linux system you find man pages in the following encodings:
>   - ISO-8859-1 (German, Spanish, French, Italian, Brazilian, ...),
>   - ISO-8859-2 (Hungarian, Polish, ...),
>   - KOI8-R (Russian),
>   - EUC-JP (Japanese),
>   - UTF-8 (Vietnamese),
>   - ISO-8859-7, ISO-8859-9, ISO-8859-15, ISO-8859-16 (man7/*),
> and none of them contains an encoding marker.
> 
> The goal is that "groff -T... -mandoc" on any man page works, without
> need to specify the encoding as an argument to groff.
> 
> There are two options:
>   a) Recognize only UTF-8 encoded man pages. This is the simplest.
>      groff will be changed to emit errors when it is fed a non-UTF-8
>      input, so that the man page maintainers are notified that they need to
>      convert their man page to UTF-8.
>   b) Recognize the encoding according to a note in the first line
>         '\" -*- coding: EUC-JP -*-
>      groff will then emit errors when it is fed input that is non-ASCII and
>      without coding: marker, so that man page maintainers are notified that
>      they need to add the coding: marker.
> 
> Which of the two would you, as Linux man pages maintainer, prefer?
> 
> Bruno
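Option (b) could be checked roughly as follows. This is a minimal Python sketch, not groff code: the regular expression for the Emacs-style `-*- coding: ... -*-` note and the names `find_coding_marker` and `check_page` are assumptions made for illustration. As Bruno proposes, it reads only the first line, accepts pure-ASCII pages without a marker, and rejects non-ASCII pages that lack one.

```python
import re
from typing import Optional

# Assumed form of the marker: a roff comment line ('\" or .\") carrying an
# Emacs-style file variable, e.g.   '\" -*- coding: EUC-JP -*-
# This pattern is illustrative only, not the rule of any groff release.
_CODING_RE = re.compile(r"""^['.]\\"\s.*?-\*-.*?\bcoding:\s*([A-Za-z0-9._-]+)""")


def find_coding_marker(first_line: str) -> Optional[str]:
    """Return the encoding named on the first line, or None if there is no marker."""
    m = _CODING_RE.match(first_line)
    return m.group(1) if m else None


def check_page(data: bytes) -> Optional[str]:
    """Option (b): pure ASCII needs no marker; non-ASCII without a marker is an error."""
    # Only the first line is examined; undecodable bytes are replaced, not fatal.
    first_line = data.split(b"\n", 1)[0].decode("ascii", errors="replace")
    coding = find_coding_marker(first_line)
    if coding is None and any(byte > 0x7F for byte in data):
        raise ValueError("non-ASCII input without a coding: marker")
    return coding
```

Note how the whole check hangs on line 1 of the page, which is exactly the line Andries points out below is already contested.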

Hi Bruno,

(1) About my status: I have done these jobs for nine years or so, but recently
I passed on maintenance of both man-* (the program that invokes *roff) and the
man-pages package to other people. So you get my opinion not as a maintainer
but as a private person.

Also, the man-pages maintainer does not maintain the translations, and does
not have to worry about character sets. (Well, only a little. All pages are
in ASCII except for the iso-8859-*.7 and similar pages.)

(2) You say: `The goal is that "groff -T... -mandoc" on any man page works,
without need to specify the encoding as an argument to groff'.

(2A) This will work in simple cases, where the input encoding, the output
encoding and the system character set are all equal. I saw man pages in Greek
and Polish working that way many years ago.

(2B) Maybe this does not have to work: the requirement is that "man ls" works,
not that "groff [options] ls.1" works, and man can supply the appropriate
options itself. For example, man also handles the .so directives, and may
invoke a browser or something other than groff.

(3) There are a few precedents for locale information handled by man.

(3A) man.conf contains the default invocation, like
        /usr/bin/nroff -Tlatin1 -mandoc
or
        /usr/bin/groff -Tnippon -mandocj
(3B) translated man pages are found by man in directories with special 
pathnames.
(3C) The iconv hack mentioned earlier today used a charset file in the
directory to indicate the character set of all man pages in that directory.

(4) Yes, character set information in a man page would be desirable.
But it is bad to require it. There is a system default (as in 3A)
in case no more specific information is known.
Putting the info on the first line of the file is a bad idea.
Many things want to be on the first line.
(The .so directive, the 't and 'e directives, etc.)

(-) In short: we already have a system-wide convention, namely (3A) (you
would choose UTF-8, but I know people who would choose KOI8).
Marking man pages with character set information is something we don't do
today. It is a possibility, but for a long time old, unmarked man pages
must still be accepted. The man program (and/or groff) can react to the
user's locale settings.
Since almost all translations are produced by national translation teams
working via the Montreal translation robot, the rules are rather uniform,
and it will not be very difficult to introduce new rules.
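The fallback order argued for in (3) and (4) can be sketched in Python as below. The marker is passed in already parsed, to keep the sketch short; the per-directory file name `charset` and the function names are hypothetical, since the exact file name used by the iconv hack of (3C) is not given here. The design point is that the most specific information wins and an unmarked page always falls through to the system default rather than being rejected.

```python
from pathlib import Path
from typing import Optional

# Hypothetical name of the per-directory charset file from (3C).
CHARSET_FILE_NAME = "charset"


def directory_charset(page: Path) -> Optional[str]:
    """(3C): a file beside the page naming the charset of every page in that directory."""
    f = page.parent / CHARSET_FILE_NAME
    if f.is_file():
        name = f.read_text(encoding="ascii").strip()
        return name or None  # an empty file counts as "no information"
    return None


def resolve_encoding(page: Path, marker: Optional[str], system_default: str) -> str:
    """Pick the input encoding; never reject a page for lacking a marker."""
    if marker:                           # (4) explicit marking in the page itself
        return marker
    from_dir = directory_charset(page)   # (3C) per-directory charset file
    if from_dir:
        return from_dir
    return system_default                # (3A) system-wide default, as in man.conf
```

A man-like front end would call `resolve_encoding` before choosing how to invoke groff, which keeps the decision in man as suggested in (2B).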

Andries
