
[Groff] address@hidden: Man_db UTF-8 issues]


From: Colin Watson
Subject: [Groff] address@hidden: Man_db UTF-8 issues]
Date: Fri, 4 Jan 2002 16:30:39 +0000
User-agent: Mutt/1.3.24i

I got the following query about UTF-8 man pages. Can anybody comment? I
can't immediately see a way to make devutf8 map \(hy and \(mi to
different characters for nroff and troff (or even for grotty and
everything else), but I'm definitely not as familiar with the code as
some.

Apologies for the differing encoding between my mail and Michael's. As
he guessed, I'm not up to using UTF-8 mail yet.

Thanks,

-- 
Colin Watson                                  address@hidden

----- Forwarded message from Michael Piefel <address@hidden> -----

Envelope-to: address@hidden
Date: Fri, 4 Jan 2002 11:37:21 +0100
From: Michael Piefel <address@hidden>
To: Colin Watson <address@hidden>
Subject: Man_db UTF-8 issues
User-Agent: Mutt/1.3.24i
X-MIME-Autoconverted: from 8bit to quoted-printable by 
mail.informatik.hu-berlin.de id g04AbMU02897

Hello Colin,

I'm still trying to display man pages in UTF-8. I patched man-db a
little, but now I had a quick look at the changes in CVS and they are
probably better than mine.

However, now that I see a manpage in UTF-8 I noticed a problem which
affects all manpages. _All_ manpages are wrong! Or perhaps nroff is; or
man; or getopt?

The problem lies in the character minus or dash or whatever you call it.
In ASCII, you always use the same character, namely - (ASCII 45). This
is called a HYPHEN-MINUS in Unicode because it cannot be decided whether
it is a HYPHEN or a MINUS. This character is used e.g. in mail addresses
(...contact address@hidden) and options (--verbose).

However, nroff renders an input character "-" as "‐", that is a HYPHEN
(U+2010). It also renders "\-" as "−", that's a MINUS (U+2212). So far,
so good, nroff is a document formatting system.

Unfortunately this means that the above-mentioned mail address is
rendered as address@hidden and the options, which are usually
entered as "\-\-verbose", come out as −−verbose (two MINUS signs). Those
renderings look quite right, but aren't. You can't copy and paste them,
for example.
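
To make the copy-and-paste problem concrete, here is a minimal Python sketch (not part of the original exchange). It assumes the mappings described above: "-" becomes U+2010 and "\-" becomes U+2212 in UTF-8 output. The names DASHES, describe, rendered_option and typed_option are made up for illustration.

```python
import unicodedata

# The three look-alike characters discussed in the message.
DASHES = {
    "typed": "\u002d",   # what a user types at the shell
    "hyphen": "\u2010",  # what "-" is said to become in UTF-8 output
    "minus": "\u2212",   # what "\-" is said to become in UTF-8 output
}


def describe(ch):
    """Return the code point and the Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# "\-\-verbose" as rendered, versus what getopt actually expects.
rendered_option = DASHES["minus"] * 2 + "verbose"
typed_option = "--verbose"

if __name__ == "__main__":
    for label, ch in DASHES.items():
        print(label, describe(ch))
    # The two strings look alike on screen but do not compare equal.
    print(rendered_option == typed_option)
```

Pasting the rendered option into a shell therefore hands the program a string that does not start with the expected option character at all.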

Of course getopt could accept HYPHEN, HYPHEN-MINUS and MINUS. But that's
not a very good solution, methinks.

Whenever the generated formatted manpage is not immediately destined for
any kind of print output (like dvi or ps), it would probably be best to
generate HYPHEN-MINUS for both "-" and "\-". This could be an nroff
change. It could also be another filter just behind nroff which
transforms MINUS and HYPHEN back to HYPHEN-MINUS. (Note that the
æsthetically pleasing short hyphens at the hyphenation points nroff
inserts are not affected as those are SOFT HYPHENs.)
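
The "filter just behind nroff" idea could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not something man-db or groff ships: the names TO_HYPHEN_MINUS, normalize_dashes and filter_stream are hypothetical, and the formatter output is assumed to arrive as already decoded text.

```python
import io

# Map HYPHEN (U+2010) and MINUS SIGN (U+2212) back to HYPHEN-MINUS (U+002D).
# SOFT HYPHEN (U+00AD) is deliberately not in the table, so the hyphenation
# points mentioned in the message pass through unchanged.
TO_HYPHEN_MINUS = str.maketrans({"\u2010": "-", "\u2212": "-"})


def normalize_dashes(text):
    """Replace typographic hyphen and minus characters with ASCII '-'."""
    return text.translate(TO_HYPHEN_MINUS)


def filter_stream(src, dst):
    """Copy text lines from src to dst, normalizing dashes on the way."""
    for line in src:
        dst.write(normalize_dashes(line))


if __name__ == "__main__":
    # Small in-memory demonstration instead of a real nroff pipeline.
    sample = io.StringIO("\u2212\u2212verbose, man\u2010db, hy\u00adphen\n")
    out = io.StringIO()
    filter_stream(sample, out)
    print(out.getvalue(), end="")
```

In a real pipeline one would presumably pass sys.stdin and sys.stdout to filter_stream and run the script after the formatter, and only when the output is headed for a terminal rather than for dvi or ps.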

All this is food for thought and not high priority as copy&paste is not
a function of manpages which you can't do without.

Well, this is my first mail in UTF-8 and I expect it to fail badly on
your side... Please tell me if I should send it again in a more
widespread encoding.

Bye,
    Mike

-- 
|=| Michael Piefel                    address@hidden
|=| Humboldt-Universität zu Berlin              http://www.piefel.de
|=| Tel. (+49 30) 2093 3831

----- End forwarded message -----
