Re: [Groff] ASCII Minus Sign in man Pages.


From: Ingo Schwarze
Subject: Re: [Groff] ASCII Minus Sign in man Pages.
Date: Mon, 24 Apr 2017 16:39:15 +0200
User-agent: Mutt/1.6.2 (2016-07-01)

Hi,

i think Ralph's extensive analysis makes it clear that this whole
thing is a mess:  Even looking at groff only, for historical
reasons, the input sequences

  -   \-   \(hy   \(mi   \(en

are handled differently across output devices and across macro sets,
so even using current groff alone, no consistent way exists to get
the equivalent of the Unicode HYPHEN-MINUS character, even though
that is important for manual pages.
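
To see the differences for yourself, a minimal test page along the
following lines can be formatted for several devices, for example
with groff -man and each of -Tascii, -Tutf8, and -Tps.  The page
name DASHTEST is made up for illustration, and no particular output
is claimed here, since that depends on the groff version, the macro
set, and the device:

  .TH DASHTEST 7
  .SH NAME
  dashtest \- compare dash-like input sequences
  .SH DESCRIPTION
  plain: - escaped: \- hy: \(hy mi: \(mi en: \(en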

Besides, writing manual pages absolutely needs to be simple.  Manual
pages must be written by programmers who may not know typography and
who are not prepared to, and shouldn't be required to, acquire
specialized knowledge just to write the required manuals together
with their code.  So even if we came up with some elaborate
recommendation about hyphens/dashes/minuses in manual pages, it
would be useless because it wouldn't be followed in practice.

While i consider the above a serious issue, i'm much less worried
about Ralph's concern with old implementations.  Frankly, there are
only three practically relevant roff implementations that are widely
used for manual pages: groff, Heirloom, and mandoc.  The maintainers
are all active on this list and cooperate well.  So we have the
chance to decide something that is simple and implement it everywhere,
even if it diverges somewhat from historical practice.


To understand my following proposal, observe this:

First, in contrast to classical typography, we need four rather
than the usual three output characters (using Unicode names for
clarity without intending to imply that Unicode is used as the
character set by each output device):

 1. U+2010  HYPHEN
 2. U+2013  EN DASH
 3. U+2212  MINUS SIGN
 4. U+002D  HYPHEN-MINUS

The last one doesn't exist in normal typography, but is required for
programming and hence for manual pages.

In cases where you are not concerned about copy and paste but want
a particular typographic representation, no matter whether in a
manual page or in some other document, you can use the escape
sequences

  \(hy   \(mi   \(en

today.  OUTSIDE manual pages, you can also use - for \(hy
(and you usually will do that) and you can use \- for \(mi (though
i probably wouldn't recommend that; it mostly exists for historical
reasons).

INSIDE manual pages, relying on - for \(hy or on \- for \(mi is a
terrible idea even today because the three main implementations
(including groff) don't render them that way on the quite important
-Tutf8 device.


So here is what i propose.

Let's not change anything (neither code nor recommendations) for
typesetting OUTSIDE manual pages, unless there are bugs in devices.

INSIDE manual pages (both -man and -mdoc), let's change - and \-
to always map to U+002D HYPHEN-MINUS for all devices and let's tell
people to simply use - for HYPHEN-MINUS and stop worrying.  Those
who care and are aware of such subtleties can use \(hy \(mi \(en
in running text in manuals, but 95% of manual page authors probably
won't, and that's not a problem at all.
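
Under this rule, a -man page source could look like the following
fragment.  The program name frob and its -v option are made up for
illustration; the \(en in the last line is the optional kind of
typographic care mentioned above:

  .SH SYNOPSIS
  .B frob
  .RB [ -v ]
  .I file
  .SH DESCRIPTION
  The
  .B -v
  option enables verbose output.
  Pages 3\(en5 of the printed guide cover it in detail.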


This proposal has two downsides, but i consider both very minor
compared to the gain, which is having a consistent way to get U+002D
HYPHEN-MINUS in manual pages, and having a very simple rule that
has very good chances to actually be followed in practice and make
all this easily understandable for the future.

First minor downside for manual pages: Hyphens in running
text that are given as - will be rendered as HYPHEN-MINUS for
all devices.  But that's a very minor regression because that's
already the case for the most important devices (ascii, utf8,
html).  (Note that i'm not saying that utf8 is more important
than ps/pdf in general - only for manual pages.)

Second minor downside: Hyphen-minus signs in code elements that are
given as - (which we will then encourage!) may render as U+2010
HYPHEN on some legacy systems.  But that's an even smaller issue.
Which legacy systems are there in the first place?  Which of them
support anything except ascii and latin-1?  Who uses them?  Will
the users get upset about seeing hyphens in such cases?  I suspect
the answers are "very few, almost none, almost nobody, no".  And
if they do get upset, it will be easy for them to update their
software to follow groff's lead.


Assuming this is considered the right direction, how would one
best implement, in doc.tmac-u and an-old.tmac, - == \- == U+002D
for all devices?
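
As a starting point for discussion only, and restricted to the utf8
device, a local override of the following kind is one conceivable
approach.  It assumes that \N'45' selects the HYPHEN-MINUS glyph in
the current font on that device; a real change to doc.tmac-u and
an-old.tmac covering all devices would certainly need more care:

  .\" Sketch only: map both - and \- to character code 45 on -Tutf8.
  .if '\*[.T]'utf8' \{\
  .  char \- \N'45'
  .  char  - \N'45'
  .\}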

Yours,
  Ingo


