groff

Re: groff 1.23.0.rc2 readiness


From: G. Branden Robinson
Subject: Re: groff 1.23.0.rc2 readiness
Date: Sat, 28 May 2022 12:54:41 -0500

Hi Ingo,

At 2022-05-28T18:49:22+0200, Ingo Schwarze wrote:
> G. Branden Robinson wrote on Fri, May 27, 2022 at 05:50:51PM -0500:
> 
> > I want to prevail upon man page authors to choose correct glyphs in
> > their documents--so we can get a consistent experience on all output
> > devices.
> 
> Ouch.  Those special rules for manual pages allowing to write plain
> ASCII "-" on the input side to get plain ASCII "-" on the output
> side were *device specific* (same for the other four characters),

In groff, yes.  The special rules for - ` and ' (commas omitted for
clarity) were device-specific by being guarded by ".if '\*[.T]'utf8'" in
the man(7) and mdoc(7) macro files, and ^ and ~ were mapped to "big",
non-modifier glyphs in the font description file for the 'utf8' output
device (so they affected all groff output for that device, not just that
for man pages).

> which means that this text i wrote four years ago in mandoc_char(7)
> was never accurate - right?
> 
>   In roff(7) documents, the minus sign is normally written as `\-'.

Well, that part's true.

>   In manual pages, some style guides recommend to also use `\-'
>   if an ASCII 0x2d "hyphen-minus" output glyph that can be copied
>   and pasted is desired in output modes supporting it, for example
>   in -T utf8 and -T html.

This is also true.  It is what style guides recommend, and sound advice.

>   But currently, no practically relevant manual page formatter
>   requires that subtlety, so in manual pages, it is sufficient to
>   write plain `-' to represent hyphen, minus, and hyphen-minus.

This _might_ be false--confining your domain to the "practically
relevant" allows you to draw your boundary wherever you wish, at the
risk of objection and controversy.  ;-)

> When considering PDF and HTML output, that was apparently never true
> (except with mandoc(1)),

Here's groff 1.22.4 with these output devices (i.e., well before any of
my changes).

$ printf 'long-term 1\\-1\n' | groff -T html | grep term
<p>long-term 1&minus;1</p>

$ printf 'long-term 1\\-1\n' | groff -T pdf -Z | grep -A5 term
tlong-term
wh2500
t1
C\-
h5640
t1

For those for whom "grout" output[1] is inscrutable, this means that a
hyphen was placed on the output for "long-term" and a minus sign in the
midst of "1-1".

Yes, even though that's an ASCII hyphen-minus in the device-independent
output.  Historically, all *roffs of which I am aware map ASCII 0x2d/45
to the hyphen glyph if the output device and font support one.  Most
8-bit encodings have only one candidate (Windows-1252 is an exception).
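
To make the reading of the grout excerpt above concrete, here is a
minimal sketch in Python (a language not used anywhere in this thread).
It assumes only what the excerpt shows: a line starting with "t"
typesets a word, a line starting with "C" typesets a special character
by name, and everything else (such as "w" and "h") is motion that can
be ignored.  The function name classify_dashes is hypothetical.

```python
def classify_dashes(grout: str) -> list[tuple[str, str]]:
    """Report each dash-like glyph in a grout excerpt as (kind, context)."""
    found = []
    for line in grout.splitlines():
        if line.startswith("t"):
            # 't' typesets a word; a '-' inside it is set as the hyphen glyph.
            word = line[1:]
            found.extend(("hyphen", word) for _ in range(word.count("-")))
        elif line.startswith("C"):
            # 'C' typesets a special character by name; '\-' is the minus sign.
            name = line[1:].strip()
            if name == r"\-":
                found.append(("minus", name))
        # Anything else (motion such as 'w' or 'h') carries no glyphs here.
    return found


# The grout lines from the 'groff -T pdf -Z' run above.
sample = "tlong-term\nwh2500\nt1\nC\\-\nh5640\nt1\n"
print(classify_dashes(sample))
```

Run on the excerpt, this reports one hyphen (in "long-term") and one
minus sign (the "C\-" between the two "1" words).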

$ grep -A1 hyphen /usr/share/groff/1.22.4/font/devpdf/TR
-       333,257 0       45      hyphen
hy      "
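
The two lines above can be read mechanically.  The following Python
sketch is an illustration only, not how groff itself parses font
files.  It assumes whitespace-separated fields in the order name,
metrics, type, code, and an optional trailing name; that the code is
written in decimal, as in this excerpt; and that a lone double quote
in the second field makes the name an alias of the preceding glyph.
The function name parse_charset is hypothetical.

```python
def parse_charset(lines):
    """Map each glyph name to (code, trailing name) from charset lines."""
    glyphs = {}
    previous = None
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        name = fields[0]
        if fields[1] == '"':
            # A lone double quote: this name aliases the previous glyph.
            if previous is not None:
                glyphs[name] = glyphs[previous]
            continue
        if len(fields) < 4:
            continue
        # Assumed field order: name, metrics, type, code, optional name.
        code = int(fields[3])
        trailing = fields[4] if len(fields) > 4 else None
        glyphs[name] = (code, trailing)
        previous = name
    return glyphs


# The two lines printed by the grep above.
sample = ['-       333,257 0       45      hyphen', 'hy      "']
table = parse_charset(sample)
print(table["-"], table["hy"])
```

Both "-" and "hy" end up pointing at code 45 with the name "hyphen",
which is the mapping described above.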

> plain "-" on the input side always gave you a hyphen in PDF output
> even in manual pages, right?

Yes, this is correct to the best of my knowledge.

> Now *making* that device independent would not be an improvement
> because then manual page authors would have to write
> 
>   The --foo option can\(cqt be used together
>   with a non\(hyzero bar argument.

If by "that", you mean forcing '-' to map to some glyph other than the
hyphen, yes, you're correct.

\(cq is a separate issue.  People should just use ' for a prose
apostrophe.  It may end up being a homoglyph of \(cq, but that's the
output device and font's business.

We need \(aq only for documentation of programming languages that have
attached semantics specifically to Unicode Basic Latin code points that
are confusable with other glyphs.  However, the problem is not
particularly painful, as \(aq is a legitimate special character with the
same meaning (if available, which it probably always will be since the
glyph has been part of ISO 8859 since the 1980s and Unicode since 1991).

> which is clearly not easier than having to write
> 
>   The \-\-foo option can't be used together
>   with a non-zero bar argument.

Agreed.

> Consequently, using these five glyphs unescaped on the input side in
> order to get the five ASCII output characters was always wrong even in
> manual pages

That is my view.

> (not sure how i managed to fail to realize that until now, or more
> likely forget it again).

Maybe I could have been a more effective exponent of my reform earlier.

> I think the argument "this was always wrong and only worked on some
> output devices at best" is stronger than my argument "but manual page
> authors have become used to it and some style guides including my own
> actively recommended it".  So i think i have to retract my objection.
> Yes, it causes a lot of work including in OpenBSD, but when something
> is actually broken, calling that a "make-work project" is not really
> fair.

I'm happy to assist with the crafting of sed scripts to facilitate.  The
amount of work won't be trivial, I fear; when trying simply to grep the
man pages on my system to get a feel for the problem's proportions, I
kept hitting false positives.

It might be necessary to bust out heavier machinery, like PCRE's
zero-width lookahead or lookbehind assertions, to calibrate matches'
sensitivity and validity.  Fortunately, Perl can be used to rewrite
files in place as sed can.
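
As a sketch of the kind of zero-width assertions meant here, the
following uses Python's re module rather than Perl or sed; the idea
carries over to PCRE.  It rewrites only runs of "-" that start an
option-like word: not preceded by a backslash, a word character, or
another "-", and followed by a word character.  The pattern and the
function name escape_option_dashes are hypothetical, and the heuristic
is deliberately naive: it misses "\fB-x\fR" (the dash follows "B") and
wrongly matches inside escapes such as \h'-1n', which is the sort of
false positive mentioned above.

```python
import re

# A run of '-' that begins an option-like word.  The lookbehind rejects a
# preceding backslash (already escaped), word character (prose hyphen such
# as "non-zero"), or '-'.  The lookahead requires a following word character.
OPTION_DASHES = re.compile(r"(?<![\\\w-])-+(?=\w)")


def escape_option_dashes(line: str) -> str:
    """Rewrite unescaped option dashes as \\- and leave prose hyphens alone."""
    # A function replacement avoids backslash processing of the new text.
    return OPTION_DASHES.sub(lambda m: r"\-" * len(m.group()), line)


before = "The --foo option can't be used together with a non-zero bar."
print(escape_option_dashes(before))
```

Lines that already use "\-" pass through unchanged, so the rewrite can
be applied repeatedly to the same file.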

> Note that i can only retract my part of the argument; John might still
> have points that are not invalidated by this mistake i made.

Certainly.

Regards,
Branden

[1] "grout" is my shorthand for "device-independent output produced by
    GNU troff".  I have _not_ incorporated this whimsical term into any
    groff documentation.
