Re: groff 1.23.0.rc2 readiness


From: G. Branden Robinson
Subject: Re: groff 1.23.0.rc2 readiness
Date: Fri, 27 May 2022 17:50:51 -0500

Hi John,

At 2022-05-27T11:04:52+1000, John Gardner wrote:
> > I have no problem adding an item to the PROBLEMS file with a chunk
> > of groff source that people can put in their site "man.local" or
> > "troffrc" files to achieve the ASCII-degradation of the five glyphs
> > that novice man page writers abuse so copiously.
> 
> Can we *please* be practical about this?

I'm trying to be.

Incidentally there is a bit of a muddle here as your original point in
the bug report seems to be solely about ~ and ^, whereas Ingo's
secondment sweeps up the other ASCII characters without identity
mappings as well.

> 90% of Groff users, if not more, are only doing so via man(1) to read
> man pages.

Yes.  And probably 95% or more of groff users are doing so via a
package of some sort prepared by a distribution vendor like Debian
GNU/Linux, OpenBSD, Fedora, or some other intermediary between
"upstream" (us) and themselves.

That is why I said "If every *nix vendor in the world seizes upon the
above and adds it, I can view it with equanimity."[1]

> Many of whom are probably oblivious to the existence of a typesetting
> system underneath that's powering it all. They won't care about local
> configuration, they'll just be annoyed that there's another bunch of
> annoying characters they need to replace in anything copy+pasted from
> a terminal. Think Stack Overflow posts containing ˆ and ˜ by hapless
> users unaware that a regex or path they just copied contain what're
> essentially diacritics without a character.

True; people will attempt copy and paste from PDF files as well.  That's
why I want to prevail upon man page authors to choose correct glyphs in
their documents--so we can get a consistent experience on all output
devices.  I discussed this with Michael Kerrisk, the co-maintainer of
the Linux man-pages project (Alejandro's counterpart) almost a year and
a half ago[2].  He's been doing that job a long time and was not
alarmed.

> Which reminds me: *these characters were designed to be overstruck*. A
> + ˆ = Â, A + ˜ = Ã.

In ASCII?  Yes, except for the hyphen, originally they were--if they
weren't replaced by some national character set's alternative glyphs.
This incidentally includes the neutral double quote ("), which is why it
looks so funny on Teletype Model 37 output (attached).

When the C/A/T showed up at the Murray Hill Unix Room, some of these
input characters were given (potentially) overstrikable semantics.  The
text ("standard") fonts had both a hyphen glyph and a minus glyph, so as
I say in groff_char(7), a decision had to be taken which one got mapped
to plain '-' and which one was going to need an escape sequence.
Similarly, ` and ' became entrenched as directional single quotes, and
their backslash-prefixed forms became accent marks.  The C/A/T's
standard fonts didn't have distinct high-flown ^ and ~ glyphs.  They
appeared only in the AT&T-specified "special font", where, as far as my
eyes can tell, they are drawn entirely above the cap-height of the
standard fonts.

See the image (from the 1976 edition of CSTR #54) attached to comment #3
of <https://savannah.gnu.org/bugs/?42473>.

ECMA-6 (ISO 646) muddied the waters a little bit.  But since both ^ and
~ were replaceable code points, I suppose people didn't kick up too much
of a fuss.

Unicode 1.0 (October 1991) further stirred the mud; "ASCII" ^ was
recognized as a high, small glyph that certainly _looks_ overstrikable,
and ASCII ~ was permitted to be overstrikable or not!  See attachment.

Unicode 2.0 (July 1996) finally got off the pot and decided upon "big",
spacing semantics for (what was now termed) Basic Latin ^ and ~.  See
attachment.  It would be another four years before Unicode really
started to penetrate to *nix terminal environments, with support
arriving thanks in no small measure to the efforts of Markus Kuhn.[3]

With conflicting and unstable traditions, it is no wonder that there is
confusion around this issue.  groff has _mostly_ been consistent
throughout its history as to the semantics of these characters.  An
exception is that in January 2009, groff's man(7) and mdoc(7) were
patched to map all of -, \-, ', and ` to Basic Latin code points.

https://git.savannah.gnu.org/cgit/groff.git/commit/?id=98acc924f4e32cfc2209df5db0c21921df8cc7ac

If I had been around at the time to utter ominous warnings much as you
are, I'd have beseeched Werner to put the above code into troffrc (with
some kind of guard like '.if d TH') or man.local and mdoc.local and put
a comment above it saying that it should be removed by people who wanted
to undertake fixing the many wrong extant man pages, who didn't mind
those pages' misrendering, or whose systems' man pages had been
corrected in some tolerable proportion.

In my view it was a stopgap measure that should have been advertised as
such.  (With the exception of \- going to \N'45', because we simply
_don't have_ in *roff an input character--ordinary or special--that
means "the hyphen-minus, yes, THAT one, the root of all misery".)
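A stopgap of the kind described above might look roughly like the
following.  This is a sketch, not a quotation of the 2009 commit or of
the 'PROBLEMS' item: the exact set of mappings, the inclusion of ^ and
~, and the choice of "man.local" as the place to put it are assumptions
drawn from this thread.

```groff
.\" Sketch only: degrade selected glyphs to Basic Latin code points
.\" when formatting for a UTF-8 terminal.  Assumed placement: a site's
.\" man.local (and mdoc.local); remove once the local man pages are fixed.
.if '\*[.T]'utf8' \{\
.  char \- \N'45'
.  char  - \N'45'
.  char  ' \N'39'
.  char  ` \N'96'
.  char  ^ \N'94'
.  char  ~ \N'126'
.\}
```

The test on \*[.T] confines the remapping to the utf8 output device, so
PostScript and PDF output would keep their typographic glyphs.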

> In a PDF or PostScript document, or with a hardware teletype, this
> sort of composition is easy. In a modern terminal environment, not so
> much.  They're not making typesetting any better, they're only making
> user experience worse.

I don't think this is squarely on point.  It's not particularly hard to
type "\[a aa]" or "\['a]", let alone the more portable "\('a".  These
are some of the *roff-esque ways to achieve character composition
(others are discussed in groff_char(7)).
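As a small illustration, the three spellings just mentioned can be put
side by side in a document.  This is only a sketch; it assumes a groff
with its usual composite mappings loaded, and all three lines are
expected to select the same a-with-acute-accent glyph.

```groff
.\" Sketch only: three spellings of a-with-acute-accent.
.\" Composite form: base glyph "a" plus the accent glyph "aa".
\[a aa]
.\" groff special character whose name is 'a.
\['a]
.\" Two-character special character form, portable to other troffs.
\('a
```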

a^H' was a good way to get an a-with-acute-accent on a Model 37 but
people generally don't compose characters that way anymore.  Dead keys
(common on European keyboards), 3- and 4-level keyboard layouts, and
"input methods" are all more common.

> Now, we can deplore the state of man page authorship as much as we
> like, but the truth is that most software authors won't see this as a
> problem on their end,

To the extent that's true, man pages will continue to suck.  As long as
man page authorship is conducted by people who refuse to read or learn,
their documentary output will tend to be of poor quality, because such a
mindset is a severe hindrance to excellent technical writing.  However,
my hope is that such people are a minority, even if a noisy one.

Even so, we can acknowledge that the *roff language's syntax is, in
Kernighan's term, "rebarbative" (CSTR #97, I think).  That is why I feel
it is fair to document transition mechanisms like the one I've pushed
today, why I have striven to document these matters as thoroughly and
conscientiously as I can, and why I am willing to undertake, as I said
in the message to which you replied, the preparation of patches for
automated generators of man(7) output that may be unmaintained and/or
whose maintainers are unreceptive to changes.  Some such people may
indeed view this as the last straw, flip man(7) the bird, and decamp for
Markdown, which always just Does What You Mean (right?[5]).

> or with end user configuration. They'll see this as a regression
> in the latest version of Groff and will file bug reports accordingly.

I'm prepared for that, but so too should our distributors be, so I've
added a 'NEWS' item and updated the existing 'PROBLEMS' item (which
dates back to July 2003).

https://git.savannah.gnu.org/cgit/groff.git/commit/?id=915a878038236769eb072f728389352c1da88719

> If you still decide to go ahead: Don't say I didn't warn you.

I'm warned.

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2022-05/msg00052.html
[2] 
https://lore.kernel.org/all/a1af3f5c-f3e9-4bf3-cad5-389571c45d27@gmail.com/T/#m8282cb95b86db994508ece3165340e0075c3871d
[3] https://www.cl.cam.ac.uk/~mgk25/unicode.html
[4] https://cygwin.com/pipermail/cygwin/2002-October/085349.html is an
    example that will live in infamy.
[5] 
https://docs.racket-lang.org/pollen/second-tutorial.html#%28part._the-case-against-markdown%29

Attachment: Unix_V2_ascii_vii_man_page.png
Description: Unix_V2_ascii_vii_man_page.png

Attachment: Unicode_1.0_ASCII.png
Description: Unicode_1.0_ASCII.png

Attachment: Unicode_2.0_Basic_Latin.png
Description: Unicode_2.0_Basic_Latin.png

Attachment: signature.asc
Description: PGP signature

