Re: groff 1.23.0.rc2 readiness


From: John Gardner
Subject: Re: groff 1.23.0.rc2 readiness
Date: Sun, 29 May 2022 20:33:49 +1000

Branden,


> Incidentally there is a bit of a muddle here as your original point in the
> bug report seems to be solely about ~ and ^, whereas Ingo's secondment
> sweeps up the other ASCII characters without identity mappings as well.
>

I'm specifically referring to ~ and ^. Though I agree with Ingo's
sentiments concerning hyphens and directional single-quotes, I consider
those to be in the *"too late to fix"* basket.

Admittedly, I don't understand why ^ and ~ are deserving of special
typesetting treatment. Unlike quotes and dashes, they aren't fundamental
elements of English orthography. I find the wrangling of ^ and ~ to be
equally jarring in PDF output; if I were to solicit a change to
Groff's behaviour, it would be to suppress the mangling of ^ and ~, forcing
users to request a modifier character specifically if they desire one.
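
To make the mangling concrete, here is a minimal Python sketch. It is not
part of groff. It assumes the glyphs in question are U+02C6 and U+02DC (the
ˆ and ˜ that appear in the quoted text further down), and the names
LOOKALIKES, describe, and to_ascii are purely illustrative.

```python
import unicodedata

# Assumed look-alikes that may appear where ASCII ^ and ~ were typed.
LOOKALIKES = {
    "\u02c6": "^",  # MODIFIER LETTER CIRCUMFLEX ACCENT
    "\u02dc": "~",  # SMALL TILDE
}


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def to_ascii(text):
    """Map the look-alike characters back to their ASCII counterparts."""
    return text.translate(str.maketrans(LOOKALIKES))


if __name__ == "__main__":
    # A command as it might look after being copied from a rendered page.
    pasted = "grep \u02c6foo \u02dc/notes.txt"
    print(to_ascii(pasted))
    for ch in "^\u02c6~\u02dc":
        print(describe(ch))
```

The point is only that the pasted text looks right but compares unequal to
what the author typed, so a shell or regex engine will not treat it as ^ or ~.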

> And probably 95% or more of groff users are doing so via a package of some
> sort prepared by a distribution vendor like Debian GNU/Linux, OpenBSD,
> Fedora, or some other intermediary between "upstream" (us) and themselves.
>
> That is why I said "If every *nix vendor in the world seizes upon the
> above and adds it, I can view it with equanimity."


Who's to say every intermediary will share the same opinion about tampering
with man.local? Homebrew <https://brew.sh/>, for example, has a strict
policy about patching <https://docs.brew.sh/Formula-Cookbook#patches>
software, meaning there's zero chance of your suggested amendment reaching
macOS users.

> I don't think man pages should have to be written one way for terminals and
> another for PDF
>

I wholeheartedly agree, which is why I believe we should abolish the hell
out of Groff's “special” treatment of ^ and ~. They don't appear frequently
enough in Latin-based writing systems to justify an exception to Groff's
character handling rules (whereas dashes and directional quotes do).

> "grout" is my shorthand for "device-independent output produced by GNU
> troff"
>

I've given in and taken to calling it "ditroff" informally, even though I
know damn well that it's a misappropriation.


On Sat, 28 May 2022 at 08:51, G. Branden Robinson <
g.branden.robinson@gmail.com> wrote:

> Hi John,
>
> At 2022-05-27T11:04:52+1000, John Gardner wrote:
> > > I have no problem adding an item to the PROBLEMS file with a chunk
> > > of groff source that people can put in their site "man.local" or
> > > "troffrc" files to achieve the ASCII-degradation of the five glyphs
> > > that novice man page writers abuse so copiously.
> >
> > Can we *please* be practical about this?
>
> I'm trying to be.
>
> Incidentally there is a bit of a muddle here as your original point in
> the bug report seems to be solely about ~ and ^, whereas Ingo's
> secondment sweeps up the other ASCII characters without identity
> mappings as well.
>
> > 90% of Groff users, if not more, are only doing so via man(1) to read
> > man pages.
>
> Yes.  And probably 95% or more of groff users are doing so via a
> package of some sort prepared by a distribution vendor like Debian
> GNU/Linux, OpenBSD, Fedora, or some other intermediary between
> "upstream" (us) and themselves.
>
> That is why I said "If every *nix vendor in the world seizes upon the
> above and adds it, I can view it with equanimity."[1]
>
> > Many of whom are probably oblivious to the existence of a typesetting
> > system underneath that's powering it all. They won't care about local
> > configuration, they'll just be annoyed that there's another bunch of
> > annoying characters they need to replace in anything copy+pasted from
> > a terminal. Think Stack Overflow posts containing ˆ and ˜ by hapless
> > users unaware that a regex or path they just copied contains what're
> > essentially diacritics without a character.
>
> True; people will attempt copy and paste from PDF files as well.  That's
> why I want to prevail upon man page authors to choose correct glyphs in
> their documents--so we can get a consistent experience on all output
> devices.  I discussed this with Michael Kerrisk, the co-maintainer of
> the Linux man-pages project (Alejandro's counterpart) almost a year and
> a half ago[2].  He's been doing that job a long time and was not
> alarmed.
>
> > Which reminds me: *these characters were designed to be overstruck*. A
> > + ˆ = Â, A + ˜ = Ã.
>
> In ASCII?  Yes, except for the hyphen, originally they were--if they
> weren't replaced by some national character set's alternative glyphs.
> This incidentally includes the neutral double quote ("), which is why it
> looks so funny on Teletype Model 37 output (attached).
>
> When the C/A/T showed up at the Murray Hill Unix Room, some of these
> input characters were given (potentially) overstrikable semantics.  The
> text ("standard") fonts had both a hyphen glyph and a minus glyph, so as
> I say in groff_char(7), a decision had to be taken which one got mapped
> to plain '-' and which one was going to need an escape sequence.
> Similarly, ` and ' became entrenched as directional single quotes, and
> their backslash-prefixed forms became accent marks.  The C/A/T's
> standard fonts didn't have distinct high-flown ^ and ~ glyphs.  They
> appeared only in the AT&T-specified "special font", where, as far as my
> eyes can tell, they are drawn entirely above the cap-height of the
> standard fonts.
>
> See the image (from the 1976 edition of CSTR #54) attached to comment #3
> of <https://savannah.gnu.org/bugs/?42473>.
>
> ECMA-6 (ISO 646) muddied the waters a little bit.  But since both ^ and
> ~ were replaceable code points, I suppose people didn't kick up too much
> of a fuss.
>
> Unicode 1.0 (October 1991) further stirred the mud; "ASCII" ^ was
> recognized as a high, small glyph that certainly _looks_ overstrikable,
> and ASCII ~ was permitted to be overstrikable or not!  See attachment.
>
> Unicode 2.0 (July 1996) finally got off the pot and decided upon "big",
> spacing semantics for (what was now termed) Basic Latin ^ and ~.  See
> attachment.  It would be another four years before Unicode really
> started to penetrate to *nix terminal environments, with support
> arriving thanks in no small measure to the efforts of Markus Kuhn.[3]
>
> With conflicting and unstable traditions, it is no wonder that there is
> confusion around this issue.  groff has _mostly_ been consistent
> throughout its history as to the semantics of these characters.  An
> exception is that in January 2009, groff's man(7) and mdoc(7) were
> patched to map all of -, \-, ', and ` to Basic Latin code points.
>
>
> https://git.savannah.gnu.org/cgit/groff.git/commit/?id=98acc924f4e32cfc2209df5db0c21921df8cc7ac
>
> If I had been around at the time to utter ominous warnings much as you
> are, I'd have beseeched Werner to put the above code into troffrc (with
> some kind of guard like '.if d TH') or man.local and mdoc.local and put
> a comment above it saying that it should be removed by people who wanted
> to undertake fixing the many wrong extant man pages, who didn't mind
> those pages' misrendering, or whose systems' man pages had been
> corrected in some tolerable proportion.
>
> In my view it was a stopgap measure that should have been advertised as
> such.  (With the exception of \- going to \N'45', because we simply
> _don't have_ in *roff an input character--ordinary or special--that
> means "the hyphen-minus, yes, THAT one, the root of all misery".)
>
> > In a PDF or PostScript document, or with a hardware teletype, this
> > sort of composition is easy. In a modern terminal environment, not so
> > much.  They're not making typesetting any better, they're only making
> > user experience worse.
>
> I don't think this is squarely on point.  It's not particularly hard to
> type "\[a aa]" or "\['a]", let alone the more portable "\('a".  These
> are some of the *roff-esque ways to achieve character composition
> (others are discussed in groff_char(7)).
>
> a^H' was a good way to get an a-with-acute-accent on a Model 37 but
> people generally don't compose characters that way anymore.  Dead keys
> (common on European keyboards), 3- and 4-level keyboard layouts, and
> "input methods" are all more common.
>
> > Now, we can deplore the state of man page authorship as much as we
> > like, but the truth is that most software authors won't see this as a
> > problem on their end,
>
> To the extent that's true, man pages will continue to suck.  As long as
> man page authorship is conducted by people who refuse to read or learn,
> their documentary output will tend to be of poor quality, because such a
> mindset is a severe hindrance to excellent technical writing.  However,
> my hope is that such people are a minority, even if a noisy one.
>
> Even so, we can acknowledge that the *roff language's syntax is, in
> Kernighan's term, "rebarbative" (CSTR #97, I think).  That is why I feel
> it is fair to document transition mechanisms like the one I've pushed
> today, why I have striven to document these matters as thoroughly and
> conscientiously as I can, and why I am willing to undertake, as I said
> in the message to which you replied, the preparation of patches for
> automated generators of man(7) output that may be unmaintained and/or
> whose maintainers are unreceptive to changes.  Some such people may
> indeed view this as the last straw, flip man(7) the bird, and decamp for
> Markdown, which always just Does What You Mean (right?[5]).
>
> > or with end user configuration. They'll see this as a regression
> > in the latest version of Groff and will file bug reports accordingly.
>
> I'm prepared for that, but so too should our distributors be, so I've
> added a 'NEWS' item and updated the existing 'PROBLEMS' item (which
> dates back to July 2003).
>
>
> https://git.savannah.gnu.org/cgit/groff.git/commit/?id=915a878038236769eb072f728389352c1da88719
>
> > If you still decide to go ahead: Don't say I didn't warn you.
>
> I'm warned.
>
> Regards,
> Branden
>
> [1] https://lists.gnu.org/archive/html/groff/2022-05/msg00052.html
> [2]
> https://lore.kernel.org/all/a1af3f5c-f3e9-4bf3-cad5-389571c45d27@gmail.com/T/#m8282cb95b86db994508ece3165340e0075c3871d
> [3] https://www.cl.cam.ac.uk/~mgk25/unicode.html
> [4] https://cygwin.com/pipermail/cygwin/2002-October/085349.html is an
>     example that will live in infamy.
> [5]
> https://docs.racket-lang.org/pollen/second-tutorial.html#%28part._the-case-against-markdown%29
>

