Re: groff 1.23.0.rc2 readiness


From: G. Branden Robinson
Subject: Re: groff 1.23.0.rc2 readiness
Date: Sun, 29 May 2022 07:42:09 -0500

Hi John,

At 2022-05-29T20:33:49+1000, John Gardner wrote:
[I wrote:]
> > Incidentally there is a bit of a muddle here as your original point
> > in the bug report seems to be solely about ~ and ^, whereas Ingo's
> > secondment sweeps up the other ASCII characters without identity
> > mappings as well.
> 
> I'm specifically referring to ~ and ^. Though I agree with Ingo's
> sentiments concerning hyphens and directional single-quotes, I
> consider those to be in the *"too late to fix"* basket.

Okay.

[rearranging your message slightly]
> Unlike quotes and dashes, [^ and ~] aren't fundamental elements of
> English orthography.

That is certainly the case.  They are so unimportant to traditional
typography that Graphic Systems Inc. had _no_ coverage for them in the
Times fonts they sold to AT&T in ca. 1973.  It is largely ASCII (in its
1968 revision) that we can thank for thrusting these sigils into our
everyday experience.

> Admittedly, I don't understand why ^ and ~ are deserving of special
> typesetting treatment.  I find the wrangling of ^ and ~ to be equally
> jarring in PDF output as well; if I were to solicit a change to
> Groff's behaviour, it would be to suppress the mangling of ^ and ~,
> forcing users to request a modifier character specifically if they
> desire one.

Here is all the rationale I know of for special-casing ^ and ~ on
groff's "utf8" output device, although that special-casing produced the
outcome you desire.

https://git.savannah.gnu.org/cgit/groff.git/commit/?id=e092fba45175220aeee4912da9e2b96228a798b3

(scroll down to the "font/devutf8/NOTES" changes)

  "devps" maps \(a~ and ~ to asciitilde, which is equivalent to 0x02DC, but
  this glyph is just too small. We map them to 0x007E instead.

  "devps" maps \(a^ and ^ to circumflex, which is equivalent to 0x02C6, but
  this glyph is just too small. We map them to 0x005E instead.

As you can see, the decision taken here was deliberately to deviate from
the behavior of devps.  Or, given that, in *roff*, glyphs can be
overstricken arbitrarily, we could even say that the glyphs' _semantics_
were changed, if we infer composability from glyph size and positioning.

And devps behaved the way it did because of the way those glyphs were
represented in the "special font" AT&T commissioned from Graphic
Systems.  I point again to the exhibit in comment #3 of Savannah #42473.

https://savannah.gnu.org/bugs/?42473

I put devutf8 back into alignment with devps/devpdf with the following
commit.

commit a246806d81351996ac2fa4a8f0826915e576bc0f
Author:     G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Sat Jan 15 12:13:45 2022 +1100

    Simplify Unicode character mapping process.

    * tmac/unicode.tmac: Drop.  It was originally added in 2005 to suppress
      horizontal spacing of glyphs in the range U+0483..9.  Its purpose has
      wandered over the years; most recently to map the Basic Latin
      ("ASCII") hyphen-minus, apostrophe, and grave accent to special
      characters (and thus ultimately to the General Punctuation block).
      But this is unnecessary since the font descriptions for devices with
      the `unicode` property can provide this information, and anyone who
      wants to alter the mappings can change either font description files,
      output device macro files, or troffrc; or add `char` requests to their
      macro packages or documents (in decreasing magnitude of ambition).

    * tmac/html.tmac:
    * tmac/tty.tmac: Stop sourcing unicode.tmac.

    * tmac/tmac.am (TMACNORMALFILES): Stop shipping it.

    * font/devutf8/NOTES: Drop remarks about mapping of \[a~], \[a^], and
      Basic Latin circumflex accent and tilde.  Not only do I disagree with
      the reasoning (whether these glyphs are "too small" depends on the
      font used by the terminal emulator, over which we have no control),
      but this mapping happens in a completely different part of the source
      tree, src/libs/libgroff/glyphuni.cpp.

    * font/devhtml/R.proto:
    * font/devutf8/R.proto: Add mappings for the five Basic Latin characters
      that map surprisingly (see groff_char(7)) and are not syntactically
      significant to troff.  Three of these are ported from unicode.tmac.
      (html): Don't migrate the hyphen-minus--yet.

(I postponed grohtml changes not for any principled reason, but because
it is less used and somewhat frustrating to deal with as a developer.)
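
To illustrate the least ambitious option named in the commit message
(adding `char` requests), here is a minimal sketch of a site-local
fragment.  It assumes the mappings described above are in effect, that
\[ha] and \[ti] resolve to U+005E and U+007E as documented in
groff_char(7), and that the fragment lives in a file such as man.local
or troffrc; the placement is an assumption, not a recommendation.

```roff
.\" Hypothetical site-local fragment (e.g., in man.local or troffrc).
.\" On the utf8 device only, make a plain ^ or ~ in the input format
.\" as the Basic Latin circumflex and tilde rather than as the
.\" modifier-style glyphs.
.if '\*[.T]'utf8' \{\
.  char ^ \[ha]
.  char ~ \[ti]
.\}
```

Restricting the remapping to one device with the \*[.T] string keeps
the other output devices' behavior untouched.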

> Who's to say every intermediary will share the same opinion about
> tampering with man.local? Homebrew <https://brew.sh/>, for example,
> has a strict policy about patching
> <https://docs.brew.sh/Formula-Cookbook#patches> software, meaning
> there's zero chance of your suggested amendment reaching macOS users.

In that case we might see more pressure for corrections to man pages
coming from the macOS Homebrew community.  I don't think that's an
entirely bad thing.
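
As a sketch of what such a correction might look like in man page
source, the fragment below spells the ASCII characters with the special
character names documented in groff_char(7).  The command line itself
is made up for illustration, and .EX/.EE are assumed to be available
from the man macro package in use.

```roff
.\" Hypothetical man page fragment; the command shown is illustrative.
.\" \(aq is the neutral apostrophe, \(ha the circumflex ("hat"),
.\" and \(ti the tilde.
.EX
grep \(aq\(ha#include\(aq \(ti/src/*.c
.EE
```

Written this way, the example requests the same characters on every
output device instead of depending on how each device maps a plain
', ^, or ~.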

> > I don't think man pages should have to be written one way for
> > terminals and another for PDF
> 
> I wholeheartedly agree, which is why I believe we should abolish the
> hell out of Groff's “special” treatment of ^ and ~. They don't appear
> frequently enough in Latin-based writing systems to justify an
> exception to Groff's character handling rules (whereas dashes and
> directional quotes do)

As I was at pains to explain above, my change here was actually to
_recover_ continuity with AT&T troff.  You are proposing a more
disruptive change than the one I actually made.

Can you pursue this more radical proposal in a dedicated thread?

> > "grout" is my shorthand for "device-independent output produced by
> > GNU troff"
> 
> I've given in and taken to calling it "ditroff" informally, even
> though I know damn well that it's a misappropriation.

Yes.  Also, Kernighan dislikes the term,[1] and there _are_ differences
between AT&T ditroff (output) and GNU troff output, since the latter has
extensions--the most disruptive one being for multi-line device control
commands.  That's why I lean toward giving it a new name: "grout".

Incidentally I forgot to anchor one of my footnotes earlier.  I'll take
this opportunity to correct that.

> > > Now, we can deplore the state of man page authorship as much as we
> > > like, but the truth is that most software authors won't see this
> > > as a problem on their end,
> >
> > To the extent that's true, man pages will continue to suck.  As long
> > as man page authorship is conducted by people who refuse to read or
> > learn, their documentary output will tend to be of poor quality,
> > because such a mindset is a severe hindrance to excellent technical
> > writing.  However, my hope is that such people are a minority, even
> > if a noisy one.[4]
> >
> > [4] https://cygwin.com/pipermail/cygwin/2002-October/085349.html is
> >     an example that will live in infamy.

Regards,
Branden

[1] "I'm pretty sure that I only talked about a "device independent
    troff"; the name "ditroff" came from somewhere else, and I've never
    been fond of it."
    https://manpages.bsd.lv/history/kernighan_23_10_2011.txt
