
Re: .B, .I disable hyphenation?


From: G. Branden Robinson
Subject: Re: .B, .I disable hyphenation?
Date: Mon, 13 Sep 2021 03:27:50 +1000
User-agent: NeoMutt/20180716

Hi, Alex!

At 2021-09-12T14:56:39+0200, Alejandro Colomar (man-pages) wrote:
> Hi Branden,
> 
> Usually, when a manual page highlights a term, either in bold or
> italics, it usually is a special identifier (macro, function, command
> name or argument), for which hyphenation can hurt readability and even
> worse, turn it into a different valid identifier.
> 
> What about disabling hyphenation for .B and .I?
> Are there any inconveniences in doing so that I can't see?

The problem that arises is that the font styling macros are
presentational, not semantic, so it's hard to know whether someone is
using them for emphasis or to suggest syntactical information.  This is
why you made a statistical argument ("usually").

I'm hesitant to adopt your suggestion, because if we did make this
change, it would be difficult to override in the other direction, e.g.,
the page author is thinking, "yes, I'm putting emphasis here--hyphenate
the words as necessary, darn it!".  Because hyphenation had already been
suppressed, it would have to be manually added back in to every word in
the arguments to .B and .I.  Most writers of English, for good reason,
cannot be bothered to learn where the hyphenation points are and
understandably leave that chore to a computer.  (Moreover, U.S. and
Commonwealth English seem to apply different hyphenation rules.)

In my opinion it is easier, in terms of maintaining flexibility and
getting reliable results, to do what I do in the groff man page
corpus--disable hyphenation on a per-word basis when necessary.  To be
concrete, I populated the shell variable "MANS" with the man source
document files in the groff tree, and then performed this grep.

$ grep '^\.[BIR]\([BIR]\) \\%' $MANS

I got 434 matches in groff Git HEAD.  Here are 3 of them.

./src/utils/lkbib/lkbib.1.man:.IR \%@g@indxbib (@MAN1EXT@)
./src/utils/lkbib/lkbib.1.man:.IR \%@g@refer (@MAN1EXT@),
./src/utils/lkbib/lkbib.1.man:.IR \%@g@lookbib (@MAN1EXT@),

A whopping number of these are like the above: they are man page cross
references.  The `MR` macro I've been talking about for (over?) a year
now would render this usage of \% unnecessary, because MR would be
semantic and we know we _never_ want to hyphenate the name of a man
page[1].
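
The search above can be wrapped in a small function so that it does not
depend on a pre-populated variable.  This is only a sketch: the name
"count_escaped" is hypothetical, it assumes the man sources are named
like "lkbib.1.man" (as in the paths shown above), and, unlike the grep
above, its pattern also accepts the one-letter macros .B and .I.

```shell
#!/bin/sh

# count_escaped [DIR]
# Count font-macro lines (.B, .I, .BR, .IR, ...) whose first argument
# begins with the \% hyphenation control escape.
# Assumption: man sources are the files matching "*.man" under DIR.
count_escaped() {
    find "${1:-.}" -type f -name '*.man' \
        -exec grep -h '^\.[BIR][BIR]\{0,1\} \\%' {} + |
        wc -l | tr -d ' '    # strip the padding some wc versions add
}
```

Run from the top of a source tree, "count_escaped ." prints a single
number; the count will differ from the figure quoted above wherever
one-letter macro calls are present.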

The manual suppression of hyphenation is not necessary if you know a
word won't be hyphenated.  A trick that's been passed around on the
groff list is to have a shell one-liner handy that tells you all of the
automatic hyphenation points groff thinks a word has.

Here's the version of the "hyphen" script I use.

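#!/bin/sh
# Print each argument with groff's automatic hyphenation points marked.

for W
do
    # A line length of 1 basic unit forces a break at every hyphenation
    # point; -Wbreak silences the resulting line-breaking warnings.
    printf ".hy 4\n.ll 1u\n%s\n" "$W" | nroff -Wbreak | sed '/^$/d' | tr -d '\n'
    echo
done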

I don't have to remember or reason out which of "indxbib", "refer", or
"lookbib" will be hyphenated.  I can ask groff.

$ hyphen indxbib refer lookbib
in‐dxbib
re‐fer
look‐bib

Yup, they all need hyphens, so a leading \% is advised.  [Aside: What's
that "@g@" thing, you may ask?  Like the man page section number, it is
not groff syntax, but fodder for a sed script that replaces it during
make(1) with the command prefix configured by the person who builds
groff.  (When groff was first written in 1989-1991, it often had to
share a disk with a proprietary troff installation, and needed to stay
out of the latter's way.)  Since I can't know at man page maintenance
time what the builder will choose for a prefix, I have to assume that it
is something that is hyphenable, and so I suppress its hyphenation.]

By contrast, I don't need to suppress hyphenation for the following.
[Aside: These command names also don't collide with historical troff
names, so they don't need the command prefix, either.]

./src/utils/tfmtodit/tfmtodit.1.man:.IR groff (@MAN1EXT@),
./src/utils/tfmtodit/tfmtodit.1.man:.IR grodvi (@MAN1EXT@),

In my view, this is really not much work; I spend much more time
thinking about and recasting at the sentence or paragraph level--or
realizing there's some concept that we haven't explained adequately at
all and drafting a presentation of it...and, for that matter, composing
emails like this one--than I do worrying about hyphenation points.
Nevertheless, I recognize that many contributors of man pages to the
Linux man-pages project are _profoundly_ uninterested in typography--in
fact they may have hyphenation disabled altogether in their man page
renderer[3]--and regard every single thing they are required to learn
about *roff or man(7) syntax as one more nudge in the direction of
Markdown or some other alternative that they imagine will deliver them
to an effortless utopia where documentation practically writes itself.

I acknowledge that placement of these hyphenation control escapes looks
tedious (and it is, slightly).  If we want to fix this in the man(7)
macro language, then, in my opinion, the right way is to cross the
Rubicon and add more semantic macros.  I have never forwarded a serious
proposal along these lines because I still have full-thickness burns
over 90% of my body from exposure to DocBook 25 years ago.  The problem
that mortified me is that as soon as people get their hands on a
semantic tag they have, all too often, deployed it at the highest
syntactical level of the implementation language they can locate.  In
HTML, for example, that is the element.

If we had a pair of macros that meant "my argument is a keyword" or "my
argument represents user-replaceable text", respectively, then we could
easily and reliably solve the problem at the level you're tempted to.
(Though as a matter of fact, I would _not_ disable hyphenation for
nonliterals...why should we?  They don't need to be copy and pasted
as-is--if they are, they get replaced anyway by definition--and
descriptive nonliterals are sometimes long[4], as anyone who's read a
few BNF grammars can attest.)

The smallest, tightest solution I have managed to contemplate that
does not bloat the name space of man(7) is something along these lines.

.KW keyword [tag-space]
.VA metavar [tag-space]

Here is a straw-man example.

.KW strlen function
and
.KW strnlen
return
.KW size_t type
and take an argument
.VA s variable
that is expected to be a
.KW "const char *" type

...which would render as

strlen and strnlen return size_t and take an argument s that is expected
to be a const char *

"strlen", "strnlen", "size_t" and "const char *" would be styled as
directed elsewhere (with defaults in an.tmac or an-ext.tmac, but
possibly overridden in man.local to suit distributor or site tastes).

I wanted to end the example sentence with a period, but right away we
hit one of the problems that has prevented me from advancing this
proposal, which is the question of how to handle adjacent
punctuation...add yet another macro argument for it, or encourage usage
of the output line continuation escape \c, which historically terrifies
people?  Support multiple optional arguments, and force people to learn
to quote empty macro arguments, an inconvenience that man(7) largely
already spares them from if they practice good style[5]?

In case it needs to be pointed out, I think it's impractical for
man(7)--as a macro package--to prescribe descriptors for the "tag" name
space.  mdoc(7) somewhat notoriously maintains large catalogs of a
proliferating number of BSD-descended operating system names and
releases, a source of ongoing tedium and maintainability
frustrations[6].  DocBook's attempt to boil this ocean is what drove me
away from it and I don't want to bloat groff man(7) with something
that's going to demand community consensus--and, I expect, some amount
of heated debate--to resolve.  The virtues of _having_ a tag name space
are, I trust, well understood, and their availability is a point Ingo
takes some justified pride in with the support thereof in mandoc(1).

The Linux man-pages project is much better suited than the groff project
is to design and promulgate a set of canonical tags; to point out just
one blind spot, groff doesn't ship _any_ section 2 or 3 man pages,
whereas these sections are Linux man-pages' bread and butter (though the
long-neglected section 7 is looking better all the time and at last
fulfilling its decades-old potential).

I don't have answers to the questions I've raised, so in the meantime, I
practice the discipline of using the hyphenation control escape sequence
with the font style macros.

To conclude this epistle with some possible next steps to take, I
foresee a few possibilities.

1. Despair of popularizing this knowledge.  Encourage people to continue
   to do as they have always done, and trust more detail-oriented
   contributors like yourself to clean up .B and .I calls with
   hyphenation control escapes as required.
2. Teach people about correct usage of the \% escape in man-pages(7),
   and remind contributors about this subject about as often as you have
   to do regarding semantic newlines.
3. Lobby for a change to man(7) implementations as you originally
   suggested.  I know I've voiced some resistance to this idea, but your
   bigger challenge may be getting a hold of any maintainers of
   non-groff man(7) implementations to even field the proposal.  On the
   other hand, if groff and mandoc are all you care about, you've
   already reached the right people.  :)
4. Have Linux man-pages provide its own implementations of .B and .I to
   do what you want.  (Every Linux man-pages document could use the
   `.so` request to load such overrides.)  This might represent an
   irreconcilable conflict between your project's needs and groff, and
   I'm pretty sure no one wants to see that happen, but in the spirit of
   frankness I have to point out that this is a possibility, and one
   that may not have occurred to many Linux man-pages contributors.
5. Cross the Rubicon and develop semantic macros for man(7).  The payoff
   here is huge but the effort required will not be small.
   (Implementation is not the hard part; socializing the change and
   providing a smooth transition/deployment path for umpteen
   distributors who won't ship Linux man-pages releases in synchrony
   with any other particular thing will be much more challenging, I
   predict.  And that's not even counting the issue of standardizing a
   lexicon for the tag name space.)
6. [ObIngoSchwarze: Switch to mdoc(7).]
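
For the clean-up work in options 1 and 2, a rough way to find
candidates is to list the .B and .I calls whose first argument lacks a
leading \%.  This is a sketch under stated assumptions: the function
name is hypothetical, it looks only at the first whitespace-delimited
argument, and it cannot tell whether a word is hyphenable at all, so
its output is a list to review, not a list of errors.

```shell
#!/bin/sh

# unescaped_font_calls FILE...
# Print "file:line: text" for each .B or .I call whose first argument
# does not start with the \% hyphenation control escape.
# Assumption: two-letter macros such as .BR and .IR are out of scope.
unescaped_font_calls() {
    awk '/^\.[BI] / && $2 !~ /^\\%/ {
        print FILENAME ":" FNR ": " $0
    }' "$@"
}
```

Each reported word can then be checked for hyphenation points before
deciding whether an escape is warranted.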

Regards,
Branden

[1] Erlang developers may disagree.[2]  :-|
[2] https://savannah.gnu.org/bugs/?43532
[3] Or would, if they knew it was possible.  See the `HY` register in
    the "Options" section of groff_man(7) or the `--nh` option of man-db
    man(1).
[4] Here's an example from groff_font(5) in groff Git HEAD.

       papersize format‐or‐dimension‐pair‐or‐file‐name
              Set the dimensions of the physical output medium according
              to  the argument, which is either a standard paper format,
              a pair of dimensions, or the name of  a  plain  text  file
              containing either of the foregoing.  Recognized paper for‐
              mats are the ISO and DIN formats A0–A7, B0–B7, C0–C7,  and
              D0–D7;  the  U.S.  formats letter, legal, tabloid, ledger,
              statement, and executive; and the envelope formats  com10,
              monarch, and DL.  Case is not significant for the argument
              if it holds predefined paper types.

              Alternatively, the argument can be a custom paper size  in
              the  format  length,width  (with no spaces before or after
              the comma).  Both length and width must have  a  unit  ap‐
              pended;  valid  units are “i” for inches, “c” for centime‐
              ters,  “p”  for  points,  and  “P”  for  picas.   Example:
              “12c,235p”.   An  argument that starts with a digit is al‐
              ways treated as a custom paper format.

              Finally, the argument can be a file name  (e.g.,  /etc/pa‐
              persize); if the file can be opened, troff reads the first
              line and attempts to match the above  forms.   No  comment
              syntax is supported.

              More  than one argument can be specified; troff scans from
              left to right and uses the first  valid  paper  specifica‐
              tion.

[5] https://man7.org/linux/man-pages/man7/groff_man_style.7.html#Notes
[6] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867123
