groff man(7) `B` macro behavior with `\c`, and input traps


From: G. Branden Robinson
Subject: groff man(7) `B` macro behavior with `\c`, and input traps
Date: Sun, 5 Jun 2022 18:08:46 -0500

[Warning: this message is long.]

Hi Ingo,

At 2022-06-05T19:09:53+0200, Ingo Schwarze wrote:
> On a tangent, i just noticed that
> 
>   .TH TEST 1
>   .B foo\c
>   bar
> 
> prints "bar" in Roman font with mandoc(1), which seems correct to me,
> whereas it appears to print "bar" in bold with groff-current
> compiled from git, which seems strange to me, just as if i had
> written
> 
>   .TH TEST 1
>   .B foo\c
>   .B bar
> 
> Which one is correct output, Roman bar or bold bar?

I believe a bold "bar" is correct.  It is also consistent with Heirloom
Doctools nroff, which I last built from its Git HEAD on 2020-02-25.
(This is still fairly close to HEAD today.)

I tried Unix V7 nroff, but I couldn't get it to produce any form of
boldface or italics with macro calls or escape sequences, the latter
even in a raw *roff document.  (And I did check the output with "od -c"
to ensure it wasn't just my terminal emulator playing me up.)  So
which behavior is more faithful to the Elders is an open question.

Here is groff man(7)'s definition of the `B` macro from Git HEAD.  groff
1.22.4's is not materially different.

.\" Set arguments (or next input line if none) in bold style.
.de1 B
.  itc 1 an-input-trap
.  ft B
.  if \\n[.$] \&\\$*
..

As to _why_ it works this way, here is an attempted explanation and
defense thereof.

The original man(7) (1979) had an idiom of macros optionally taking zero
arguments, in which case they applied their effect to the next line.
Examples include `B`, `I`, `SH`, `SS`, `SM` and `TP`.

Man pages have exhibited a strong tendency to desert that idiom, except
in the case of `TP`, in favor of supplying arguments to these macros,
which then terminate their effect.

So, while a V7 man page author might well have written

.SH
Laboratory cuisine
Try my
.B
steaming
bowl of
.I
porridge.

...page writers since the '90s, and maybe even the '80s, seem to have
settled on a different style.

.SH "Laboratory cuisine"
Try my
.B steaming
bowl of
.I porridge.

My supposition is that this idiom came to dominate because it is more
frugal with screen space on glass TTYs.  (Recall that the Teletype Model
37 was a paper terminal that printed onto a long spool of paper.  I
assume that a Teletype operator feeling the need to "scroll back" would
simply reach over the platen and pull up as much of the ejected spool as
desired to refresh the memory.  I surmise that this also explains why
AT&T Unix was slow to develop its own pager program; more(1) was, I
believe, a BSD innovation well suited to the "glass TTYs" in use there.)

Also helping to drive this migration was, I suspect, the appearance of
groff circa 1990, which discarded the conventional limitations of 6
arguments to man(7) macros and 9 arguments to _any_ *roff macro.  You
thus no longer needed the zero-argument form for generality and the
avoidance of word-counting.

The way this "next-line context" worked was by setting an "input trap"
with the `it` request.  An input trap is commonly defined as a trap that
springs after a certain number of "input lines"*, but that description
needs a honking big asterisk on it.  I've very recently rewritten our
description of this request in groff's documentation because what we
had, including my own previous attempt at clarification, was not quite
correct.

For man(7)'s purposes, the "certain number of lines" is always 1.  The
discussion here can be generalized to larger values.
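
As a minimal sketch of the mechanism outside of man(7), the following
raw *roff input plants a one-line input trap by hand.  The macro name
`EB` is made up for this illustration; it is not part of any macro
package.

```roff
.\" EB is a hypothetical trap macro: it restores the roman font.
.de EB
.  ft R
..
.\" Plant the trap, then switch to bold for the next text line.
.it 1 EB
.ft B
This line is set in bold.
This line is back in roman.
```

The trap springs once the first text line has been read, so only that
line should be emboldened.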

The examples above are straightforward.  They become significantly more
interesting when the line after an input trap is planted (or after a call
of a macro that plants one) is _not a text line_.  That is, when it's a
control line--when it begins with a control character.

Consider this example.

.B
.
foo

Should "foo" be in bold?  If we strictly count "input lines", then the
answer is a thunderingly obvious "no".  A line with the empty request is
an input line, no question.

However, groff and Heirloom Doctools nroff both put "foo" in bold.  Why?

Input traps don't count lines that don't directly produce formatted
output.

The empty request obviously doesn't.  These semantics turn out to be
highly convenient for macro programming; you can set your input trap in
your macro, do other housekeeping, then return.  You can revise your
macro definition without having to go back and re-count lines between
the planting of the trap and the end of the macro (as long as they don't
produce formatted output), tediously updating the input trap line count.
Since much of macro programming is string and register housekeeping
(defining, incrementing, updating, and, if you're fastidious, removing),
and these things don't (of themselves) produce formatted output, you can
conduct such activities ad libitum.  Even better, you don't have to
count the line that ends the macro definition (often ".."), which you
might be likely to forget.
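
To illustrate, here is a sketch of a macro that plants its trap first
and does its bookkeeping afterward.  The names `BN` and `EB` and the
register `bn` are invented for this example.  None of the lines after
the `it` request directly produces formatted output, so, under the
semantics described above, none of them should be counted.

```roff
.\" EB (hypothetical): the trap macro; restore the roman font.
.de EB
.  ft R
..
.\" BN (hypothetical): set the next output-producing line in bold.
.de BN
.  it 1 EB
.\" Housekeeping only; no formatted output results from these lines.
.  nr bn +1
.  ft B
..
.BN
.
This text line is the one the trap counts.
This one is roman again.
```

Adding or removing housekeeping lines inside `BN` should require no
change to the "1" given to `it`.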

So, what kind of line produces formatted output?  Well, text lines.  But
conditional control lines can, too, as with ".if 1 blahblahblah".  But,
in general, they don't.  That includes control lines that are macro
calls, because macro calls can produce output, even conditional output!
(CSTR #54 says simply "control or request lines do not count".)

An interesting consequence is that such calls of input line
trap-planting macros could be chained.  What if you wanted to print a
word in boldface but also smaller than the normal type size (but didn't
want to use escape sequences)?

You could do this.

Today,
.B
.SM
CINCOMPAC
confirmed the ceasefire.

You could equivalently write this.

Today,
.B
.SM CINCOMPAC
confirmed the ceasefire.

If the original man(7) had also included a word-boxing macro `BX` that
worked similarly, there's no reason to suppose the following wouldn't
have been valid, and productive of the expected output.

Today,
.B
.SM
.BX
CINCOMPAC
confirmed the ceasefire.

Unfortunately, I think this insight was lost on the folks at
(apparently) Sun Microsystems, because, as far as my research has been
able to uncover, we can locate the origin of the man(7) `SB` macro with
them.  Evidently they wanted small boldface so often that they made a
macro for it, and used it thousands of times in their man page
corpus.[1]

.\" SunOS 4
Today,
.SB CINCOMPAC
confirmed the ceasefire.

Earth-shaking innovation, huh?

As noted above, most of the foregoing uses have become vanishingly rare.

The important exception is the `TP` macro, which, uniquely among the
macros we're discussing, accepts an argument that is not to be formatted
as text.  Its argument is the desired indentation amount for the body of
the tagged paragraph to be set, and the next-line "argument" is the tag
itself.  (It's not really an argument, but normal input that is
intercepted by an input trap.)
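
For illustration, a tagged paragraph with an explicit indentation might
look like the following.  The `8n` value is arbitrary, chosen only for
this sketch.

```roff
.\" "8n" is the indentation argument; it is not formatted as text.
.\" The text line after the call is the tag, caught by the input trap.
.TP 8n
\-f
This option frobnicates the boojum.
```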

There remains the question of what to do if some clever wag sticks a
`\c` escape sequence on the end of a text line (or macro argument)
that gets processed by an input trap instead of normally.

There is a fairly routine demand for such a thing; it came up on this
mailing list 4 or 5 years ago.  Here it is.

.TP
\-f
This option frobnicates the boojum,
unlike
.BR \-g .

A dutiful man page author might look at this and say, "hey, my options
are literals, so they're supposed to be in bold!  Why can't I have my
option flag in bold right where I'm introducing it?"  The author then
pulls a Philip J. Fry squint and says, "Hmm, maybe I can..."  They then
do this.

.TP
.B \-f
This option frobnicates the boojum,
unlike
.BR \-g .

And they're happy.

But then feeping creaturitis sets in and at some point, our helpful
utility needs this flag to take an optional argument.

.TP
.BI \-f [=cond]
This option frobnicates the boojum,
under the condition
.I cond,
which may be
.B florp
or
.BR snoggle .

The dutiful man page author reappears and wonders why the brackets and
equals sign are in italics.  They shouldn't be.  The equals sign is a
literal and so belongs in bold; the brackets should be neither bold nor
italic.

https://www.youtube.com/watch?v=nx5GwULPU90

The Dragnet (1967) theme plays and we realize we have the three-font
problem.  At this point the rough-riding fans of unreadable man(7)
source storm to the rescue in a hail of backslashes.  I will not even
depict their solution because it's just too horrible.

But the more semantically-minded writers decide they're not beaten yet.
They know about output line continuation, and while they're not averse
to firearms^Wescape characters, they prefer to apply them minimally and
selectively.

.TP
.BR \-f [\c
.BI = cond\c
]
This option frobnicates the boojum,
under the condition
.I cond,
which may be
.B florp
or
.BR snoggle .

As I recollect, the foregoing would not work on groff 1.22.3 because its
man(7) implementation used `it` to set input traps in the macros we're
discussing, when what was needed was the `itc` request.[2]

In doing the research and experiments for this mail I found out to my
surprise that groff's `itc` doesn't precisely work as documented, or
rather, it does, but doesn't distinguish its operation from `it`.  A
nifty fact is that, for the original example,

>   .TH TEST 1
>   .B foo\c
>   bar

both groff and Heirloom Doctools honor the semantics of `\c` and carry
its effect over to the subsequent line.  `itc` is not required for this
alone.
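
One way to check this observation outside of man(7) is a raw-roff
experiment along the following lines.  This is only a sketch: the trap
macro `EB` is a made-up name, and no particular result is promised
here.  Format it once with `itc` and once with `it`, and compare which
of "foobar" and "baz" come out in bold.

```roff
.\" EB (hypothetical): the trap macro; restore the roman font.
.de EB
.  ft R
..
.itc 1 EB \" change to .it as desired
.ft B
foo\c
bar
baz
```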

Where groff's `itc` differs is that the bit of state that marks "was the
previous input line continued with `\c`" _is tracked in the environment_
as opposed to discarded.  That means the bit of state can be copied from
one environment to another.  This is important for `TP` because it
formats the paragraph tag in a separate environment.  In my opinion,
this should have been part of the original semantics of `it`, because if
your new environment didn't want it, you could clear it with a `br`
request.[3][4]

And that's the story of why groff and Heirloom Doctools nroff put "bar"
in bold.

Regards,
Branden

[1] I wrote the following source comment in our groff_man(7) page.

.\" ...which appeared in a few man pages distributed in 4.3BSD-Reno and
.\" 4.4BSD even though BSD was already transitioning to mdoc(7) by that
.\" time and did not support the macro.  SunOS 4.0.3 (May 1989)
.\" contained over 2,100 uses of .SB.

[2] https://lists.gnu.org/archive/html/groff/2017-05/msg00019.html

[3] CSTR #54 (1992) does mark the `it` request (§7, p. 17) with the "E"
    annotation for "relevant parameters" being "stored as part of the
    current environment", but I suppose one could argue that the
    presence of `\c` is a separate state bit.  And in CSTR #54, only
    requests, not escape sequences, enjoy documentation of their "E"
    environment-storage bit.

[4] You do have to call `br` with the regular control character.  `'br`
    won't do the trick.  Try it yourself with the following.

foo\c
'br \" change to .br as desired
bar
.pl \n(nlu

    Here's an idea.  At present `'br` is a no-op.  We could give it
    semantics of clearing the "previous line interrupted" (see `pev`
    request output) bit in the environment.  I wonder if there are any
    use cases for this.
