groff

Re: .TQ to replace .PD 0


From: Ingo Schwarze
Subject: Re: .TQ to replace .PD 0
Date: Tue, 24 May 2022 15:22:30 +0200

Hi Branden,

G. Branden Robinson wrote on Tue, May 24, 2022 at 12:57:08AM -0500:
> At 2022-05-24T04:44:21+0200, Ingo Schwarze wrote:

>> Your version (with .PD) has the clear advantage that it is more
>> portable: it is likely to work on any man(7) implementation,
>> whereas .TQ might fail on implementations that are neither
>> groff nor mandoc.

> There's one wrinkle with `PD`, as noted in a thread from a couple of
> weeks ago.
> 
> https://lists.gnu.org/archive/html/groff/2022-05/msg00019.html
> 
> Briefly, man-db man(1) squeezes multiple blank lines to one when using
> nroff to render the page.  That caveat doesn't apply to examples shown
> in this thread,

Indeed.  This thread is only about ".PD 0" and .PD without an argument,
not about .PD with any other argument.  Actually, in a manual page,
i would consider using .PD with any argument other than zero bad
style, on par with using the .sp request or the \s escape sequence
in a manual page.  The reason why i consider it bad style is that
a .PD macro with a different argument makes detailed presentation-
level demands, which a manual page should refrain from.

Outside example code and example output, i fail to see any situation where
it might make sense to have two consecutive blank lines in a manual page.
For examples, you should always use .nf or a macro implying .nf like .EX
or .Bd -unfilled in the first place, and then, i think it is better to
explicitly put multiple blank lines as needed rather than obfuscating
their generation with .PD or .sp.  If an example needs more than two
or three consecutive blank lines, the example is likely ill-designed
because the difference between, say, five and six blank lines is hard
to see for the human eye, so the topic should likely be explained in a
completely different way that doesn't invite the reader to hold a ruler
to their monitor in order to see how many blank lines there are.

> but it is a limitation we don't discuss in groff_man(7)
> (because we don't impose it) and it is not how historical
> implementations behave.

I agree groff_man(7) would be the wrong place for discussing "more -s".

> man-db man(1) has done this for a long time; as
> far back as 2001, it was calling the pager program with the "-s" flag,
> but since 2015 it has done the squeezing itself.

Assuming that is true, that sounds like an outright bug in man-db
to me.  Even though that isn't needed often, there can be legitimate
reasons why a particular manual page might have to display two
consecutive blank lines in an example, and then the squeezing
breaks the content of the manual page, not just the formatting.
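
To illustrate the point, here is a minimal sketch in Python of what
"-s"-style squeezing does to example output.  It is not man-db's code;
the function name and the sample text are hypothetical and serve only
to show how content that depends on two consecutive blank lines is altered.

```python
def squeeze_blank_lines(text: str) -> str:
    """Collapse each run of consecutive blank lines into a single blank line."""
    out = []
    previous_blank = False
    for line in text.split("\n"):
        blank = line.strip() == ""
        if blank and previous_blank:
            continue  # drop the second and later blank lines of a run
        out.append(line)
        previous_blank = blank
    return "\n".join(out)


# Hypothetical example output in which two blank lines separate records.
example = "record 1\n\n\nrecord 2"
squeezed = squeeze_blank_lines(example)
```

After squeezing, the two records are separated by one blank line only,
so a reader can no longer tell that the real output contains two.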

If a user deliberately sets MORE=-s or something similar for other pagers
in their environment, that is a different story: it is their choice and
they get what they pay for.  But doing this behind the user's back looks
like a bug to me.

> Admittedly, that wrinkle shouldn't come up often.  You can still space
> paragraphs however you want when formatting for typesetter devices, and
> the most common idiom I've seen for `PD` use is the one discussed in
> this thread--setting the spacing to zero and then restoring it.

>> My version has the (dubious) advantage that it's a bit less
>> presentation-oriented and a bit more indicative of semantics.
>> How much that is worth in a language like man(7) that is almost
>> entirely presentation-oriented and provides very little semantic
>> markup in the first place is open for debate.

> Let's not forget that `TH`, `SH`, and `SS` are all semantic,

I would call those "structural" rather than "semantic".  I consider
distinguishing four kinds of markup useful:

1. presentational: manipulating indentation, spacing, fonts etc.,
   for example .RS, .PD, .B, .I, .BI, .BR, .DT, .Em, .Sy, .No, .Ns, ...
2. structural: titles, sections, paragraphs, lists, displays etc.
   for example .TH, .SH, .SS, .PP, .TP, .EX, .Dt, .Pp, .Bd, .Bl, .Dl, ...
3. semantic: conveying information about the meaning of terms employed
   by the facility that is the *topic* of the manual (as opposed to
   information about the structure of the *documentation*),
   for example .MT, .UR, .Fl, .Ar, .Ic, .Fn, .Fa, .Er, .Lk, .Mt, ...
4. text production: macros generating fixed strings,
   for example .AT, .UC, .Ex, .Rv, .St, ...

So in my book, the only semantic macros in man(7) are .MT and .UR.

> and have
> paid us the tremendous benefit of being easily converted to PDF
> bookmarks (and HTML anchors, though grohtml has some issues still to be
> resolved).  By frequency of use these are far more important than either
> `PD` or `TQ`.  Like HTML 1.0, the man(7) language mixes the
> presentational and semantic domains.  They also illustrate the inertial
> challenge to unscrambling the mixture, I'll grant.

>> I tend to think portability is possibly more important than the fact
>> that my version with .TQ looks minimally nicer.  So if somebody wrote
>> a new manual page in man(7), i would probably recommend your version
>> with .PD rather than mine with .TQ.

> Another, as yet only potential, advantage to using `TQ` is that it is
> much more amenable to indexing and tag inference.  (You could still do
> it by parsing a man page and tracking enough state to see `.PD 0` and
> `.PD` pairs come and go, but it would be more tedious.)  That's one
> reason I embraced the macro when I learned about it.  Many people have
> seen pod2man(1) litter man pages with non-standard `IX` macros.  I think
> that was a poor execution of a good idea.  Certainly mandoc(1)+mdoc(7)
> can boast of powerful tag recognition and search facilities.

Looking into the source file mandoc/man_validate.c, i see that mandoc(1)
treats .TP and .TQ identically for the purposes of tag recognition
and also that .PD is entirely ignored for the purposes of tag
recognition, so my impression is that for the purposes of tag
recognition, both forms (.TQ and the .PD+.TP combo) are of exactly
the same quality.  So tag recognition doesn't appear to provide an
argument for one or the other.
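
As a rough model of that observation (not the code in man_validate.c),
the following Python sketch treats .TP and .TQ alike and ignores .PD.
The function name and the two sample fragments are assumptions made for
illustration; the fragments show one common arrangement of each form
for the csh(1) "cd"/"chdir" case, with plain-text tag lines.

```python
def collect_tags(source: str) -> list:
    """Return the input line following each .TP or .TQ macro; .PD is ignored."""
    tags = []
    expect_tag = False
    for line in source.splitlines():
        words = line.split()
        macro = words[0] if words else ""
        if macro in (".TP", ".TQ"):
            expect_tag = True   # the next input line carries the tag
        elif macro == ".PD":
            continue            # spacing only, irrelevant for tagging
        elif expect_tag:
            tags.append(line)
            expect_tag = False
    return tags


# The same two-tag entry written with the .PD idiom ...
PD_FORM = """\
.TP
cd
.PD 0
.TP
chdir
.PD
Change the working directory.
"""

# ... and with .TQ.
TQ_FORM = """\
.TP
cd
.TQ
chdir
Change the working directory.
"""

pd_tags = collect_tags(PD_FORM)
tq_tags = collect_tags(TQ_FORM)
```

Both inputs yield the same two tags under this model, which is why
neither form has an advantage for tag recognition.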

> Paragraph tags, if used as recommended by groff_man_style(7), will
> practically always be the sorts of key words or phrases one wants to
> search for, or see compiled into an index (of command-line options, for
> instance).

I agree, and mandoc(1) tag recognition assumes exactly that.

> The fact that `TQ` identifies a tag that should be grouped
> with the previous `TP` tag allows inferences to be drawn about their
> structuring for indexing or tag-relation purposes.

I doubt that matters.  A tag (and i think indexing means essentially
the same) merely points to a place in the text, so it doesn't matter
whether in the csh(1) example discussed earlier, you regard "cd" and
"chdir" as two different commands or as two spelling variants of the
same command.  Either way, both tags point to the respective keywords,
so they point to almost exactly the same point in the text right before
the relevant description.  No magic distinction between .TP and .TQ is
needed for that, and .PD does not hinder that.

> (By "if used as recommended", I mean that, for best results,  people
> shouldn't use asterisks/bullets/list enumerators as `TP` paragraph tags.
> The same effect can be achieved with the tag argument to the `IP` macro
> without muddling the "tag space".)

Actually, mandoc(1) uses the first argument of the .IP macro (if any)
in almost exactly the same way as the line after a .TP macro.
To avoid indexing bullets, it does not consider whether the author
used an .IP or .TP macro but instead inspects the head content of
that macro, skipping leading whitespace, font escapes, dashes, and
backslashes, then generating a tag if and only if the first
character after all that isalpha(3).  That certainly isn't perfect,
but usually avoids tagging bullets even when they appear in .TP
and it often manages to generate useful tags even if an author
used .IP for a tagged list.
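
The heuristic described above can be sketched in Python as follows.
This is an approximation written from the description, not a
transcription of the mandoc source: the function name is hypothetical,
the set of font escape spellings handled (\fB, \f(CW, \f[BI]) is an
assumption, and Python's str.isalpha() is Unicode-aware whereas
isalpha(3) depends on the locale.

```python
import re

# Font escapes in the forms \fB, \f(CW and \f[BI].  Which spellings to
# handle is an assumption; the description only says "font escapes".
FONT_ESCAPE = re.compile(r"\\f(?:\[[^\]]*\]|\(..|.)")


def should_tag(head: str) -> bool:
    """Return True if the head content looks like a keyword worth tagging."""
    rest = head
    while rest:
        match = FONT_ESCAPE.match(rest)
        if match:
            rest = rest[match.end():]   # skip a font escape
        elif rest[0] in " \t-\\":
            rest = rest[1:]             # skip whitespace, dashes, backslashes
        else:
            break
    # Tag only if something is left and it starts with a letter.
    return bool(rest) and rest[0].isalpha()
```

With this sketch, a head such as \fB\-\-verbose\fR is tagged, while
\(bu, an asterisk, or "1." is not, regardless of whether the head
belongs to .IP or .TP.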

That said, i do agree that it is best style to use .TP for tagged
lists and .IP for bullet lists and not the other way round.

> I have ideas for leveraging this potential that I want to pursue after
> the 1.23 release.
 
> [...]
>> In either case, i think the best answer to Alejandro's original
>> question is: if you value portability, which you probably should,
>> using .PD is better than using .TQ, the reason being that the
>> gain in beauty from .TQ is small and the portability risk in
>> using it is real.

> Magnitudes and use cases are important.  It would be nice to know, even
> approximately, which systems _don't_ support `TQ` and other groff man(7)
> extensions, and how prevalent those systems are.

Well, any Linux, BSD, and Illumos system is almost certainly fine
unless the user configures it in some very unusual way.  That
basically leaves us with two situations where trouble might ensue:
some commercial UNIX distributions, and individual users who
deliberately use a different roff(7) implementation (including for
manual page display) than their system default.  I have no idea
how to measure the frequency of the latter, but i guess it is
very rare.  Regarding commercial UNIX systems, i do not have
access to any of them except Solaris.

For what it's worth, as far as i'm aware, even the latest
Oracle Solaris 11 provides neither mdoc nor man-ext.

> Without some notion of
> that, we can by the same token point people to the Unix Version 7 man(7)
> page and tell them to use only that and nothing else--not `TQ`, not
> `UR`, not `P`, `AT`, or `UC`,

I would draw a distinction between .TQ and .UR.

If a macro is extremely useful and there is no portable way to
achieve its effect, using that macro and sacrificing a bit of
portability makes sense even when a manual page author values
portability in general.  My opinion is that .UR and .MT are
macros of that quality.

But i doubt .TQ is such a case.  Its benefit is rather small,
and the same effect can easily and portably be achieved with .PD.

> and certainly not mdoc(7), which is a
> decade younger, and as far as I know (maybe you do), never made it into
> many commercial Unixes descended from System V at all.
> 
> That's an extreme application of your principle that I don't expect
> anyone to seriously adopt.

I do take that seriously.

If people publish portable software, i recommend using mdoc(7), letting
the build system use "mandoc -T man" to generate man(7) versions of
the authoritative mdoc(7) documents, including the man(7) versions in
the distribution tarball in addition to the mdoc(7) files, and letting the
./configure script on the target system detect whether the target system
supports mdoc(7), and if not, install the man(7) versions instead of
the mdoc(7) versions.

That way, you get full mdoc(7) markup and searching power on almost all
systems (especially on practically all free operating systems) and yet
remain fully portable to those commercial systems not providing mdoc(7).

Actually, using mdoc(7) in the way described above is *more* portable
than using man-ext: if a man(7) page uses man-ext extensions, i'm
not aware of any practically useful way to make it work on a target
system not providing man-ext.  The notorious recommendation to copy
the complete man-ext macros into each and every manual page seems
incredibly stupid to me and was never properly thought through.
It is unsustainable on so many levels that i won't even start
enumerating them.

> But I think the best barrier against sliding
> into it is to collect some data and apply quantitative reasoning.
> Do you have any suggestions?

What exactly would be the benefit of a complete list of commercial
UNIX systems, stating for each whether it supports man-ext by
default, as an option, or not at all?  We already know most
systems do support it and at least one important system does not.

> Hypothetically, we could have groff's
> 'configure' script analyze the host system for this sort of thing.

And then do what?  We certainly don't want a ./configure script
to phone home to us and disclose information about the system
it is running on.  Many users would consider that a breach of
privacy.

Or do you hope for users to voluntarily send the results of such
analyses to us?  Good luck with that.  Yes, before releasing mandoc,
i always test on Solaris 9, 10, and 11, and i occasionally (but
rarely) got reports from AIX users in the past, and once even
from an IRIX user.  But there was never anything from any other
commercial system as far as i recall.  Not even from HP-UX unless
i forgot about it.

My recommendation is simple: extend the language when that provides *very*
significant benefit, and *only* if the benefit is really very significant,
ideally in ways that don't totally break the output on systems not
supporting the new feature.  When the benefit would only be moderate,
prefer portability instead.  When the benefit would be small, don't even
think about it.

Yours,
  Ingo


