From: G. Branden Robinson
Subject: [address@hidden: Re: [bug #59962] soelim(1) man page uses pic diagram--should it?]
Date: Wed, 12 May 2021 15:46:03 +1000
User-agent: NeoMutt/20180716

Of course, having said I would redirect the discussion to groff@, I
forgot to do so.

Since there is some disputatious stuff in here, I reckoned I should make
it available for disputation.

----- Forwarded message from "G. Branden Robinson" 
<g.branden.robinson@gmail.com> -----

Date: Mon, 10 May 2021 09:38:51 +1000
From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: Helge Kreutzmann <debian@helgefjell.de>
Cc: Mario Blättermann <mario.blaettermann@gmail.com>
Subject: Re: [bug #59962] soelim(1) man page uses pic diagram--should it?
User-Agent: NeoMutt/20180716

[redirecting to groff@ rather than bug-groff@, which isn't really a
discussion list; and dropping Dave from Cc list because I know he's
subscribed]

Hi, Helge!

At 2021-05-09T19:51:37+0200, Helge Kreutzmann wrote:
> Hello Branden,
> hello Dave,
> I hope switching to a much better medium for discussion of
> fundamental issues is appropriate. I think the bug itself can be
> closed for now, the content below is out of scope of this bug.

Okay.  I'll review the ticket and probably close it accordingly.

> First of all, thank you very much for your detailed and long reply. It
> is highly appreciated.
> 
> I quoted only some parts, but this is to make this (long) e-mail more
> readable. It does not mean I value the rest of the text less.

No worries--thanks!

> On Tue, May 04, 2021 at 05:08:35PM -0400, Dave wrote:
> > Follow-up Comment #5, bug #59962 (project groff):
> >
> > [comment #4 comment #4:]
> > > 1. The headings of soelim are now all lower case instead of
> > > upper case (e.g. Name, Synopsis, Description instead of NAME,
> > > SYNOPSIS, DESCRIPTION)
> >
> > The rationale for this change is documented in the release's NEWS
> > <http://git.savannah.gnu.org/cgit/groff.git/tree/NEWS> file; see the
> > item beginning "The an (man) and doc (mdoc) macro packages support
> > new CS and CT registers..."  In short, you can preserve historical
> > all-caps section headings by passing "-r CS=1" to groff.
> 
> Thanks, I'll discuss with Mario if and how we can integrate this in
> our workflow.

If you have any difficulties, please loop us in.

> On Tue, May 04, 2021 at 07:10:34PM -0400, G. Branden Robinson wrote:
> > > 1. The headings of soelim are now all lower case instead of upper
> > > case (e.g.  Name, Synopsis, Description instead of NAME, SYNOPSIS,
> > > DESCRIPTION)
> > 
> > To further elaborate on Dave's helpful comment, all of groff's man
> > pages have migrated to mixed-case section [...] headings.
> > 
> > (And some day I'll get them all migrated to sentence case instead of
> > title case as well.)

(That day came sooner than expected.[1])

> I see three ways to look at this.
> 
> 1. From an esthetical point of view
>    If man pages were "ordinary" text, I agree having this all CAPS is
>    ugly. In the German translation even more so, because by default it
>    is supposed to be an abbreviation, we do not use (per general
>    rules) all CAPS text at all. Still, in the man pages both in the
>    original and the translation it is valid.

Man pages are, in some sense, ordinary text.  What many people overlook
is that groff, like AT&T troff before it, is a typesetting system.  Man
pages can be, and are, rendered to PostScript, PDF, and HTML.  What
constitutes an ergonomic aid on a character-cell terminal may become a
distraction or a hindrance in another output format.

This is one reason to make the capitalization of the section names
configurable; we can turn them on where they do good and leave them off
where they don't.  The man page source, however, should record the
mixed-case information in the event it is required, because there is no
(programmatic) way to recover it if it is discarded there.

> 2. From a reader's point of view
>    Both as reader as well as translator (which is a special kind of
>    reader) I do not look at the entire page at once, at least usually.
>    I want to navigate quickly to the point where I'm interested. Here
>    all CAPS help, because they are visually striking to enable quick
>    navigation. 

One thing I would mention is that less(1) supports regex searches within
its buffer.  On my system, the searches are even case-insensitive by
default if the search pattern is all lowercase, and not otherwise.  When
this section lettercase issue was being mooted on this mailing list, the
entrenchment of less(1) as part of the man page user experience was
noted as a consensus view.

Long story short, if a man page has an "Options" section heading, you
can find it in less with

/OPTIONS

(you don't even need to type the whole word) if the page uses full caps
for its section headings, and

/^Options

(again, the word can be truncated) if it does not.  The above pattern
takes advantage of the fact that section headings in man(7) are always
rendered starting at the leftmost character cell.

In fact, to leap among sections you can do

/^[a-z]

regardless of the lettercase convention, and after doing the above once
you can type simply

/

to repeat the search or

?

to repeat it in the backwards direction.

less(1) even caches the search pattern (in $HOME/.lesshst), so you can
quit the program and use the pattern again the next time it runs just by
typing / and Enter.
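
The same column-one trick works outside the pager.  As a minimal sketch,
assuming a rendered page arrives on standard input, the helper below
(list_sections is a hypothetical name, not part of less or groff) prints
every line that begins with a letter in the leftmost cell, along with its
line number.  On a real page the header and footer lines also start in
column one, so they would be listed too.

```shell
# list_sections: read rendered man page text on stdin and print, with
# line numbers, each line that starts with a letter in column one.
# (Hypothetical helper name; indented body text is skipped.)
list_sections() {
    grep -n '^[[:alpha:]]'
}

# Demo on a tiny hand-written stand-in for rendered output.
# Prints "1:Name" and "4:Options".
list_sections <<'EOF'
Name
       example - a stand-in page

Options
       -x     Do something.
EOF
```

The pattern does not care whether the headings are in full caps or mixed
case, which is the point of anchoring at the start of the line.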

>    I might be interested in the SYNOPSIS. I can quickly find (or
>    search) it. If "synopsis" is written elsewhere, I would not find it
>    (nor am I usually interested in those matches). Similarly if I look
>    for OPTIONS, or FILES, or … 
>    
>    So efficiency in using these documents (and their ever-repeating
>    same structure with the CAPS headings) is very helpful. It does not
>    matter which program - if it has a man page, it most likely
>    follows man-pages(7). (Hopefully it follows man(7) at least in
>    spirit, see below). And I can quickly read and navigate.

The foregoing is a good argument for configuring full-capitalization of
section names with CS in your environment.  For example, on my
Debian-based system I can do this by adding the following line to
/etc/groff/man.local.

.nr CS 1

Or maybe I like the full caps in my terminal emulators, but not in my
PDF viewer.

.if n .nr CS 1
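
If you would rather script that edit than make it by hand, here is a
rough sketch.  The function name add_cs_setting is made up for
illustration, and the file to pass it is whatever man.local your system
uses (/etc/groff/man.local on my Debian-based system, as noted above).
It appends the nroff-only setting once and leaves the file alone if the
line is already there.

```shell
# add_cs_setting FILE: append the nroff-only CS setting to FILE unless
# an identical line is already present.  (Hypothetical helper name.)
add_cs_setting() {
    line='.if n .nr CS 1'
    # -x: match the whole line, -F: fixed string, -q: no output.
    # If FILE is missing, grep fails quietly and printf creates it.
    grep -qxF -- "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}
```

Running it twice on the same file leaves a single copy of the line, so
it is safe to call from a provisioning script.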

The groff_man_style(7) page[2] includes a few examples of customizable
man page rendering parameters.  For too many years readers of man pages
have labored under the misconception that they have no influence over
how man pages are rendered on their system.  Most of these configurable
parameters have been supported by groff for over a decade.  Over the
past 4 years we've greatly improved documentation of these matters, and
extended the principle with new registers and strings (AD, CS, and CT).

Unfortunately, people with that misconception have sometimes gone on to
write tools that generate man pages as output, and force many stylistic
preferences on readers, in ignorance of the man page interface for
letting the user exercise individual preferences.

> 3. From a tools POV
>    Ideally, all tools would start from source. But often they don't.
>    Again, having a clear structure (like CAPS headings) makes tools
>    navigate in online manpages (plain HTML) or dump output generated
>    much better. Maybe AI will supersede this in the future, I don't
>    know.

We don't need AI for this; if a tool wants to scrape and parse a
formatted man page, it can ask groff to render the page as it desires
first.  I think all three of your cases are manageable with groff's
command line interface.

> > > 2. The markup is partially strange. Usually program names are in
> > > B<>, but now they are in I<>
[...]
> I think if I understood you correctly, the problem stems from the fact
> that visual and logical markup is intertwined to the point that
> visual appearance causes logical meaning. This is, by itself, of course
> bad.

Yup.  It is a problem dating back to the first man(7) implementation in
1979, and it has cast a long shadow.

> First, the rationale given above for the CAPS case applies. Users want
> efficient man pages.

...except they have different opinions about what constitutes
efficiency.

> So B<> and I<> are intertwined with logical meaning for the reader. (I
> only look for bold text, because I'm looking through the names of
> options).

Yes; as I said previously, I think that "bold -> literal, italics ->
variable" is a sound rule of thumb.  What I reject is over-application
of the principle.  The dearth of font styling options in man(7), which
is ultimately attributable to an equal dearth on the Graphic Systems
C/A/T typesetter at Bell Labs in Murray Hill, New Jersey in the 1970s,
means that sometimes style choices are going to collide.  If someone
uses italics for emphasis, as in "do I<not> pass this function a pointer
to a buffer that is not known to be null-terminated", do we permit our
principles of efficiency to force us to regard the word "not" as some
kind of metasyntactic variable here?  No.

> This can of course be changed, but this would require some kind of
> coordination between all languages for preparing man pages (groff,
> perldoc, docbook, …). In manpages-l10n we have > 100 projects at the
> moment, I did not do any statistics on their source language.

There's only so much semantic markup that it is wise to add to the
man(7) language.  There are two risks: first, that in doing so
conscientiously we will bloat the lexicon of the system beyond the point
where occasional man page authors can acquire it, say to the dimensions
of DocBook (which has hundreds of tags, according to one of our mailing
list's contributors), or even to the more modest scale of mdoc(7),
which has failed to take over the Unix world outside of the BSDs.

A more serious problem is that docbook-to-man has been chronically
unmaintained for 20 years.  As I noted a few years ago, it seems to
poison everyone who touches it.

The limit of my planned semantic revolution is introduction of the an
.MR macro for man page cross references; I think the payoff is large
there because people have violent opinions about how they should be
styled (hence the companion MF string I have suggested), and they enable
hyperlinking, a major benefit.  If you try to match on .IR or .BR you
will hit many, many false positives.

To be fair, I understate; another part of my planned semantic revolution
is arguably to regress it, by deprecating the .OP macro.  It's too
feeble to do its job--it cannot handle GNU-style long options with an
equals sign separating the flag name from its argument, and people
misuse it outside of synopsis macro pairs (SY/YS) (which it technically
predates; it seems to have made its way into Documenter's Workbench
troff at some point by the 1990s).

> Secondly as stated in 3. above readers and tools might not see the
> original markup language at all. For example in our case, the
> translators do not see all the details of the markup language, but
> often some mixed or intermediary representation. This is nice, because
> it relieves the translators from knowing all markup languages
> available. Po4a might not be perfect, but I think it does its job
> quite well here.

Okay.  I acknowledge that.  What I am not seeing is, concretely, how
anything I have in mind detracts from your goal.

* Fully capitalized section headings are user-configurable.
* If I get .MR and \*[MF] implemented, you can have your tools render
  man page cross-references in bold if you like.

> And then translators often are forced to read (imperfect) texts which
> they have limited knowledge of. In contrast to ordinary users they
> might not use the tool or all options and have little or limited time
> (and interest) to "experiment". So I often look at the markup. Oh,
> it's B<> - then this is verbatim text, i.e. the name of an option. I
> must not translate this. Then I<> - this is variable, fine, I should
> translate it to serve our readers. This sometimes helps immensely in
> understanding and in the translation.

I can easily see that, and I don't want to break it.  I fully expect
some distributors to modify the copies of man.local they install to
impose a set of style rules, merrily discarding the case distinction
information I've gone to the trouble of adding to groff's own man pages.

> So translators use the visual markup as logical one. In this process I
> do not care how B<> is rendered in the end (nor for I<>). Of course
> readers later on would expect some consistent rendering, like in the
> CAPS case to quickly understand this (imperfect) text.

Right.  .MR and \*[MF] will get us closer to that, as the
mixed-convention problem is already in the wild.  The CS register
doesn't harm the cause, either, not only because most man pages already
have their section titles fully capitalized in the source, but because
.SH is _already_ a semantic macro, and by setting CS to 1 you can force
the heading to full caps even for pages written by people who don't know
anything about this register.

So, I think, groff 1.23.0 will make the world look _more_ like you want
it (for translation purposes, at least), not less.  groff 1.22.4 does
not manipulate section names in man pages at all.

> In theory this should not occur, in practice this is quite often the
> case. I know man pages with *very* long sentences and here the markup
> is one of the tools to understand these sentences.

Agreed.

> > The downside?  It will take many, many years for pages to migrate.
> > I expect we'll still be reading unadapted man pages when we're
> > discovering unsigned [recte: signed] 32-bit overflows in
> > "enterprise" Linux distributions in 2038.
> 
> If there is a goal and some kind of coordination I'm fine with having
> some migration period. manpages-l10n can actually help here - we
> regularly feed back errors to our upstream projects. So if we know how
> options names / fixed text should be marked and how variable text
> should be marked, then we could approach our upstreams (on a case by
> case basis) and ask them to apply this consistently. In fact, I usually
> point them to man-pages(7), at least to the major points.

I strongly encourage you to check out groff_man_style(7).  I am in
communication with the Linux man-pages project (Michael and Alejandro).

> So please state the "right" convention and coordinate with as many
> other tools as well. Really please.
> 
> What would be horrible is that each project invents its own
> conventions, both for the readers and the translators.

That's where we are already, unfortunately.  My initiatives are an
attempt to remedy the problem.

> > It's on my to-do list to implement this, and maybe it will be in the
> > groff 1.23.0 release.  You can read more here
> > <https://lists.gnu.org/archive/html/groff/2020-08/msg00068.html>.
> > (Note that my observations in that message about font styling
> > practices have been modified as above per research I've done in the
> > intervening months.)
> 
> So please coordinate and disseminate the change to as many projects as
> possible and make it easy for them to switch to whatever is agreed
> upon, and do so quickly.

In my experience people cannot be driven in this way, at least not by
someone who is seen as having a subjective investment in the issue.  My
role as implementor is to put together the best possible case I can in
terms of technology and ergonomics, but the driving of adoption is going
to have to be done by people whom I have already convinced, who then
become advocates within or closely affiliated with individual projects.

> > > (with an additional \\% at the beginning, but I don't complain
> > > about the latter).
> > 
> > That's the *roff hyphenation character.  At the beginning of a word
> > it suppresses hyphenation.  This is approximately a 50-year old
> > syntactical feature.  :)
> 
> As said, I don't speak *roff, I simply copy those codes not handled by
> po4a over. In Systemd all full stops have \\&. Probably better, but I
> simply do as upstream does. So no complaint from my side.

Okay.  It would probably behoove me to learn something about po4a; I've
heard about it many times over the years...

> > Underlining is to be expected; most terminal emulators do not
> > support italics, so the TTY output driver for groff tells the
> > terminal to use underlining instead.  See grotty(1) for more
> > information.
> 
> Thanks, I simply seldom read the man pages in anything but a VT, see
> below.

The only problem with this is that it may be obscuring an awareness of
what troff/groff are and what they are for.  They are typesetting
document formatters that have been applied to the problem of software
documentation in a visual context where a great many typesetting issues
do not arise.

I encourage you to experiment once in a while with rendering man pages
to PostScript, PDF, DVI, and HTML.  Not necessarily often; I simply want
you to be aware of some of the other design constraints that apply to
the man(7) language.

> > I confess that I go many months without checking man page rendering
> > in VTs.  But if this is something you do I am intensely interested
> > in any readability problems you encounter.
> 
> No problem, I'm probably highly unusual as I actually do most work on
> VTs (except those which nowadays require X, like graphics, most
> browsing). So I usually read the man pages in VTs as well.

I'm glad of this, as I feel this use case is under-tested with groff
users who are active on our mailing list.  (Perhaps I'm wrong, and they
will speak up to tell me so.)

> > > 3. Usually B<> is not additionally quoted, I noticed that it is
> > > now in the new version, e.g. \\[lq]B<...>\\[rq].
> > 
> > Again, this is a stylistic choice.  Bolded content sometimes becomes
> > ambiguous if the bold attribute is lost or stripped away, which can
> > happen when people use the man page interface clumsily (or render
> > man pages to an extremely primitive device).  My rule of man page
> > maintenance/authorship is to introduce quotes if such confusion is a
> > significant risk, and not otherwise.  Two rules of thumb are:
> 
> I do understand your reasoning, but in the set of man pages I worked
> in B<> is often by itself understood to be some kind of quoting, so
> this is unusual. I myself have not experienced output methods so dumb
> as to strip these away (bold and italics are now quite generally
> available, and as you state, underline is available as an
> alternative). 

There are two common ways this can happen.

The first is when a person takes the somewhat widespread advice to tell
grotty(1) to disable SGR escapes with the GROFF_NO_SGR environment
variable, and then just renders a man page to the terminal with groff
without paging it.  (Who needs a pager when you've got a scroll bar in
your terminal emulator, right?)  Both bold and underlining vanish.

$ zcat $(man -w ls) | GROFF_NO_SGR=1 groff -Tutf8 -man

If you further pipe the above through more(1), italics come back but not
bold (on xterm version 344 and some other versions).

The second way is when someone copies and pastes a rendered man page.
The text attributes do not come along for the ride.

Thus, personally, I strive to achieve a discipline which says that the
user should have a fighting chance to comprehend a man page even if it
is rendered in attribute-less US-ASCII.  Hence my rules of thumb, which
you snipped.  :)
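
To see roughly what such a reader is left with, one can strip the
backspace overstrike sequences that grotty(1) emits when SGR escapes are
disabled: bold is a character, a backspace, and the same character
again; underlining is an underscore, a backspace, and the character.
The sketch below is only an approximation of that loss (strip_overstrike
is a hypothetical name, and it assumes the input uses overstriking
rather than SGR escapes).

```shell
# strip_overstrike: remove "X<backspace>" pairs from stdin so that
# overstruck bold (c BS c) and underline (_ BS c) collapse to plain c.
# (Hypothetical helper name; it does not handle SGR escape sequences.)
strip_overstrike() {
    bs=$(printf '\b')          # a literal backspace character
    sed "s/.${bs}//g"
}

# Demo: a bold "Name" followed by an underlined "foo".
printf 'N\bNa\bam\bme\be _\bf_\bo_\bo\n' | strip_overstrike   # prints "Name foo"
```

Nothing in the result says which word was bold and which was
underlined, which is why quotation marks can earn their keep.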

I have worked on embedded systems and occasionally found myself reading
an unfamiliar man page in a very limited environment[3], where not
everything is working to my taste.  Vendors of embedded systems screw
all kinds of things up, especially Unix terminal I/O.  This sort of
situation arises in high-stress circumstances.  To me, a little extra
effort toward good typographical style is worth it to keep a hacker on a
deadline from stroking out.

Admittedly, in pursuing this aim, I've educated myself in *roff (and the
man(7) language specifically) to the point where I can read unrendered
man page sources in detail with ease[4], so if I'm ever in that situation
again I'll just cat the damn page out directly (or sed it out in
chunks).  But not everyone will take the time to do as I have done.  (I
expect most will try to go find the same man page and view it on their
desktop system, only to discover that the _exact_ version of the page
they need can't be located...)

> For me, this looks strange, but again as translator I will faithfully
> reproduce this in the translation as long as it does not contradict
> German grammar heavily. Often I actually add quotes when upstream does
> not use any or only little markup. (The tendency I feel is that
> literal quotes become somewhat unpopular in English texts, e.g. in
> systemd they were drastically reduced some time ago). 

I think a major driver behind this is that people either don't
understand typographical quotation or how to achieve it in a man page.
I've documented this more prominently, too--see groff_char(7)[5].  It
has an example dedicated specifically to quotation marks.

> > > 5. The line graphics was changed. On my system, the arrows are
> > > displayed fine in a VT, but not the lines. Both display fine in a
> > > KDE konsole. Maybe you want to keep the previous line and just use
> > > the arrows? (But most users probably will use a terminal program
> > > in a graphical environment, so this really is very minor)
> > 
> > I will check this out.  I'm a bit loath to change the input because
> > it is correct with respect to my present understanding.  What I
> > think is happening is that the VT driver has no glyphs for the
> > Unicode box drawing characters, so it doesn't render them.
> 
> Probably, and it is extremely low priority, so consider it as such.

Okay.  I'm having trouble getting my shells on VTs to _not_ start with a
UTF-8 locale, and so I get moji-Latin-1-bake all over my screen.  I'll
eventually troubleshoot it.

As a guess, there are some unplumbed depths here involving the DEC ACS
(alternate character set), which the Linux VT has long supported.
grotty(1), however, has no awareness of such a thing; it knows only
character encodings, not the old DEC mechanism that allowed switching
out chunks of the character encoding space.  But even if that is true,
grotty(1) should be falling back to -, |, and + characters for drawing
boxes.  If it is not, that is irksome.

> I think I understand the problem. I'm not against change. But please
> keep the aim of the man pages and their consistency in mind, i.e. if a
> transition is needed, please do your best to make it as smooth as
> possible (e.g. by telling everyone including po4a and manpages-l10n
> about the new right way).

I hear you.  I am trying to enable gradual transitions, with no flag
days or major disruptions.  I would encourage you, and other members of
those projects, to subscribe to the groff at gnu dot org mailing list if
you have time.

> If I should/need to add anything to the actual bug, please tell me and
> I will of course.

Will do.

Thanks very much for the discussion.

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff-commit/2021-05/msg00029.html
[2] https://man7.org/linux/man-pages/man7/groff_man_style.7.html
[3] "Not very embedded," you may observe, "if they left man pages in the
    root fs on the flash ROM."  I would agree.
[4] Though docbook-to-man output is still so hateful to the eye that it
    can cause corneal abrasions.
[5] https://man7.org/linux/man-pages/man7/groff_char.7.html



----- End forwarded message -----
