From: G. Branden Robinson
Subject: How to configure how groff hyphenates man pages (was: tctest.1 man page hyphenation comments)
Date: Mon, 3 Jun 2024 21:05:46 -0500

[looping in groff list as this is something of a FAQ]

Hi Thomas,

At 2024-06-03T19:38:25-0400, Thomas Dickey wrote:
> Here's what I see with the 1.8 revision:
> 
> DESCRIPTION
>        tctest  exercises  the  termcap  library (or emulation of termcap) with
>        which it is linked.  It provides several command-line  options,  making
>        it  simple  to construct test-cases to compare implementations of term-
>        cap.
> 
> Call that overly-aggressive, then: it's predictable but reduces readability 
> :-)

Okay.  _Personally_, I think that "term-cap" is a reasonable hyphenation
break point.  In linguistic terms, it's both a morpheme boundary and a
syllabification point.

> I was probably also grumbling about nroff hyphenating "error" and "Repeat",
> i.e., 
>       "er-" "ror"
>       "Re-" "peat"
> It also split
>       "parameters" as "pa-" "rameters"
>       "obsolete" as "ob-" "solete"
>       "default" as "de-" "fault"

Yes.  The default hyphenation mode (even for English) is pretty
aggressive.  They're TeX's hyphenation patterns; we just live with them.
;-)

But you, the reader of a man page, do not have to; see below.

> In a quick check, it hyphenated 11 lines out of the 84 non-blank lines,
> and of those 11, 6 have 2 characters before the hyphen.  8 of the 11
> lines do have at least one place where there's a double-space.
> 
> Preventing it from splitting termcap reduced that to 10 lines.
> 
> (I'd rather the feature was configurable so that I could force it to
> keep at least 3 characters before/after the split)

It is, and there are a few methods of doing so, depending on how much
control you want to exercise.

groff_man(7):

     -rHY=0   Disable automatic hyphenation.  Normally, it is
              enabled (1).  The hyphenation mode is determined by the
              groff locale; see section “Localization” of groff(7).

That is the most popular approach.  It's groff-specific, but causes no
harm elsewhere (it merely won't work; defining a register that some
other man(7) macro package pays no attention to damages nothing), and
has been in groff for a long time.
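
For a reader who wants this on every page rather than per invocation,
one possibility is to set the same register in man.local (that file is
described further on in this message).  This is only a sketch: it
assumes the man package consults HY after man.local has been loaded,
which is worth verifying against your groff version.

```roff
.\" man.local fragment (assumption: equivalent to passing -rHY=0)
.nr HY 0
```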

Authors
     The initial GNU implementation of the man macro package was written
     by James Clark.  Later, Werner Lemberg ⟨wl@gnu.org⟩ supplied the S,
     LT, and cR registers, the last a 4.3BSD‐Reno mdoc(7) feature.
     Larry Kollar ⟨kollar@alltel.net⟩ added the FT, HY, and SN
     registers; the HF string; and the PT and BT macros.

The `HY` feature dates back to 2003, and was included in groff 1.19
(April 2003).  This is so old that even old Mac OS X has it (before they
got rid of groff altogether in macOS Ventura).

What if you want finer-grained control over hyphenation?  For example,
what if you want hyphenation mode 8 instead of 4?  (And it sounds like
you, personally, do.)

groff(7):

Hyphenation
     When filling, groff hyphenates words as needed at user‐specified
     and automatically determined hyphenation points.
...

     Several requests influence automatic hyphenation.  Because
     conventions vary, a variety of hyphenation modes is available to
     the .hy request; these determine whether hyphenation will apply to
     a word prior to breaking a line at the end of a page (more or less;
     see below for details), and at which positions within that word
     automatically determined hyphenation points are permissible.
...

     8      disables hyphenation after the first two characters of a
            word.

This is an AT&T troff-compatible feature.  So, you can just put this in
your man.local file.  The groff_man(7) page's "Files" section documents
where this dwells.  On Debian systems, it's /etc/groff/man.local.

.hy 8
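
Note that a bare mode 8 replaces the English default of 4 rather than
adding to it.  groff(7) describes the mode values as additive, so a
reader who wants at least three characters on each side of the break
could combine the two.  A sketch for man.local, assuming those
documented mode values:

```roff
.\" man.local fragment
.\" 4 = no break before the last two characters of a word
.\" 8 = no break after the first two characters of a word
.\" 4 + 8 = 12 applies both restrictions.
.hy 12
```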

The foregoing approach sometimes gets overridden by man page documents
that attempt to seize control of hyphenation themselves, and do it
wrongly.  One such pattern, which we managed to purge from ncurses's
man pages back in October, was this.

.\" Text formatted with(out) hyphenation as configured by user.
.nh
.\" page content with hyphenation off
.hy
.\" page content with hyphenation ON, using mode 1,
.\" which is wrong for English,
.\" and enables automatic hyphenation
.\" even if the user doesn't want it at all.

That's why I submitted patches to take that stuff out.  It's nothing but
trouble.  In the forthcoming groff 1.24, a new feature permits GNU troff
to do something sane even in the face of the above--but it's an
extension, so should not stop anyone from ripping the foregoing bad
pattern out of their man page documents.[1]
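
Where a page really does need one word kept whole (a literal that might
be copied and pasted, say), the portable \% escape sequence covers that
case without touching the reader's hyphenation mode.  A minimal sketch;
the sentence is hypothetical page text, not taken from any ncurses page:

```roff
.\" \% at the start of a word suppresses automatic hyphenation
.\" of that word only; no .nh/.hy requests are needed.
The
.B \%tgetent
function must be called first.
```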

If you wanted to be really scrupulous, and/or are in the habit of
reading man pages in multiple languages (but still don't altogether hate
automatic hyphenation), then you really want to do the foregoing only if
the man page is in English.  As of groff 1.23, you can guard the request
with a conditional that queries what the "groff locale" is.

.if '\*[locale]'english' .hy 8

If you want more finely grained control, GNU troff offers numerous
extension requests to configure hyphenation parameters.  You can put
these in your man.local too.

Here's a sample from groff(7).

     .hlm       Set the consecutive automatically hyphenated line limit
                to -1, meaning “no limit”.
     .hlm n     Set the consecutive automatically hyphenated line limit
                to n.  A negative value means “no limit”.
...
     .hym       Set the (right) hyphenation margin to 0 (the default).
     .hym length
                Set the (right) hyphenation margin to length (default
                scaling unit m).
     .hys       Set the hyphenation space to 0 (the default).
     .hys hyphenation‐space
                Suppress automatic hyphenation in adjustment modes “b”
                or “n” if the line can be justified with the addition of
                up to hyphenation‐space to each inter‐word space
                (default scaling unit m).
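
As an illustration of how two of these requests might look in a
man.local, here is a sketch.  The values 2 and 0.5m are arbitrary
assumptions chosen for the example, not recommendations from groff's
documentation.

```roff
.\" man.local fragment (illustrative values only)
.\" Permit at most two consecutive automatically hyphenated lines.
.hlm 2
.\" In adjustment modes b and n, skip hyphenation when adding up to
.\" 0.5m to each inter-word space would justify the line.
.hys 0.5m
```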

For a full exploration of hyphenation in GNU troff, here's the manual.

https://www.gnu.org/software/groff/manual/groff.html.node/Manipulating-Hyphenation.html

(Section "Hyphenation" of groff(7) presents the same material without
examples or footnotes.)

I would admonish man page documents AGAINST using any of the foregoing
requests, five nines of the time.  Employing them is not portable, and
frustrates user configuration.  In a man.local file, one is free to do
anything.

I thus encourage you to relax your stricture on the hyphenation of
"termcap" (again, except inside literals that might be copied and
pasted).

For those few mdoc(7) mavens who haven't smashed up the groff on their
system like the fax machine in _Office Space_ and moved all their
bookmarks over to mandoc(1) (which never automatically hyphenates), I
observe that all of the foregoing discussion applies just as well to the
mdoc.local file.  Back in 2020 I enhanced groff's mdoc package to
respect the `HY` register, and that change went out in groff 1.23.[2]

Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS#n50

    The foregoing line number will get stale with time.  Look for the
    text "hydefault".

[2] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=b443e713cecefc553fc8d98b68de74b750f54bd8
