groff

Re: Leaders


From: G. Branden Robinson
Subject: Re: Leaders
Date: Sat, 2 Sep 2023 17:11:24 -0500

[self-follow-up]

Hi Peter,

On second thought--in the form of empirical measurements rather than
hand-waving conjecture--I think you've found a bug.

Consider the following input, which I've degraded to an ASCII spelling
of "chateau" to permit some comparative analysis (keep reading).

$ cat long-leader.roff
.ta \n(.lu
.nf

a
air
.fi
The chateau into which my valet had ventured to make forcible
entrance, rather than permit me, in my desparately wounded
.sp
entrance, rather than permit me, in my desparately wounded

There are some hidden Control+A (leader) characters in the above.  Let's
expose them.

$ hd long-leader.roff
00000000  2e 74 61 20 5c 6e 28 2e  6c 75 0a 2e 6e 66 0a 01  |.ta \n(.lu..nf..|
00000010  0a 61 01 0a 61 69 72 01  0a 2e 66 69 0a 54 68 65  |.a..air...fi.The|
00000020  20 63 68 61 74 65 61 75  20 69 6e 74 6f 20 77 68  | chateau into wh|
00000030  69 63 68 20 6d 79 20 76  61 6c 65 74 20 68 61 64  |ich my valet had|
00000040  20 76 65 6e 74 75 72 65  64 20 74 6f 20 6d 61 6b  | ventured to mak|
00000050  65 20 66 6f 72 63 69 62  6c 65 0a 65 6e 74 72 61  |e forcible.entra|
00000060  6e 63 65 2c 20 72 61 74  68 65 72 20 74 68 61 6e  |nce, rather than|
00000070  20 70 65 72 6d 69 74 20  6d 65 2c 20 69 6e 20 6d  | permit me, in m|
00000080  79 20 64 65 73 70 61 72  61 74 65 6c 79 20 77 6f  |y desparately wo|
00000090  75 6e 64 65 64 01 0a 2e  73 70 0a 65 6e 74 72 61  |unded...sp.entra|
000000a0  6e 63 65 2c 20 72 61 74  68 65 72 20 74 68 61 6e  |nce, rather than|
000000b0  20 70 65 72 6d 69 74 20  6d 65 2c 20 69 6e 20 6d  | permit me, in m|
000000c0  79 20 64 65 73 70 61 72  61 74 65 6c 79 20 77 6f  |y desparately wo|
000000d0  75 6e 64 65 64 01 0a                              |unded..|

Look for '01' bytes.
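
For anyone who wants to reproduce the file without coaxing an editor
into typing Control+A, here is a minimal sketch that rebuilds it with
printf(1).  It assumes a POSIX shell; the variable names are made up
for illustration and are not part of the original experiment.

```shell
#!/bin/sh
# Sketch: rebuild long-leader.roff, embedding the leader character
# (Control+A, octal 001) wherever the hex dump shows an 01 byte.
leader=$(printf '\001')   # hypothetical helper variable
text='entrance, rather than permit me, in my desparately wounded'

# '%s\n' prints each argument verbatim, so the backslash in the
# .ta request reaches the file untouched.
printf '%s\n' \
    '.ta \n(.lu' \
    '.nf' \
    "$leader" \
    "a$leader" \
    "air$leader" \
    '.fi' \
    'The chateau into which my valet had ventured to make forcible' \
    "$text$leader" \
    '.sp' \
    "$text$leader" > long-leader.roff

# Count the leader characters and the total size in bytes.
leaders=$(tr -cd '\001' < long-leader.roff | wc -c | tr -d ' ')
bytes=$(wc -c < long-leader.roff | tr -d ' ')
echo "leaders=$leaders bytes=$bytes"
```

The hex dump has five 01 bytes and ends at offset 0xd7 (215 bytes), so
the counts reported here should agree with it.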

Now the output.

$ nroff EXPERIMENTS/long-leader.roff | cat -s
.................................................................
a................................................................
air..............................................................
The chateau into which my valet had ventured to make forcible en‐
trance, rather than permit me, in my desparately wounded......

entrance, rather than permit me, in my desparately wounded.......

This mostly looks great, except, uhh, the line after the hyphenated
"entrance".  What gives?

Well, let's try it with DWB 3.3 nroff, my usual dodge for avoiding the
tedium of starting up SIMH to run a PDP-11, pinning a core to the wall
busy-waiting on terminal input 200 times faster than the original
hardware, and hearing my laptop's fans spin up as its lifespan dwindles.

$ cd ~/dwb
$ DWBHOME=. ./bin/nroff ./long-leader.roff | cat -s
a................................................................
air..............................................................
The chateau into which my valet had  ventured  to  make  forcible
entrance, rather than permit me, in my desparately wounded.......

entrance, rather than permit me, in my desparately wounded.......

Hmm, DWB nroff doesn't hyphenate.  Let's force it, spelling "entrance"
as "en\%trance".

$ DWBHOME=. ./bin/nroff ./long-leader.roff | cat -s
a................................................................
air..............................................................
The chateau into which my valet had ventured to make forcible en-
trance, rather than permit me, in my desparately wounded.......

entrance, rather than permit me, in my desparately wounded.......

Okay.

And its descendant Heirloom Doctools does the same, unsurprisingly.
(It doesn't need the manual hyphenation break point, but it doesn't
hurt.)

$ cd ~/heirloom
$ ./bin/nroff ./long-leader.roff | cat -s
.................................................................
a................................................................
air..............................................................
The chateau into which my valet had ventured to make forcible en-
trance, rather than permit me, in my desparately wounded.......

entrance, rather than permit me, in my desparately wounded.......

It _seems_ that hyphenation messes up the formatter's idea of how much
of the line has already been output when it comes to measuring the
distance to the next tab stop.  Notice how our leader falls 2 ens short
of the right margin, which is exactly how much of "entrance" (the "en")
we left behind on the previous line at the hyphenation break.

Let's respell "entrance" as "\%entr\%ance" and see what happens.
Nothing at first.  It won't fit.  So let's change "valet" to "Val".

$ ./bin/nroff ./long-leader.roff | cat -s
.................................................................
a................................................................
air..............................................................
The chateau into which my Val had ventured to make forcible entr-
ance, rather than permit me, in my desparately wounded.......

entrance, rather than permit me, in my desparately wounded.......

DWB, Heirloom, and GNU nroff all behave the same in this case.[1]
Hypothesis apparently confirmed.
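
The arithmetic behind that hypothesis can be spelled out.  What follows
is a rough sketch in POSIX shell, assuming the 65-en line length visible
in the full-width leader lines and taking the dot counts from the DWB
and Heirloom outputs above.  The function name is made up for
illustration.

```shell
#!/bin/sh
# Assumed line length in ens (nroff output above is 65 columns wide).
line_length=65

# shortfall TEXT DOTS
# Print how many ens short of the right margin a leader of DOTS
# periods ends when it follows TEXT on the output line.
shortfall() {
    text=$1
    dots=$2
    echo $(( line_length - ${#text} - dots ))
}

# Control: the unhyphenated line after .sp, 7 leader dots.
s0=$(shortfall 'entrance, rather than permit me, in my desparately wounded' 7)
# Case 1: break after "en", 7 leader dots observed.
s1=$(shortfall 'trance, rather than permit me, in my desparately wounded' 7)
# Case 2: break after "entr", 7 leader dots observed.
s2=$(shortfall 'ance, rather than permit me, in my desparately wounded' 7)

frag1=en
frag2=entr
echo "control: shortfall $s0"
echo "en-:     shortfall $s1, fragment width ${#frag1}"
echo "entr-:   shortfall $s2, fragment width ${#frag2}"
```

In both hyphenated cases the leader is 7 dots long, the same as on the
unhyphenated line, so the shortfall matches the width of the fragment
left on the previous line: 2 ens for "en" and 4 ens for "entr".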

I am having difficulty justifying (SWIDT?) this behavior, but it is
indisputably of long pedigree.

Looks like I'll be firing up SIMH after all.

Will advise.

Regards,
Branden

[1] Not quite.  I can say "\%entr\%ance" in GNU nroff but have to say
    "entr\%ance" in DWB and Heirloom nroff.  In GNU troff, `\%`
    _supplements_ any existing hyphenation break points in a word, and
    only a leading `\%` will suppress those.  In AT&T troff, it seems,
    the use of `\%` anywhere in a word would suppress all hyphenation
    points except those explicitly specified.  This would seem to demand
    further experimentation.  And Seventh Edition Unix as a test bed yet
    again, to confirm that USG didn't screw up troff.

    As yet another wrinkle, using `.hw entr-ance` instead of `\%` works
    on GNU and Heirloom, but not DWB.  Oy vey.  I can't account for DWB
    not honoring that hyphenation exception when the word plainly fits
    on the line if broken there.  That seems full-on buggy rather than
    an implementation detail.

Attachment: signature.asc
Description: PGP signature

