groff

Re: *roff hyphenation trivia challenge


From: Steve Izma
Subject: Re: *roff hyphenation trivia challenge
Date: Tue, 2 Apr 2024 15:22:53 -0400

On Tue, Apr 02, 2024 at 01:29:05PM -0500, G. Branden Robinson wrote:
> Subject: Re: *roff hyphenation trivia challenge
> 
> At 2024-04-02T13:42:59-0400, Steve Izma wrote:
> > On Tue, Apr 02, 2024 at 06:51:51PM +0200, Tadziu Hoffmann wrote:
> > > Subject: Re: *roff hyphenation trivia challenge
> > 
> > > For "antidisestablishmen\%tarianism", groff prints
> > > 
> > >   antidisestablishmen-
> > >   tar-
> > >   i-
> > >   an-
> > >   ism
> > > 
> > > (which I think is strange), while TeX and Heirloom troff print
> > > 
> > >   antidisestablishmen-
> > >   tarianism
> > > 
> > > which I think is the only reasonable way of handling this case.
> > 
> > I disagree.
> 
> Oops. I misread Tadziu's example, and hallucinated a leading `\%` in it.
> 
> If there is no _leading_ `\%`, then infixed `\%` escape sequences can
> only add hyphenation points; they cannot remove them.  AIUI.

Hi Branden,

Thanks for the response, but I'm not clear about your comment
here. I get the same results as Tadziu, i.e., the hyphenation
points prior to the \% disappear. Testing again, this time with a
leading \% as well,

printf '.ll 1n\n\%antidisestab\%lishmentarianism\n' | nroff -Wbreak | cat -s

I get the same behaviour:

antidisestab‐
lish‐
men‐
tar‐
i‐
an‐
ism

This seems to mean that a leading \% only has an effect up to the
next \% in the word -- but then the behaviour is the same even
without a leading \%. In decades of using groff I've never
noticed this. It's a good thing you've started this discussion.
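
For anyone repeating these tests, here is a minimal sketch that
runs the three variants discussed in this thread side by side. It
passes each word to printf as a %s argument instead of embedding
it in the format string, since how a given shell's printf treats
a backslash followed by % in the format may vary. The helper name
hyph_input is made up for illustration, and the script only
prints the roff input if nroff is not installed.

```shell
# Emit a two-line roff document: a 1n line length (so that a break is
# forced at every available hyphenation point), then the word under test.
# The word is a %s argument, so printf leaves its \% sequences untouched.
hyph_input() {
    printf '%s\n' '.ll 1n' "$1"
}

# No \%, an infix \% only, and a leading \% plus an infix \%.
for word in \
    'antidisestablishmentarianism' \
    'antidisestab\%lishmentarianism' \
    '\%antidisestab\%lishmentarianism'
do
    printf '== %s\n' "$word"
    if command -v nroff >/dev/null 2>&1; then
        # -Wbreak silences the "can't break line" warnings; cat -s
        # squeezes the blank lines that pad out the nroff page.
        hyph_input "$word" | nroff -Wbreak | cat -s
    else
        # Without nroff, just show the input that would be formatted.
        hyph_input "$word"
    fi
done
```

Comparing the three output sections should show directly whether
the leading \% changes anything once an infix \% is present.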

> > Also for \% at the beginning of a word, I rarely use this.
> 
> I use it frequently in man(7) documents, because the `hw` request is not
> portable/reliable (in theory).  Also there's no mechanism for removing
> these, so if we tolerate/encourage their use, doing so deals a blow to
> reliable/predictable batch rendering.[1]

Good point.

> So let me amend my claim.
> 
> I think it's weird that
> 
> > > [f]or "antidisestablishmen\%tarianism", groff prints
> > > 
> > >   antidisestablishmen-
> > >   tar-
> > >   i-
> > >   an-
> > >   ism
> 
> whereas
> 
> $ printf '.ll 1n\nantidisestablishment\n' | nroff -Wbreak | cat -s
> an‐
> tidis‐
> es‐
> tab‐
> lish‐
> ment
> 
> seems like well-behaved formatting to me.
> 
> ...except for the lack of a break point after "ti", of course.
> But I'm comfortable assuming that the discrepancy here is a
> limitation of the TeX hyphenation system aggravated by
> English's polyglot morphology.

Most of my use of groff for books over the last thirty years has
been for non-fiction (mostly scholarly) material, and much of its
terminology doesn't end up in hyphenation lists; sometimes the
words are just too new or rare. The same applies to the many
proper names in scholarly material. Often most hyphenation points
were correct but, especially for long words, the one point that
would make all the difference towards getting a properly spaced
line would be missing, as above with "ti-dis". That's why it's
convenient to use \% to add hyphenation points to those that
arise from hyphenation logic, as opposed to exception lists.

But now that I think about it, we would often prefer to use .hw
in these cases because that allows you to define only what is
desirable. I should really go back through my various book
projects and do some research here.
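
As a rough sketch of the .hw approach: as I understand it, the
request takes words written with hyphens at the places where a
break is permitted. The break points below are chosen purely for
illustration and are not taken from any dictionary, and hw_input
is a hypothetical helper name.

```shell
# Build a small roff document that registers a hyphenation exception
# with .hw and then sets the word on a 1n-wide line.
#   $1: the word with hyphens at the permitted break points
#   $2: the word as it appears in running text
hw_input() {
    printf '%s\n' '.ll 1n' ".hw $1" "$2"
}

# Illustrative break points only (an assumption, not an authority).
hw_input 'anti-dis-establish-ment-arian-ism' 'antidisestablishmentarianism'
```

Piping that output through "nroff -Wbreak | cat -s" should show
which break points are actually used, for comparison with the \%
results earlier in this thread.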

> Is TeX's hyphenation algorithm defeated by the pathological case of
> "antidisestablishmentarianism", and groff's implementation of it
> "recovers" differently?

I don't remember enough about TeX to answer this. I used TeX and
LaTeX up to about 15 years ago to typeset about 20 books from
computer science conferences and the oversetting of lines caused
by the periodic failure of the paragraph-justification algorithms
drove me nuts. That was not due to hyphenation problems, but
something to do with limits to word-spacing that I probably
didn't understand properly. The many lines that overset by only a
few points made proofreading really difficult. That's why I'm
suspicious of trying to add or replicate these algorithms in
groff.

        -- Steve

-- 
Steve Izma
-
Home: 35 Locust St., Kitchener, Ontario, Canada  N2H 1W6
E-mail: sizma@golden.net  cellphone: 519-998-2684

==
The most erroneous stories are those we think we know best – and
therefore never scrutinize or question.
    -- Stephen Jay Gould, *Full House: The Spread of Excellence
       from Plato to Darwin*, 1996


