groff

Re: Soft hyphens


From: G. Branden Robinson
Subject: Re: Soft hyphens
Date: Wed, 7 Apr 2021 14:59:15 +1000
User-agent: NeoMutt/20180716

Hi, Peter!

At 2021-04-04T13:05:11-0400, Peter Schaffter wrote:
> On Sun, Apr 04, 2021, G. Branden Robinson wrote:
> > I think the sentence
> > 
> >    Explicitly hyphenated words such as "mother-in-law" are eligible for
> >    breaking after each of their hyphens when GNU 'troff' fills lines.
> > 
> > does most of the work you're asking for...
> 
> It does, but for clarity and completeness, the sentence must
> mention soft hyphens.

I think I solved this problem, but not the way you suggest.  The reason
is that I'm uneasy with your use of the term "soft hyphen"; I don't
think the term is needed to explain most of groff's hyphenation
behavior.

Your concern did, however, make it clear to me that our documentation
was blurring a distinction between two different activities that can
both be called "automatic":

(1) the determination of hyphenation points within a word; and
(2) the selection of a hyphenation point for breaking when filling.

A word may have many hyphenation points (whether determined by
hyphenation pattern files or "manually" using \% or .hw), but will be
broken during formatting at most once[1].
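
To make the two activities concrete, here is a minimal sketch of the
manual side.  The line length and the choice of words are assumptions
made up for illustration; only \% and .hw come from the discussion
above.

```roff
.\" Sketch only: the line length and words are illustrative assumptions.
.ll 20n
.\" (1) Determine hyphenation points manually.
.\" .hw records points for later occurrences of the word.
.hw hy-phen-a-tion
.\" \% inside a word marks a point for that occurrence only.
A short line may force a break in hyphenation or in brother\%hood.
.\" (2) When filling, the formatter selects at most one of a word's
.\" points, and only if the word would otherwise overrun the line.
```

However many points a word carries, the selection step is a separate
matter, which is why the two are worth naming separately.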

> A word with a soft hyphen cannot be said to be "explicitly
> hyphenated"; the nature of a soft hyphen is that it's optional, not
> explicit.  Even if it can be argued that a soft hyphen is explicit
> because it's introduced by the user, the potential for
> misunderstanding warrants clarification.

I agree that there is a potential for misunderstanding here and I have
attempted to rectify it, albeit not with precisely the same lexicon
you're using here.

For me, and in usage I probably picked up from our Texinfo manual in the
first place, the term "soft hyphen" has only two uses:

(A) The ISO Latin-1 character which gets translated on input to a
    hyphenation character (\%); and
(B) A name for the glyph, configurable with the .shc request, that is
    interpolated into the output when hyphenating a word that isn't
    already being broken at an explicit hyphen.
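
A small sketch of sense (B), under the assumption that the default
glyph is wanted; the sample words are borrowed from this thread and
the fragment is not taken from our manual.

```roff
.\" Sketch only; the sample words are borrowed from this thread.
.\" Explicit hyphen: a break after it needs no extra glyph.
mother-in-law
.\" Hyphenation character: if the word is broken at this point, the
.\" soft hyphen glyph is interpolated at the end of the line.
brother\%hood
.\" Select the soft hyphen glyph; \[hy] is the default.
.shc \[hy]
```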

> Something like
> 
>   Explicitly hyphenated words such as "mother-in-law" or
>   "brother\%hood" are eligible for breaking after each of their
>   hyphens...
> or
>   Explicitly hyphenated words such as "mother-in-law" and words
>   containing the soft hyphen character are eligible for breaking
>   after each of their hyphens...
> 
> would be more useful than requiring readers to extrapolate (a
> Very Bad Thing in documentation) that the discretionary hyphen
> counts as an explicit hyphen.

Taken in isolation, you're right that the sentence leaves things unsaid.
But we can't say everything in one sentence, and I think the context
makes things clear, at least after the updates I've now pushed.

  5.8 Manipulating Hyphenation
  ============================

  GNU 'troff' normally hyphenates words where necessary.  The
  machine-driven determination of hyphenation points in words requires
  algorithms and data, and is susceptible to conventions and
  preferences.  Before tackling such "automatic hyphenation", let us
  consider how hyphenation points can be set manually.

     Explicitly hyphenated words such as "mother-in-law" are eligible
  for breaking after each of their hyphens when GNU 'troff' fills lines.
  Relatively few words in a language offer such obvious break points,
  however, and automatic hyphenation is not perfect, particularly for
  unusual words found in technical literature.  We may wish to instruct
  GNU 'troff' how to hyphenate specific words if the need arises.

> Equally, the corresponding entry for .nh needs to include that
> explicit and soft hyphens continue to be interpreted as valid break
> points.  Conceptually, soft hyphens feel like part of automatic
> hyphenation, even if groff doesn't treat them as such.  Users need
> to be alerted in order to prevent surprises.

I have ensured that such annotations are in the descriptions of .hy and
.nh now, both in our Texinfo and groff(7).

There is one fiendish little bit of non-orthogonality in *roff
hyphenation modes, in faithful simulation of AT&T troff.  Value 2 of the
.hy request is the only one that applies to words with manual
hyphenation points; that's the value that prohibits hyphenation of a
word at the bottom of the page.  Its function is reasonable but the
breadth of its application makes it impossible to say that ".hy deals
only with automatic hyphenation" or ".hy affects only words with
automatically determined hyphenation points".  Instead I have to hedge
around such a general statement.  Regardless, the fact is now more
clearly documented, IMO.
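
As a hedged illustration of the two requests just mentioned (a sketch
for this message, not an excerpt from the manual; the sample words are
again borrowed from this thread):

```roff
.\" Sketch only; see the .hy and .nh descriptions for the full story.
.\" Value 2: do not hyphenate the last word on a page.  As noted
.\" above, this value also applies to words with manual points.
.hy 2
.\" Disable automatic hyphenation.
.nh
.\" Per this thread, explicit hyphens and \% points are still
.\" treated as valid break points when filling.
mother-in-law brother\%hood
```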

I'm attaching a PDF of the 8 pages of in-depth hyphenation documentation
as it currently stands in Git HEAD.  I'd appreciate your review, and
that of anyone else.  Recent changes have affected the content only up
to the .nh request (inclusive); but since groff 1.22.4 I think every
paragraph has undergone revision.

Regards,
Branden

[1] The tail of the distribution may be longer than this, especially in
multi-column documents in German, but I think the statement captures at
least 99% of hyphenation cases I've seen in *roff documents.

Attachment: groff-98-105-pdfjam.pdf
Description: Adobe PDF document

Attachment: signature.asc
Description: PGP signature

