Re: Soft hyphens


From: G. Branden Robinson
Subject: Re: Soft hyphens
Date: Sun, 4 Apr 2021 15:56:45 +1000
User-agent: NeoMutt/20180716

Hi, Peter!

At 2021-04-03T12:31:38-0400, Peter Schaffter wrote:
> On Sat, Apr 03, 2021, Dave Kemper wrote:
> > On 3/28/21, Peter Schaffter <peter@schaffter.ca> wrote:
> > > I'm wondering if the interpretation of soft hyphens when .nh is
> > > active is correct behaviour.
> > 
> > I don't know the answer to your question,
> 
> I got an answer from Doug McIlroy.
> 
>  "In pre-Unix roff hyphenation mode 0 turned off all breaking of words.
>   The original troff, however, behaved as described above, and also
>   broke genuinely hyphenated words in mode 0."
> 
> > but in general, anything that will require some kind of change
> > --even if it's presently unknown whether that change is to the
> > software or to the documentation--should have a savannah ticket
> > opened (http://savannah.gnu.org/bugs/?group=groff&func=additem) to
> > keep tabs on it.
> 
> From what Doug said, I don't think it qualifies as a bug, more of
> an idiosyncrasy.  I do think it needs to be documented in the info
> manual, though.  I've opened a ticket and assigned it to Branden.

Here's the current text of our Texinfo manual on this subject, before it
gets into the gory details of requests (.hc, .shc, .hy, .nh, .hpf,
.hpfa, .hpfcode, .hcode, .hla, .hlm, .hym, .hys).  Much of this content
is new or heavily revised since groff 1.22.4.

I think the sentence

   Explicitly hyphenated words such as "mother-in-law" are eligible for
   breaking after each of their hyphens when GNU 'troff' fills lines.

does most of the work you're asking for...but if you disagree please
speak up.
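
For anyone who wants to watch this happen, a minimal test input might
look like the following.  It is only a sketch: the line length and the
sample sentences are arbitrary choices, not taken from the manual or
from Peter's report.

   .\" Narrow, unadjusted lines so that breaks are forced.
   .ll 12n
   .na
   .\" Turn off automatic hyphenation.
   .nh
   She asked her mother-in-law.
   .br
   .\" \% marks a soft hyphen inside the word.
   An extra\%ordinary antidisestablishmentarianism.

Strip the indentation, format the file with, for example,
"groff -Tascii", and compare where the lines break with and without the
.nh request.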

[...]
5.1.3 Hyphenation
-----------------

When an output line is nearly full, it is uncommon for the most recent
word collected from the input to exactly fill it--typically, there is
enough room left over for part of the next word.  The process of
splitting a word so that it appears partially on one line (with a hyphen
to indicate to the reader that the word has been broken) with the
remainder of the word on the next is "hyphenation".  GNU 'troff' uses a
hyphenation algorithm and language-specific pattern files (based on but
simplified from those used in TeX) to decide which words can be
hyphenated and where.

   Hyphenation does not always occur even when the hyphenation rules for
a word allow it; it can be disabled, and when not disabled there are
several parameters that can prevent it in certain circumstances.  *Note
Manipulating Hyphenation::.
[...]
5.8 Manipulating Hyphenation
============================

GNU 'troff' hyphenates words automatically by default.  Automatic
hyphenation of words in natural languages is a subject requiring
algorithms and data, and is susceptible to conventions and preferences.
Before tackling automatic hyphenation, let us consider how it can be
done manually.

   Explicitly hyphenated words such as "mother-in-law" are eligible for
breaking after each of their hyphens when GNU 'troff' fills lines.
Relatively few words in a language offer such obvious break points,
however, and automatic hyphenation is not perfect, particularly for
unusual words found in domain-specific jargon.  We may wish to
explicitly instruct GNU 'troff' how to hyphenate words if the need
arises.

 -- Request: .hw word ...
     Define each "hyphenation exception" WORD with each hyphen '-' in
     the word indicating a hyphenation point.  For example, the request

          .hw in-sa-lub-rious alpha

     marks potential hyphenation points in "insalubrious", and prevents
     "alpha" from being hyphenated at all.

     Besides the space character, any character whose hyphenation code
     is zero can be used to separate the arguments of 'hw' (see the
     'hcode' request below).  In addition, this request can be used more
     than once.

     Hyphenation points specified with 'hw' are not subject to the
     restrictions given by the 'hy' request (see below).

     Hyphenation exceptions specified with the 'hw' request are
     associated with the hyphenation language (see below) and
     environment (*note Environments::); calling the 'hw' request in the
     absence of a hyphenation language is an error.

     The request is ignored if there are no parameters.

   These are known as hyphenation _exceptions_ in the expectation that
most users will avail themselves of automatic hyphenation; these
exceptions override any rules that would normally apply to a word
matching a hyphenation exception defined with 'hw'.
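
   As a sketch that is not part of the manual text (the line length and
the sample sentence are arbitrary assumptions), an input like the
following can be used to see hyphenation exceptions take effect.

     .\" Narrow, unadjusted lines so that breaks are forced.
     .ll 15n
     .na
     .hw in-sa-lub-rious alpha
     The alpha swamp was insalubrious.

Going by the description of 'hw' above, "insalubrious" should be
eligible for breaking at the marked points, and "alpha" should not be
hyphenated at all.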

   Situations also arise when only a specific occurrence of a word needs
its hyphenation altered or suppressed, or when something that is not a
word in a natural language, like a URL, needs to be broken in sensible
places without hyphens.

 -- Escape: \%
 -- Escape: \:
     To tell GNU 'troff' how to hyphenate words as they occur in input,
     use the '\%' escape, also known as the "hyphenation character".
     Preceding a word with this escape prevents it from being
     automatically hyphenated; each instance within a word indicates to
     GNU 'troff' that the word may be hyphenated at that point.  This
     mechanism affects only that occurrence of the word; to change the
     hyphenation of a word for the remainder of the document, use the
     'hw' request.

     GNU 'troff' regards the escapes '\X' and '\Y' as starting a word;
     that is, the '\%' escape in, say, '\X'...'\%foobar' or
     '\Y'...'\%foobar' no longer prevents hyphenation of 'foobar' but
     inserts a hyphenation point just prior to it; most likely this
     isn't what you want.  *Note Postprocessor Access::.

     The '\:' escape inserts a non-printing break point; that is, the
     word can break there, but the soft hyphen glyph is not written to
     the output if it does.  Breaks are word boundaries, so if a break
     is inserted, the remainder of the (input) word is subject to
     hyphenation as normal.

     You can use '\:' and '\%' in combination to control breaking of a
     file name or URL.

          ... check \%/var/log/\:\%httpd/\:\%access_log ...
[...]
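
To try the file name example from the end of the excerpt, it can be
wrapped in a small test input.  Again this is a sketch; the line length
and the surrounding sentence are made up for illustration.

   .\" Narrow, unadjusted lines so that the path has to break.
   .ll 30n
   .na
   Errors are logged; check \%/var/log/\:\%httpd/\:\%access_log for details.

Going by the description of \: and \% above, the path should be
eligible to break after the slashes marked with \:, with no hyphen
written at the break.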

Regards,
Branden
