groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Groff] Formatting algorithm


From: Doug McIlroy
Subject: Re: [Groff] Formatting algorithm
Date: Fri, 25 Apr 2014 11:54:59 -0400
User-agent: Heirloom mailx 12.5 7/5/10

Equations and displayed quotations are not a problem for the line-breaking
algorithm; they'd all be handled in the macro packages, which would have
the duty to delimit the blocks of text that are to be treated unitarily.

But that doesn't mean we're home free. Line-length changes within such
blocks are still a problem. Consider the folliwng fragment for flowing
text around a predetermined 2"x3" hole for a figure. The line-breaking
algorithm has to be sensitive not only to line-length changes, but
to traps.
        .wh 4i indent
        .de indent
        'in +2i
        'll -2i
        .wh +3i undent
        ..
        .de undent
        'in -2i
        'll +2i
        ..
A similar case occurs when setting "figured" text. For example, fitting
text into a triangle involves trapping to change line length on every
line.

These examples are rather worse than what the KP paper deals with, where
the line lengths are known in advance rather than being calculated on
the fly.

Notionally there could be a phantom copy of groff running for each live
partial solution that the dynamic program maintains. Each phantom copy
would have its own .ev-like environment. Besides changes in line length,
it needs to keep up with page numbers so that .tm-based indexing works
right. Footnotes exacerbate the challenge of tracking page numbers.

Notice that among all the phantom groffs, the side effects of only one
should persist. These would routinely include the output requests used
for gathering index info. But they would also include number registers,
diversions, macro definitions, etc, assigned on the fly for other
purposes.

A straightforward way to pull this off would be to actualize the notional
copies of groff by forking. There would be one copy going forward from
each line break. That would evaluate the cost of breaking at each word
(or hyphenation point) on that line. At each line break the copies would
rendezvous to see which process should be cloned to continue. Output
of each process, both to standard output and standard error, would
be treasured up and only the ultimate winner's output would finally
be released.

This model is somewhat formidable. For long paragraphs it could rendezvous
and spawn a process at most every word, though typically only a dozen or
so processes would be alive at any one time.  So far my imagination has
failed to find a clean and efficient alternative that retains classic
groff semantics in every way except line-breaking. Who has a better idea?

Doug



reply via email to

[Prev in Thread] Current Thread [Next in Thread]