
GNU troff's fundamental character type (was: neatroff for Russian)


From: G. Branden Robinson
Subject: GNU troff's fundamental character type (was: neatroff for Russian)
Date: Sat, 6 May 2023 22:06:06 -0500

At 2023-04-29T22:33:52-0500, Dave Kemper wrote:
> On 4/26/23, G. Branden Robinson <g.branden.robinson@gmail.com> wrote:
> > It would probably be a good idea to represent Unicode strings
> > internally using char32_t as a base type anyway, but groff's design
> > under the Unix filter model described above makes the choice less
> > dramatic in terms of increased space consumption than it would
> > otherwise be.
> 
> But to keep scalability in mind, this design shouldn't be assumed to
> be immutable.  Implementing the Knuth-Plass (or some other)
> paragraph-at-once algorithm would greatly expand the amount of input
> groff has to remember at once,

Only by about an order of magnitude, which sounds like a lot until we
consider how many orders of magnitude we've gained in memory and
persistent storage bandwidth and in CPU instruction retirement rate
since the PDP-11.

> and a theoretical future chapter-at-once algorithm (to, for example,
> optimize page layouts to eliminate widows) vastly expands it beyond
> that.

Well, if you format each paragraph in a diversion, you don't need to
expand the formatter's view of the present as much.

> It's possible memory is too cheap to worry about even the worst case,
> where groff 4.38 has to hold an entire document in memory (maybe to
> finally allow it to put the table of contents up front without page
> reordering),

Just in case people fear you're _not_ being facetious, there are better
solutions for this.  One already exists in PDF, and I have proposed a
general solution for all documents.

https://savannah.gnu.org/bugs/?61836

> but it's a question worth considering before making changes to groff's
> fundamental data type.

I disagree with this too.  Part of the value of encapsulation of the
fundamental character type inside a formatter-specific type is that we
can change our minds _again_ if circumstances warrant.
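
To make the encapsulation point concrete, here is a minimal sketch in
C++.  Every name in it (fmt_char, storage_type, from_utf32,
storage_bytes) is hypothetical and chosen for illustration only; none of
it is taken from the groff sources.  It assumes only that callers go
through the wrapper's interface rather than touching the underlying
integer type directly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical formatter-specific character type (not groff's real class).
// Callers never name the underlying integer type, so it can be swapped
// later by editing the single alias below.
class fmt_char {
public:
    using storage_type = char32_t;  // the one line to change if we change our minds

    constexpr fmt_char() : code_(0) {}
    constexpr explicit fmt_char(char32_t c) : code_(c) {}

    // Accessors expose meaning (a code point), not the storage width.
    constexpr char32_t code_point() const { return code_; }
    constexpr bool is_ascii() const { return code_ < 0x80; }

    friend constexpr bool operator==(fmt_char a, fmt_char b) {
        return a.code_ == b.code_;
    }
    friend constexpr bool operator!=(fmt_char a, fmt_char b) {
        return !(a == b);
    }

private:
    storage_type code_;
};

// Convert already-decoded UTF-32 input into the formatter's own type.
inline std::vector<fmt_char> from_utf32(const std::u32string &s) {
    std::vector<fmt_char> out;
    out.reserve(s.size());
    for (char32_t c : s)
        out.push_back(fmt_char(c));
    return out;
}

// Space cost of holding a run of input, whatever storage_type happens to be.
inline std::size_t storage_bytes(const std::vector<fmt_char> &v) {
    return v.size() * sizeof(fmt_char);
}
```

In this sketch, code that compares characters or asks for a code point
keeps compiling unchanged if storage_type is later replaced; only the
wrapper and the conversion routines need to be revisited.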

Regards,
Branden
