groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Mission statement and Knuth-Plass reconsidered


From: Bertrand Garrigues
Subject: Re: Mission statement and Knuth-Plass reconsidered
Date: Sat, 08 Jul 2023 13:41:44 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Hi Branden,

On mar., mai 23 2023 at 02:15:33 , "G. Branden Robinson" 
<g.branden.robinson@gmail.com> wrote:
>> > Myself, I wonder if K-P couldn't be implemented above the formatter
>> > itself, using a diversion.  We could then put the implementation in
>> > an auxiliary macro package.
[...]
> I'll note that even if K-P can't be done this way, because the algorithm
> requires a view of more than one page at a time

Indeed, I don't think it's possible to implement Knuth-Plass in a macro
package, it's already very difficult to implement it into tree, but in a
macro package it seems impossible.

> (I, uh, admit I haven't actually learned how Knuth-Plass works yet)

I've previously worked on Knuth-Plass [1], the old 'format_knuth_plass'
branch no longer compiled so I've deleted it and pushed a fresh new
'format_knuth_plass2' fully sync against master.

The best resource is of course Donald E. Knuth's "Digital Topography"
book, chapter "Breaking Paragraph Into Lines".  This link [2] is also
very helpful with a visual description of how the algorithm works (In my
previous post I've said that there was a mistake in the demerit
calculation though, but I no longer remember what the problem exactly is).

Going back to my 'format_knuth_plass2': it contains a paragraph.cpp file
that fully implement the Knuth-Plass algorithm, but it is not yet
connected to the troff program.

Nevertheless the paragraph.cpp is fully tested by the 'test_paragraph'
test program.  You have to explicitly build it (it is not built by
default):

  make test_paragraph

It contains a unit test suite (using the CppUnit framework), if you
launch it without option it will run all the unit tests.  If you pass
options '-s 1 -t 1' (suite 1 test 1) it will format the paragraph of the
original Knuth example:

   -- Suite 1 Initialisation
   -- Test11...
      Checking the number of lines
      Checking the lines adjustement ratio
      Checking all breakpoints demerits
      Checking the best breakpoints array
   Number of lines: 10                                                  | adj. 
ratio | total demerits | fitness class |
   
      In olden times when wishing still helped one, there lived a              
0.774             2209               0  
                                                                ^
   king whose daughters were all beautiful; and the youngest was               
0.179             2213               0  
      ^                                                        ^
   so beautiful that the sun itself, which has seen so much, was               
0.629             2889               0  
    ^                                                          ^
   astonished whenever it shone in her face. Close by the king's               
0.545             3178               0  
       ^                                                       ^
   castle lay a great dark forest, and under an old lime-tree in the           
0.000             3179               0  
        ^   ^                                                  ^   ^
   forest was a well, and when the day was very warm, the king's               
0.079             3180               0  
     ^  ^   ^ ^                                         ^      ^
   child went out into the forest and sat down by the side of the              
0.282             3189               0  
       ^    ^   ^                                        ^  ^   ^
   cool fountain; and when she was bored she took a golden ball,               
0.294             3205               0  
      ^    ^    ^   ^                                    ^     ^
   and threw it up on high and caught it; and this ball was her                
0.575             3605               0  
     ^     ^  ^  ^  ^                                     ^   ^
   favorite plaything.                                                         
0.000             3606               0  
       ^  ^    ^     ^ 
   -- Test test11_original_example PASSED


It shows, for each line, the candidate breakpoints (with a `^' sign) and
the calculation of "adjustment ratio" and "total demerits".  The test
checks the obtained result against the values given in Knuth's book.
Note that Knuth later corrected his "demerit" formula, so this examples
uses the old formula but not the other examples.

You can also give to the 'test_paragraph' program any text in a file,
and he will try to format the text in a paragraph.  You can adjust the
"tolerance" and the line width.  For example if you use the attached
'lorem.txt' and do:

  test_paragraph -f lorem.txt

it will first fail:

Processing content of ../../lorem.txt with tolerance:1.000 line length:500
[paragraph.cpp:1072:format_knuth_plass]Could not format paragraph

But you can play on the "tolerance" or the line length with -T and -l,
e.g. pass a tolerance of 2.1:

   ./test_paragraph -f lorem.txt -T 2.1
   Processing content of lorem.txt with tolerance:2.100 line length:500
   
   Number of lines: 14                                                   | adj. 
ratio | total demerits | fitness class |
   
      Lorem ipsum dolor sit amet, consectetur adipiscing elit.                  
2.000           641601               3  
                                                             ^
   Sed non risus. Suspendisse lectus tortor, dignissim sit amet,                
1.000           651802               2  
     ^                                                         ^
   adipiscing nec, ultricies sed, dolor. Cras elementum ultrices                
1.192           680702               3  
                                                               ^
   diam. Maecenas ligula massa, varius a, semper congue, euismod                
0.100           690703               1  
                                                               ^
   non, mi. Proin porttitor, orci nec nonummy molestie, enim est                
0.242           690707               1  
                                                           ^   ^
   eleifend mi, non fermentum diam nisl sit amet erat. Duis semper.            
-0.737           692388               0  
                                                          ^       ^
   Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim.          
-0.833           695869               0  
                                                   ^       ^  ^     ^
   Pellentesque congue. Ut in risus volutpat libero pharetra tempor.           
-0.733           697469               0  
                                           ^      ^        ^        
   Cras vestibulum bibendum augue. Praesent egestas leo in pede.                
0.000           697470               1  
   
   Praesent blandit odio eu enim. Pellentesque sed dui ut augue                 
0.633           698146               2  
   
   blandit sodales. Vestibulum ante ipsum primis in faucibus orci               
0.296           698162               1  
   
   luctus et ultrices posuere cubilia Curae; Aliquam nibh. Mauris ac           
-0.438           698243               1  
   
   mauris sed pede pellentesque fermentum. Maecenas adipiscing                  
0.667           699204               2  
   
   
or keep the tolerance to 1.0 and use a line of 850:

   ./test_paragraph -f lorem.txt -l 850
   Processing content of lorem.txt with tolerance:1.000 line length:850
   
   Number of lines: 8                                                           
                                      | adj. ratio | total demerits | fitness 
class |
   
      Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. 
Suspendisse lectus tortor, dignissim          -0.423            10081           
    1  
                                                                                
                                 ^
   sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices 
diam. Maecenas ligula massa, varius               0.365            10117        
       1  
                                                                                
                             ^
   a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy 
molestie, enim est eleifend mi, non                  0.175            10121     
          1  
    ^                                                                           
                      ^   ^
   fermentum diam nisl sit amet erat. Duis semper. Duis arcu massa, scelerisque 
vitae, consequat in, pretium a,             -0.200            10125             
  1  
   
   enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras 
vestibulum bibendum augue.                   0.229            10129             
  1  
   
   Praesent egestas leo in pede. Praesent blandit odio eu enim. Pellentesque 
sed dui ut augue blandit sodales.               0.185            10133          
     1  
   
   Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere 
cubilia Curae; Aliquam nibh. Mauris             -0.071            10134         
      1  
   
   ac mauris sed pede pellentesque fermentum. Maecenas adipiscing ante non diam 
sodales hendrerit.                           0.000            10135             
  1  


Of course the shorter the line, the harder it is for the algorithm to
find a solution with the given tolerance.

Note that 'test_pragraph' supports only ascii, the width of the various
letter are hard-coded with values from Knuth'book, and there is no
hyphenation support (only for Knuth's original example I simulate that
we may have some words hyphenated).

The main difficulties to connect this work to troff are that the three
big files of troff (input.cpp, env.cpp, node.cpp) are too closely tied
and that the whole line-oriented logic is difficult to adapt
(particularly the diversion is hard to manage).  Some preliminary
refactoring would probably be needed.  If people are interested I may
try to find some time to work on it.


[1] https://lists.gnu.org/archive/html/groff/2017-11/msg00079.html

[2] https://defoe.sourceforge.net/folio/knuth-plass.html

Regards,

Bertrand

Attachment: lorem.txt
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]