bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode

From:	Andrew Kensler
Subject:	bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode
Date:	Mon, 13 Feb 2023 00:27:10 -0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1

A revised patch is attached.  Replies follow inline.

On 1/23/23 7:34 AM, Eli Zaretskii wrote:

Thanks, I think this is a welcome feature.  Please see a few comments
below.  I will send you the form for copyright assignment off-list.

Signed and sent.

I find this subsection too detailed and lengthy.  I'm not sure we want
to describe all the customizable options that users can customize,
just the main ones.  Assuming the default values are chosen well,
chances are that most users will not need to customize too many of
them, and therefore the manual doesn't have to describe them; we just
need to mention that several more options are available, and perhaps
have a special group for them for easier discoverability.  WDYT?


I have significantly trimmed this subsection of the manual.

I agree about the special group. The options related to balanced fillmode have now been collected in their own 'balanced-fill customizationgroup, which is a child of the main 'fill group. (Note that the otherexisting options in fill.el are now explicitly placed in the 'fill grouprather than relying on implicit ordering relative to the last groupdefinition.) This subsection of the manual now tells how to customizethe 'balanced-fill group, rather than listing all of the optionsindividually.

I have also renamed several of the options to better imply theirrelation; the ones that directly contribute (or are an exponent on avalue that contributes) to the total score being optimized now all endin -penalty, for example.

My other concern about the documentation is that it seems to describe
the feature in a way that is too technical and uses terminology from
the field of optimizations.  I'm afraid that users without background
in optimizations will have difficulty understanding some of the
options.  Can you try describing this in a less formal manner, so as
to make it easier to understand?

I have tried to move the focus of the documentation for this featureaway from the technical details of the implementation and more towards adescription of the effects.

This example is too long.  I suggest to find a shorter one.  I think
an example with 3 lines should be enough to explain the feature.

I have changed the manual subsection to use a much shorter example. Itis a bit longer than 3 lines, but that is because it uses a fill-widthof 20. However, this gave me room to put the examples with and withoutthis feature side-by-side.

This should mention at least a few main options that control the
feature.

In the NEWS announcement, I have added a reference to the newcustomization group.

Likewise here: this doc string is too abstract and thus hard to grok.
Talking about minimization of a cost function that penalizes something
only helps if one has some background in that domain.  I'd instead try
to say something like "fills the entire paragraph avoiding too short
lines" or something similar.

I have updated the doc string on the minor mode here, as well as theother doc strings to remove the mentions of minimizing a cost function. In this case, I went with "When enabled, filling will try to optimallychoose a set of line breaks to make a paragraph look tidier byconsidering the entire paragraph at a time. This may place line breakssooner than necessary if it improves later lines."

The first line of a doc string should be a single complete sentence
(here and elsewhere in the patch).  This is important because the
various "apropos" commands show only the first line of the doc string.

Good point! This might be worth a reminder under the "Documenting yourchanges" section of the CONTRIBUTE guide where it mentions doc-strings.

These options are probably related, in that changing one needs a
suitable change in others to make sense.  If this is indeed so, please
mention the relations in the doc strings.  You say "Additional", but
that is meaningless when each doc string is read separately
(additional to what?)

That's a fair point. Each of the options now that contributes to thescoring now ends in -penalty and is a member of the 'balanced-fillcustomization group. I have also updated the doc strings to mentionthat these values are about the prioritization of some aspect of theappearance of the paragraph relative to the other penalties. Hopefullythat is more clear now. I have removed the "Additional" language.

If this is an internal function, please use our convention of naming
it with double dash, as in balanced-fill--break-lines.  If this is
supposed to be a public function, it should have a doc string.

I have renamed it to use the double dash. I had originally been tryingto match it to the existing code in fill.el, which doesn't seem use thatconvention. But I also realize that that code may predate thatconvention and you wouldn't have wanted to rename it and risk breakingany user code relying on it. No need for new code to be bound by that,though.

Last, but not least: what about performance?  Is this performant
enough to apply to large enough paragraphs, including via
auto-fill-mode?  Can you provide some measurements?

That's a fair question. That is why I had added thebalanced-fill-word-limit option to restrict this to small to mediumparagraphs. But to test the performance, I have written a smallbenchmark (see attached) that creates temporary buffers withsuccessively larger copies of the common "Lorem ipsum" paragraph, all onon line, and measures the time to fill it. Here are timings that I gotfrom it on my 10-year old machine at a few key sizes, with and withoutnativecomp:


Without nativecomp:
  For 69 words:
    greedy fill   -- 0.000216s
    balanced fill -- 0.001203s
  For 483 words:
    greedy fill   -- 0.001201s
    balanced fill -- 0.023777s
  For 2001 words:
    greedy fill   -- 0.005144s
    balanced fill -- 0.341215s
With nativecomp:
  For 69 words:
    greedy fill   -- 0.000190s
    balanced fill -- 0.000991s
  For 483 words:
    greedy fill   -- 0.001086s
    balanced fill -- 0.021680s
  For 2001 words:
    greedy fill   -- 0.004645s
    balanced fill -- 0.316352s

Note that for this benchmark, I had raised the balanced-fill-word-limitoption so that it would test the new balanced fill algorithm, even atlarger sizes. With its current default of the 500 word limit, thetimings for the 2001 words tests when balanced fill mode was enabledwere around ~0.5ms since it would fall back to the greedy fillalgorithm. Thus the maximum time with this minor mode enabled and thedefault word limit would be about ~2.5ms on this machine for a 500 wordparagraph.

Hopefully this new revision addresses your major critiques from before. I'm happy to refine it further if you still have reservations or spotanything else.

0001-Add-new-minor-balanced-fill-mode-to-filling.patch
Description: Text Data

benchmark-fill.el
Description: Text Data

[Prev in Thread]

Current Thread

[Next in Thread]

bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode, Andrew Kensler <=

Prev by Date: bug#61460: 30.0.50; Calendar shows eclipse for quarter moon
Next by Date: bug#61457: 30.0.50; compiler falsely complains missing ‘describe-char-display’
Previous by thread: bug#61466: compilation--parse-region: Stack overflow in regexp matcher...
Next by thread: bug#61472: 29.0.60; css{-ts}-mode highlight problem
Index(es):
- Date
- Thread