unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode
@ 2023-01-23  9:40 Andrew Kensler
  2023-01-23 15:34 ` Eli Zaretskii
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Kensler @ 2023-01-23  9:40 UTC (permalink / raw)
  To: 61028


[-- Attachment #1.1: Type: text/plain, Size: 2116 bytes --]

Greetings all,

The proposed patch attached adds a new minor balanced-fill-mode with an 
alternate line breaking algorithm for paragraph filling.  When enabled, 
it will try to neatly balance line lengths to reduce raggedness, avoid 
widows, avoid starting a new sentence on the last word of a line, avoid 
ending a sentence on the first word of a line, and so forth.  It is 
heavily inspired by the Knuth-Plass line breaking algorithm and uses 
dynamic programming to try to choose line breaks to minimize a cost 
function.

For example, consider the following mock paragraph as filled by the 
current greedy algorithm with the fill-column set to 15:

Ccc ccc a bb
dddd bb bb a
ccc a
jjjjjjjjjj a
eeeee a
hhhhhhhh bb
dddd.

With the new minor mode enabled, it would instead be filled much more 
nicely as:

Ccc ccc a
bb dddd bb
bb a ccc a
jjjjjjjjjj
a eeeee a
hhhhhhhh
bb dddd.

Often, the result is similar to simply having used a particular slightly 
narrower fill-column with the current greedy algorithm.  The advantage, 
however, is that it figures out the correct column automatically and 
per-paragraph.

The main piece of implementation is in the new 
(balanced-fill-break-lines) function in fill.el, with a modification to 
(fill-region-as-paragraph) to optionally call this before its current 
line breaking loop.  If it runs successfully, then point will be set to 
the end of the paragraph and that loop skipped.

Note that (fill-region-as-paragraph) has no hooks and is too monolithic 
to advise for this kind of thing, hence my hoping to upstream this change.

This is my first time contributing anything significant to Emacs or 
writing here.  I believe that the proposed patch covers all the major 
needs: the code itself, commit message, info documentation, announcement 
in NEWS, and a basic ERT test.  If there's anything I've missed or 
suggestions for improvements, please let me know. (And I hope this is 
the correct mailing list and message format, too.)  I'll be happy to 
sign the copyright assignment paperwork if this looks like something 
you'd like to accept.

Cheers,
- Andrew

[-- Attachment #1.2: Type: text/html, Size: 2759 bytes --]

[-- Attachment #2: 0001-Add-new-minor-balanced-fill-mode-to-filling.patch --]
[-- Type: text/x-patch, Size: 18928 bytes --]

From 20f837a1197bdf3cc11896f214b0f30327b6837c Mon Sep 17 00:00:00 2001
From: Andrew Kensler <andrew@eastfarthing.com>
Date: Fri, 20 Jan 2023 18:17:41 -0800
Subject: [PATCH] Add new minor `balanced-fill-mode` to filling

When enabled, filling will consider the entire paragraph at a
time and try to place line breaks optimally to look more neat
and even, according to a cost function.  This is inspired by the
Knuth-Plass algorithm.

* lisp/textmodes/fill.el
(balanced-fill-mode)
(balanced-fill-maximum-words)
(balanced-fill-margin)
(balanced-fill-widows)
(balanced-fill-length-exponent)
(balanced-fill-raggedness-penalty)
(balanced-fill-single-penalty)
(balanced-fill-break-penalty): New variables.
(balanced-fill-break-lines): New line breaking function.
(fill-region-as-paragraph): Use it to fill paragraphs.

* test/lisp/textmodes/fill-tests.el: Add smoke test for it.

* doc/emacs/emacs.texi (Top):
* doc/emacs/text.texi (Filling Text): Document it.

* etc/NEWS: Announce it.
---
 doc/emacs/emacs.texi              |   1 +
 doc/emacs/text.texi               |  91 +++++++++++++
 etc/NEWS                          |   6 +
 lisp/textmodes/fill.el            | 214 ++++++++++++++++++++++++++++++
 test/lisp/textmodes/fill-tests.el |  15 +++
 5 files changed, 327 insertions(+)

diff --git a/doc/emacs/emacs.texi b/doc/emacs/emacs.texi
index b6d149eb3e..babd487fbd 100644
--- a/doc/emacs/emacs.texi
+++ b/doc/emacs/emacs.texi
@@ -616,6 +616,7 @@ Top
 * Fill Prefix::         Filling paragraphs that are indented
                           or in a comment, etc.
 * Adaptive Fill::       How Emacs can determine the fill prefix automatically.
+* Balanced Fill::       Breaking lines to look more even and neat.
 
 Outline Mode
 
diff --git a/doc/emacs/text.texi b/doc/emacs/text.texi
index 6e16e743a5..251452df17 100644
--- a/doc/emacs/text.texi
+++ b/doc/emacs/text.texi
@@ -497,6 +497,7 @@ Filling
 * Fill Commands::  Commands to refill paragraphs and center lines.
 * Fill Prefix::    Filling paragraphs that are indented or in a comment, etc.
 * Adaptive Fill::  How Emacs can determine the fill prefix automatically.
+* Balanced Fill::  Breaking lines to look more even and neat.
 @end menu
 
 @node Auto Fill
@@ -828,6 +829,96 @@ Adaptive Fill
 line.  If it returns @code{nil}, @code{adaptive-fill-regexp} gets
 a chance to find a prefix.
 
+@node Balanced Fill
+@subsection Balanced Filling
+
+@cindex balanced filling
+  Filling can consider an entire paragraph at a time when determining
+where to place line breaks, as an alternative to just greedily
+inserting a break after the last word that fits on each line.
+
+  In this mode, it will try to optimally choose the set of line
+breaks to minimize a cost function that penalizes untidy paragraphs.
+This may place line breaks sooner than necessary if it improves
+later lines.
+
+  For example, if @code{fill-column} is 60 and balanced filling is
+disabled then the greedy algorithm will fill the following paragraph
+like so:
+
+@smallexample
+"It's not too far-fetched to say that the best programs are
+the ones written when the programmer is supposed to be
+working on something else...  Very good things happen when
+management is enlightened enough to appreciate the
+importance of allowing programmers some free time for
+projects of this sort."  --Melinda Varian
+@end smallexample
+
+@noindent
+while enabling the balanced filling will instead produce the following
+with slightly shorter but more even line lengths:
+
+@smallexample
+"It's not too far-fetched to say that the best programs
+are the ones written when the programmer is supposed to
+be working on something else...  Very good things happen
+when management is enlightened enough to appreciate the
+importance of allowing programmers some free time for
+projects of this sort."  -- Melinda Varian
+@end smallexample
+
+@findex balanced-fill-mode
+@vindex balanced-fill-mode
+  Balanced Fill mode is disabled by default.  To toggle it globally,
+type @kbd{M-x balanced-fill-mode}.
+
+@vindex balanced-fill-maximum-words
+  For speed, the maximum limit on the number of words that the
+balanced fill algorithm will attempt to process in a single paragraph
+is controlled by @code{balanced-fill-maximum-words}.  If the number
+of words in the paragraph exceeds this limit, then filling will fall
+back to the faster greedy algorithm.
+
+  The cost function used to determine where to place the line breaks
+can be tuned through several variables.
+
+@vindex balanced-fill-margin
+  Set @code{balanced-fill-margin} to the number of columns before
+the fill-column that the balanced fill algorithm should attempt
+to break lines at.  Attempting to break lines slightly short
+of the @code{fill-column} but being allowed to go up to the
+@code{fill-column} can help to make the lines more even in length.
+
+@vindex balanced-fill-widows
+  Set @code{balanced-fill-widows} to the minimum number of words that
+the algorithm should attempt to leave on the last line of a paragraph.
+If this is set to an extremely high number, then the balanced filling
+will generally try to make the last line as full as all the others.
+
+@vindex balanced-fill-length-exponent
+  The @code{balanced-fill-length-exponent} is the main control on the
+cost function.  It affects the penalty for lines that are shorter or
+longer than the target length to the margin.  The difference between
+the two lengths will be raised to this power when calculating the
+cost of a potential line break.
+
+@vindex balanced-fill-raggedness-penalty
+@vindex balanced-fill-single-penalty
+@vindex balanced-fill-break-penalty
+  There are also several minor additive penalties to help improve
+the appearance.  The @code{balanced-fill-raggedness-penalty} applies
+for each column of difference in length for a line relative to the
+previous line, unless this is the last line and longer than second
+to last.  Higher numbers make it try harder to keep all lines as
+even as possible in length at the expense of other factors.  The
+@code{balanced-fill-single-penalty} is added either for starting a
+new sentence with a single word right at the end of a line, or else
+for ending a sentence with a single word left at the start of a line.
+Finally, there is a @code{balanced-fill-break-penalty} for each line
+break added.  The larger this is, the more the algorithm will try to
+minimize the number of lines despite the other penalties.
+
 @node Case
 @section Case Conversion Commands
 @cindex case conversion
diff --git a/etc/NEWS b/etc/NEWS
index 10e91ec4ab..1379b2d90a 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -53,6 +53,12 @@ trash when deleting.  Default is nil.
 \f
 * Editing Changes in Emacs 30.1
 
+** Filling can now try to break lines evenly.
+The new user option 'balanced-fill-mode' can be set to non-nil to
+make filling consider the entire paragraph at a time and try to
+place line breaks optimally to look more neat and even, according
+to a cost function.  This is inspired by the Knuth-Plass algorithm.
+
 ---
 ** New command 'kill-matching-buffers-no-ask'.
 This works like 'kill-matching-buffers', but without asking for
diff --git a/lisp/textmodes/fill.el b/lisp/textmodes/fill.el
index 2fde2ff6c4..c87b0afd3a 100644
--- a/lisp/textmodes/fill.el
+++ b/lisp/textmodes/fill.el
@@ -132,6 +132,86 @@ adaptive-fill-function
   :version "27.1"
   :type 'function)
 
+(define-minor-mode balanced-fill-mode
+  "Toggle whether filling should try to neatly balance line lengths.
+
+When enabled, filling will consider the entire paragraph and
+try to optimally choose a set of line breaks to minimize a
+cost function that penalizes untidy paragraphs.  This may
+place line breaks sooner than necessary if it improves later
+lines.  When disabled, filling uses the traditional greedy line
+breaking algorithm.
+
+See Info node `(emacs) Balanced Fill' for more details."
+  :global t
+  :version "30.1"
+  :type 'boolean
+  :group 'fill)
+
+(defcustom balanced-fill-maximum-words 500
+  "Maximum limit on the number of words that the balanced fill
+algorithm will attempt to process in a single paragraph.  If
+the paragraph exceeds this limit, filling will fall back to
+the standard greedy algorithm."
+  :version "30.1"
+  :type 'integer
+  :group 'fill)
+
+(defcustom balanced-fill-margin 3
+  "Number of columns before the fill-column that the balanced fill
+algorithm will attempt to break lines at.  Attempting to break
+lines slightly short of the fill-column but being allowed to
+go up to the fill-column can help to make the lines more even
+in length."
+  :version "30.1"
+  :type 'integer
+  :group 'fill)
+
+(defcustom balanced-fill-widows 2
+  "Minimum number of words that the algorithm will attempt to leave
+on the last line of a paragraph.  If this is set to an extremely
+high number, then the balanced filling will generally try to make
+the last line as full as all the others."
+  :version "30.1"
+  :type 'integer
+  :group 'fill)
+
+(defcustom balanced-fill-length-exponent 3
+  "Controls the penalty for lines that are shorter or longer than
+the target length to the margin.  The difference between the
+actual and target length will be raised to this power when
+calculating the cost of a potential line break.  This is the
+main control for the cost function."
+  :version "30.1"
+  :type 'integer
+  :group 'fill)
+
+(defcustom balanced-fill-raggedness-penalty 40
+  "Additional penalty for each column of difference in length
+relative to the previous line, unless this is the last line
+and longer than second to last.  Higher numbers make it try
+harder to keep all lines as even as possible in length at the
+expense of other factors."
+  :version "30.1"
+  :type 'integer
+  :group 'fill)
+
+(defcustom balanced-fill-single-penalty 150
+  "Additional penalty either for starting a new sentence with a
+single word right at the end of a line, or else for ending a
+sentence with a single word left at the start of a line."
+  :version "30.1"
+  :type 'integer
+  :group 'fill)
+
+(defcustom balanced-fill-break-penalty 50
+  "Additional penalty for each line break added.  The larger this
+is, the more the algorithm will try to minimize the number of
+lines despite the other penalties."
+  :version "30.1"
+  :type 'integer
+  :group 'fill)
+
 (defvar fill-indent-according-to-mode nil ;Screws up CC-mode's filling tricks.
   "Whether or not filling should try to use the major mode's indentation.")
 
@@ -644,6 +724,135 @@ fill-indent-to-left-margin
     (indent-line-to (current-left-margin))
     (put-text-property beg (point) 'face 'default)))
 
+(defun balanced-fill-break-lines (from to justify)
+  ;; Build a table of visible word widths, with and without any preceding
+  ;; spaces, along with whether the word starts a new sentence.  We go by
+  ;; columns and not chars to handle invisible text (especially invisible
+  ;; spaces), etc.
+  (let ((words '())
+        (count 0)
+        (sentence-regexp (sentence-end)))
+    (goto-char to)
+    (while (> (point) from)
+      (let* ((previous (point))
+             (end (current-column))
+             (next (progn (fill-move-to-break-point from)
+                          ;; Needed for moving past some Org-mode links
+                          (skip-chars-backward " \t")
+                          (point)))
+             (with-space (current-column))
+             (without-space (progn (skip-chars-forward " \t")
+                                   (current-column)))
+             (new-sentence (looking-back sentence-regexp from)))
+        (goto-char next)
+        ;; If point didn't move, we're after the first word of the line.
+        (when (<= previous next)
+          (goto-char from)
+          (setq with-space (current-column)
+                without-space (current-column)))
+        ;; Merge into previous entry when moving past invisible text.
+        (if (= end without-space)
+            (setq end (+ end (or (car (pop words)) 0))))
+        (push (list (- end with-space) (- end without-space) new-sentence)
+              words)))
+    (setq words (vconcat words))
+    (setq count (length words))
+
+    ;; Abort (and fall back to greedy algorithm) if we have too many words.
+    ;; The algorithm below is worst-case quadratic (though usually not).
+    ;; Asymptotically faster algorithms exist, but are more complicated.
+    (if (> count balanced-fill-maximum-words)
+        nil
+
+      ;; Consider each word as a candidate to start a line, and build a
+      ;; table of which word the previous line would be best to start on
+      ;; and the width of the previous line.  Use dynamic programming to
+      ;; minimize a cost function.
+      (let ((starts (make-vector (1+ count) 0))
+            (widths (make-vector (1+ count) 0)))
+        (let* ((start 0)
+               (room (- fill-column (current-column)))
+               (costs (make-vector (1+ count) most-positive-fixnum))
+               (rags (make-vector (1+ count) balanced-fill-margin)))
+          (aset costs 0 0)
+          (while (< start count)
+            ;; The room for the first line may be different than the rest.
+            (if (= start 1)
+                (setq room (- fill-column
+                              (string-width (or fill-prefix "")))))
+            ;; Consider each word this new line might end on.  Don't test
+            ;; width against room yet; we want at least one word per line,
+            ;; so we need at least one iteration.
+            (let ((end start)
+                  (width 0))
+              (while (< end count)
+                ;; Don't add the space before the first word to the width.
+                (setq width (+ width (nth (if (= end start) 1 0)
+                                          (aref words end))))
+                ;; Our cost function is the sum of the best total cost
+                ;; for all the lines preceding the start of this one,
+                (let ((cost (+ (aref costs start)
+                               ;; plus the distance from the margin
+                               (expt (abs (- room balanced-fill-margin width))
+                                     ;; exponentiated if not the last line
+                                     (if (or (/= (1+ end) count)
+                                             ;; or if we'd leave a widow,
+                                             (< (- (1+ end) start)
+                                                balanced-fill-widows))
+                                         balanced-fill-length-exponent 1))
+                               ;; plus a penalty for an uneven right side
+                               (* (abs (- room width
+                                          (aref rags start)))
+                                  ;; if not the last line
+                                  (if (or (/= (1+ end) count)
+                                          ;; unless the last line is longer,
+                                          (< (- room width)
+                                             (aref rags start)))
+                                      balanced-fill-raggedness-penalty 0))
+                               ;; plus a penalty if ending on the
+                               ;; start of a new sentence,
+                               (if (nth 2 (aref words end))
+                                   balanced-fill-single-penalty 0)
+                               ;; and a penalty if starting with
+                               ;; a single word and then a new sentence,
+                               (if (and (< (1+ start) count)
+                                        (nth 2 (aref words (1+ start))))
+                                   balanced-fill-single-penalty 0)
+                               ;; and a penalty for each break.
+                               balanced-fill-break-penalty)))
+                  ;; For ending after here, is this a better starting place?
+                  (when (and (<= cost (aref costs (1+ end)))
+                             (or (<= width room) (= end start)))
+                    (aset costs (1+ end) cost)
+                    (aset starts (1+ end) start)
+                    (aset widths (1+ end) width)
+                    (aset rags (1+ end) (- room width))))
+                ;; Break the inner (end) loop if we're out of room now.
+                (if (>= width room)
+                    (setq end count))
+                (setq end (1+ end))))
+            (setq start (1+ start))))
+
+        ;; Walk backwards from the end of the table to reconstruct a list
+        ;; of the optimal widths for each line.
+        (let (chosen)
+          (let ((end count))
+            (while (> end 0)
+              (if (< end count)
+                  (push (aref widths end) chosen))
+              (setq end (aref starts end))))
+
+          ;; Then insert those line breaks and justify each line.
+          (mapc (lambda (width)
+                  (move-to-column (+ (current-column) width))
+                  (fill-newline)
+		  (if justify
+		      (save-excursion
+		        (forward-line -1)
+		        (justify-current-line justify nil t))))
+                chosen)
+          (if justify (justify-current-line justify t t)))))))
+
 (defun fill-region-as-paragraph (from to &optional justify
 				      nosqueeze squeeze-after)
   "Fill the region as if it were a single paragraph.
@@ -762,6 +971,11 @@ fill-region-as-paragraph
 
 	;; This is the actual filling loop.
 	(goto-char from)
+        ;; Attempt to break into balanced lines if desired.
+        (when balanced-fill-mode
+          (balanced-fill-break-lines from to justify))
+        ;; Otherwise (if point is still at from),
+        ;; fall back to the standard greedy line breaking loop.
 	(let (linebeg)
           (while (< (point) to)
 	    (setq linebeg (point))
diff --git a/test/lisp/textmodes/fill-tests.el b/test/lisp/textmodes/fill-tests.el
index ef822ba805..11db6a499f 100644
--- a/test/lisp/textmodes/fill-tests.el
+++ b/test/lisp/textmodes/fill-tests.el
@@ -121,6 +121,21 @@ test-fill-haskell
   ;; w
 ")))
 
+(ert-deftest fill-test-balanced-fill-mode nil
+  "Basic test of the `balanced-fill-mode' option."
+  (with-temp-buffer
+    (insert "Ccc ccc a bb dddd bb bb a ccc a jjjjjjjjjj a eeeee a hhhhhhhh bb dddd.")
+    (setq fill-column 15)
+    (setq-local balanced-fill-mode nil)
+    (fill-paragraph)
+    (should (string= (buffer-string) "Ccc ccc a bb\ndddd bb bb a\nccc a\njjjjjjjjjj a\neeeee a\nhhhhhhhh bb\ndddd.")))
+  (with-temp-buffer
+    (insert "Ccc ccc a bb dddd bb bb a ccc a jjjjjjjjjj a eeeee a hhhhhhhh bb dddd.")
+    (setq fill-column 15)
+    (setq-local balanced-fill-mode t)
+    (fill-paragraph)
+    (should (string= (buffer-string) "Ccc ccc a\nbb dddd bb\nbb a ccc a\njjjjjjjjjj\na eeeee a\nhhhhhhhh\nbb dddd."))))
+
 (provide 'fill-tests)
 
 ;;; fill-tests.el ends here
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode
  2023-01-23  9:40 bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode Andrew Kensler
@ 2023-01-23 15:34 ` Eli Zaretskii
  2023-02-13  8:27   ` Andrew Kensler
  0 siblings, 1 reply; 3+ messages in thread
From: Eli Zaretskii @ 2023-01-23 15:34 UTC (permalink / raw)
  To: Andrew Kensler; +Cc: 61028

> Date: Mon, 23 Jan 2023 01:40:40 -0800
> From: Andrew Kensler <andrew@eastfarthing.com>
> 
> The proposed patch attached adds a new minor balanced-fill-mode with an alternate line breaking algorithm
> for paragraph filling.  When enabled, it will try to neatly balance line lengths to reduce raggedness, avoid
> widows, avoid starting a new sentence on the last word of a line, avoid ending a sentence on the first word
> of a line, and so forth.  It is heavily inspired by the Knuth-Plass line breaking algorithm and uses dynamic
> programming to try to choose line breaks to minimize a cost function.
> 
> For example, consider the following mock paragraph as filled by the current greedy algorithm with the
> fill-column set to 15:
> 
> Ccc ccc a bb
> dddd bb bb a
> ccc a
> jjjjjjjjjj a
> eeeee a
> hhhhhhhh bb
> dddd.
> 
> With the new minor mode enabled, it would instead be filled much more nicely as:
> 
> Ccc ccc a
> bb dddd bb
> bb a ccc a
> jjjjjjjjjj
> a eeeee a
> hhhhhhhh
> bb dddd.
> 
> Often, the result is similar to simply having used a particular slightly narrower fill-column with the current
> greedy algorithm.  The advantage, however, is that it figures out the correct column automatically and
> per-paragraph.
> 
> The main piece of implementation is in the new (balanced-fill-break-lines) function in fill.el, with a modification
> to (fill-region-as-paragraph) to optionally call this before its current line breaking loop.  If it runs successfully,
> then point will be set to the end of the paragraph and that loop skipped.
> 
> Note that (fill-region-as-paragraph) has no hooks and is too monolithic to advise for this kind of thing, hence
> my hoping to upstream this change.
> 
> This is my first time contributing anything significant to Emacs or writing here.  I believe that the proposed
> patch covers all the major needs: the code itself, commit message, info documentation, announcement in
> NEWS, and a basic ERT test.  If there's anything I've missed or suggestions for improvements, please let
> me know.  (And I hope this is the correct mailing list and message format, too.)  I'll be happy to sign the
> copyright assignment paperwork if this looks like something you'd like to accept.

Thanks, I think this is a welcome feature.  Please see a few comments
below.  I will send you the form for copyright assignment off-list.

> +@node Balanced Fill
> +@subsection Balanced Filling

I find this subsection too detailed and lengthy.  I'm not sure we want
to describe all the customizable options that users can customize,
just the main ones.  Assuming the default values are chosen well,
chances are that most users will not need to customize too many of
them, and therefore the manual doesn't have to describe them; we just
need to mention that several more options are available, and perhaps
have a special group for them for easier discoverability.  WDYT?

My other concern about the documentation is that it seems to describe
the feature in a way that is too technical and uses terminology from
the field of optimizations.  I'm afraid that users without background
in optimizations will have difficulty understanding some of the
options.  Can you try describing this in a less formal manner, so as
to make it easier to understand?

> +  For example, if @code{fill-column} is 60 and balanced filling is
> +disabled then the greedy algorithm will fill the following paragraph
> +like so:
> +
> +@smallexample
> +"It's not too far-fetched to say that the best programs are
> +the ones written when the programmer is supposed to be
> +working on something else...  Very good things happen when
> +management is enlightened enough to appreciate the
> +importance of allowing programmers some free time for
> +projects of this sort."  --Melinda Varian
> +@end smallexample
> +
> +@noindent
> +while enabling the balanced filling will instead produce the following
> +with slightly shorter but more even line lengths:
> +
> +@smallexample
> +"It's not too far-fetched to say that the best programs
> +are the ones written when the programmer is supposed to
> +be working on something else...  Very good things happen
> +when management is enlightened enough to appreciate the
> +importance of allowing programmers some free time for
> +projects of this sort."  -- Melinda Varian
> +@end smallexample

This example is too long.  I suggest to find a shorter one.  I think
an example with 3 lines should be enough to explain the feature.

> --- a/etc/NEWS
> +++ b/etc/NEWS
> @@ -53,6 +53,12 @@ trash when deleting.  Default is nil.
>  \f
>  * Editing Changes in Emacs 30.1
>  
> +** Filling can now try to break lines evenly.
> +The new user option 'balanced-fill-mode' can be set to non-nil to
> +make filling consider the entire paragraph at a time and try to
> +place line breaks optimally to look more neat and even, according
> +to a cost function.  This is inspired by the Knuth-Plass algorithm.

This should mention at least a few main options that control the
feature.

> +(define-minor-mode balanced-fill-mode
> +  "Toggle whether filling should try to neatly balance line lengths.
> +
> +When enabled, filling will consider the entire paragraph and
> +try to optimally choose a set of line breaks to minimize a
> +cost function that penalizes untidy paragraphs.  This may
> +place line breaks sooner than necessary if it improves later
> +lines.  When disabled, filling uses the traditional greedy line
> +breaking algorithm.

Likewise here: this doc string is too abstract and thus hard to grok.
Talking about minimization of a cost function that penalizes something
only helps if one has some background in that domain.  I'd instead try
to say something like "fills the entire paragraph avoiding too short
lines" or something similar.

> +(defcustom balanced-fill-maximum-words 500
> +  "Maximum limit on the number of words that the balanced fill
> +algorithm will attempt to process in a single paragraph.  If

The first line of a doc string should be a single complete sentence
(here and elsewhere in the patch).  This is important because the
various "apropos" commands show only the first line of the doc string.

> +(defcustom balanced-fill-length-exponent 3
> +  "Controls the penalty for lines that are shorter or longer than
> +the target length to the margin.  The difference between the
> +actual and target length will be raised to this power when
> +calculating the cost of a potential line break.  This is the
> +main control for the cost function."
> +  :version "30.1"
> +  :type 'integer
> +  :group 'fill)
> +
> +(defcustom balanced-fill-raggedness-penalty 40
> +  "Additional penalty for each column of difference in length
> +relative to the previous line, unless this is the last line
> +and longer than second to last.  Higher numbers make it try
> +harder to keep all lines as even as possible in length at the
> +expense of other factors."
> +  :version "30.1"
> +  :type 'integer
> +  :group 'fill)
> +
> +(defcustom balanced-fill-single-penalty 150
> +  "Additional penalty either for starting a new sentence with a
> +single word right at the end of a line, or else for ending a
> +sentence with a single word left at the start of a line."
> +  :version "30.1"
> +  :type 'integer
> +  :group 'fill)
> +
> +(defcustom balanced-fill-break-penalty 50
> +  "Additional penalty for each line break added.  The larger this
> +is, the more the algorithm will try to minimize the number of
> +lines despite the other penalties."
> +  :version "30.1"
> +  :type 'integer
> +  :group 'fill)

These options are probably related, in that changing one needs a
suitable change in others to make sense.  If this is indeed so, please
mention the relations in the doc strings.  You say "Additional", but
that is meaningless when each doc string is read separately
(additional to what?)

> +(defun balanced-fill-break-lines (from to justify)
> +  ;; Build a table of visible word widths, with and without any preceding
> +  ;; spaces, along with whether the word starts a new sentence.  We go by
> +  ;; columns and not chars to handle invisible text (especially invisible
> +  ;; spaces), etc.

If this is an internal function, please use our convention of naming
it with double dash, as in balanced-fill--break-lines.  If this is
supposed to be a public function, it should have a doc string.

Last, but not least: what about performance?  Is this performant
enough to apply to large enough paragraphs, including via
auto-fill-mode?  Can you provide some measurements?

Thanks.





^ permalink raw reply	[flat|nested] 3+ messages in thread

* bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode
  2023-01-23 15:34 ` Eli Zaretskii
@ 2023-02-13  8:27   ` Andrew Kensler
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew Kensler @ 2023-02-13  8:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61028

[-- Attachment #1: Type: text/plain, Size: 6966 bytes --]

A revised patch is attached.  Replies follow inline.

On 1/23/23 7:34 AM, Eli Zaretskii wrote:
> Thanks, I think this is a welcome feature.  Please see a few comments
> below.  I will send you the form for copyright assignment off-list.
Signed and sent.
> I find this subsection too detailed and lengthy.  I'm not sure we want
> to describe all the customizable options that users can customize,
> just the main ones.  Assuming the default values are chosen well,
> chances are that most users will not need to customize too many of
> them, and therefore the manual doesn't have to describe them; we just
> need to mention that several more options are available, and perhaps
> have a special group for them for easier discoverability.  WDYT?

I have significantly trimmed this subsection of the manual.

I agree about the special group.  The options related to balanced fill 
mode have now been collected in their own 'balanced-fill customization 
group, which is a child of the main 'fill group. (Note that the other 
existing options in fill.el are now explicitly placed in the 'fill group 
rather than relying on implicit ordering relative to the last group 
definition.)  This subsection of the manual now tells how to customize 
the 'balanced-fill group, rather than listing all of the options 
individually.

I have also renamed several of the options to better imply their 
relation; the ones that directly contribute (or are an exponent on a 
value that contributes) to the total score being optimized now all end 
in -penalty, for example.

> My other concern about the documentation is that it seems to describe
> the feature in a way that is too technical and uses terminology from
> the field of optimizations.  I'm afraid that users without background
> in optimizations will have difficulty understanding some of the
> options.  Can you try describing this in a less formal manner, so as
> to make it easier to understand?

I have tried to move the focus of the documentation for this feature 
away from the technical details of the implementation and more towards a 
description of the effects.

> This example is too long.  I suggest to find a shorter one.  I think
> an example with 3 lines should be enough to explain the feature.

I have changed the manual subsection to use a much shorter example.  It 
is a bit longer than 3 lines, but that is because it uses a fill-width 
of 20.  However, this gave me room to put the examples with and without 
this feature side-by-side.

> This should mention at least a few main options that control the
> feature.

In the NEWS announcement, I have added a reference to the new 
customization group.

> Likewise here: this doc string is too abstract and thus hard to grok.
> Talking about minimization of a cost function that penalizes something
> only helps if one has some background in that domain.  I'd instead try
> to say something like "fills the entire paragraph avoiding too short
> lines" or something similar.

I have updated the doc string on the minor mode here, as well as the 
other doc strings to remove the mentions of minimizing a cost function.  
In this case, I went with "When enabled, filling will try to optimally 
choose a set of line breaks to make a paragraph look tidier by 
considering the entire paragraph at a time.  This may place line breaks 
sooner than necessary if it improves later lines."

> The first line of a doc string should be a single complete sentence
> (here and elsewhere in the patch).  This is important because the
> various "apropos" commands show only the first line of the doc string.
Good point!  This might be worth a reminder under the "Documenting your 
changes" section of the CONTRIBUTE guide where it mentions doc-strings.
> These options are probably related, in that changing one needs a
> suitable change in others to make sense.  If this is indeed so, please
> mention the relations in the doc strings.  You say "Additional", but
> that is meaningless when each doc string is read separately
> (additional to what?)

That's a fair point.  Each of the options now that contributes to the 
scoring now ends in -penalty and is a member of the 'balanced-fill 
customization group.  I have also updated the doc strings to mention 
that these values are about the prioritization of some aspect of the 
appearance of the paragraph relative to the other penalties.  Hopefully 
that is more clear now.  I have removed the "Additional" language.

> If this is an internal function, please use our convention of naming
> it with double dash, as in balanced-fill--break-lines.  If this is
> supposed to be a public function, it should have a doc string.

I have renamed it to use the double dash.  I had originally been trying 
to match it to the existing code in fill.el, which doesn't seem use that 
convention.  But I also realize that that code may predate that 
convention and you wouldn't have wanted to rename it and risk breaking 
any user code relying on it.  No need for new code to be bound by that, 
though.

> Last, but not least: what about performance?  Is this performant
> enough to apply to large enough paragraphs, including via
> auto-fill-mode?  Can you provide some measurements?

That's a fair question.  That is why I had added the 
balanced-fill-word-limit option to restrict this to small to medium 
paragraphs.  But to test the performance, I have written a small 
benchmark (see attached) that creates temporary buffers with 
successively larger copies of the common "Lorem ipsum" paragraph, all on 
on line, and measures the time to fill it.  Here are timings that I got 
from it on my 10-year old machine at a few key sizes, with and without 
nativecomp:

Without nativecomp:
   For 69 words:
     greedy fill   -- 0.000216s
     balanced fill -- 0.001203s
   For 483 words:
     greedy fill   -- 0.001201s
     balanced fill -- 0.023777s
   For 2001 words:
     greedy fill   -- 0.005144s
     balanced fill -- 0.341215s
With nativecomp:
   For 69 words:
     greedy fill   -- 0.000190s
     balanced fill -- 0.000991s
   For 483 words:
     greedy fill   -- 0.001086s
     balanced fill -- 0.021680s
   For 2001 words:
     greedy fill   -- 0.004645s
     balanced fill -- 0.316352s

Note that for this benchmark, I had raised the balanced-fill-word-limit 
option so that it would test the new balanced fill algorithm, even at 
larger sizes.  With its current default of the 500 word limit, the 
timings for the 2001 words tests when balanced fill mode was enabled 
were around ~0.5ms since it would fall back to the greedy fill 
algorithm.  Thus the maximum time with this minor mode enabled and the 
default word limit would be about ~2.5ms on this machine for a 500 word 
paragraph.

Hopefully this new revision addresses your major critiques from before.  
I'm happy to refine it further if you still have reservations or spot 
anything else.

[-- Attachment #2: 0001-Add-new-minor-balanced-fill-mode-to-filling.patch --]
[-- Type: text/x-patch, Size: 20769 bytes --]

From 0ea91482d592346cc46052bf1336cbc6a5954b31 Mon Sep 17 00:00:00 2001
From: Andrew Kensler <andrew@eastfarthing.com>
Date: Fri, 20 Jan 2023 18:17:41 -0800
Subject: [PATCH] Add new minor `balanced-fill-mode` to filling

When enabled, filling will consider the entire paragraph at a time
and try to place line breaks optimally to look more neat and even,
according to a cost function.  This is inspired by the Knuth-Plass
algorithm.

* lisp/textmodes/fill.el
(balanced-fill-mode)
(balanced-fill-word-limit)
(balanced-fill-margin-width)
(balanced-fill-runt-width)
(balanced-fill-length-penalty)
(balanced-fill-raggedness-penalty)
(balanced-fill-single-penalty)
(balanced-fill-break-penalty): New variables.
(balanced-fill-mode): New minor mode
(balanced-fill--break-lines): New line breaking function.
(fill-region-as-paragraph): Use it to fill paragraphs.

* test/lisp/textmodes/fill-tests.el: Add smoke test for it.

* doc/emacs/emacs.texi (Top):
* doc/emacs/text.texi (Filling Text): Document it.

* etc/NEWS: Announce it.
---
 doc/emacs/emacs.texi              |   1 +
 doc/emacs/text.texi               |  47 ++++++
 etc/NEWS                          |   7 +
 lisp/textmodes/fill.el            | 249 ++++++++++++++++++++++++++++--
 test/lisp/textmodes/fill-tests.el |  21 +++
 5 files changed, 314 insertions(+), 11 deletions(-)

diff --git a/doc/emacs/emacs.texi b/doc/emacs/emacs.texi
index 7071ea44edd..1eac76882e1 100644
--- a/doc/emacs/emacs.texi
+++ b/doc/emacs/emacs.texi
@@ -620,6 +620,7 @@ Top
 * Fill Prefix::         Filling paragraphs that are indented
                           or in a comment, etc.
 * Adaptive Fill::       How Emacs can determine the fill prefix automatically.
+* Balanced Fill::       Breaking lines to look more even and neat.
 
 Outline Mode
 
diff --git a/doc/emacs/text.texi b/doc/emacs/text.texi
index 18f2274cfa6..8c9d88d2cc3 100644
--- a/doc/emacs/text.texi
+++ b/doc/emacs/text.texi
@@ -497,6 +497,7 @@ Filling
 * Fill Commands::  Commands to refill paragraphs and center lines.
 * Fill Prefix::    Filling paragraphs that are indented or in a comment, etc.
 * Adaptive Fill::  How Emacs can determine the fill prefix automatically.
+* Balanced Fill::  Breaking lines to look more even and neat.
 @end menu
 
 @node Auto Fill
@@ -828,6 +829,52 @@ Adaptive Fill
 line.  If it returns @code{nil}, @code{adaptive-fill-regexp} gets
 a chance to find a prefix.
 
+@node Balanced Fill
+@subsection Balanced Filling
+
+@cindex balanced filling
+  Filling can consider an entire paragraph at a time when determining
+where to place line breaks, as an alternative to just greedily
+inserting a break after the last word that fits on each line.
+This may place line breaks sooner than strictly necessary if it
+will help to balance out the lengths of later lines.
+
+  For example, when @code{fill-column} is 20, compare the results
+of line breaking using the greedy fill algorithm (left) versus the
+balanced fill algorithm (right):
+
+@smallexample
+------ greedy ------    ----- balanced -----
+Hard and sharp as       Hard and sharp as
+flint, from which no    flint, from which
+steel had ever          no steel had ever
+struck out generous     struck out generous
+fire; secret, and       fire; secret, and
+self-contained, and     self-contained,
+solitary as an          and solitary as
+oyster.                 an oyster.
+------ greedy ------    ----- balanced -----
+@end smallexample
+
+  The balanced fill algorithm produces slightly shorter but more even
+line lengths, resulting in a less ragged right margin.  It also avoids
+leaving a single word alone on the last line.
+
+@findex balanced-fill-mode
+@vindex balanced-fill-mode
+@vindex balanced-fill-word-limit
+  You may enable the balanced fill algorithm with Balanced Fill mode,
+which is disabled by default.  To toggle it globally, type @kbd{M-x
+balanced-fill-mode}.  Note that even when it is enabled, paragraphs
+with more than @code{balanced-fill-word-limit} words will fall back
+to the faster greedy filling algorithm for speed.
+
+  To decide where to place line breaks, the balanced fill algorithm
+sums up a set of penalties to compute a score for how poor the layout
+of a paragraph is and then chooses the placement with the lowest
+score.  Type @code{M-x customize-group RET balanced-fill RET} to
+see options for fine-tuning the scoring penalties to taste.
+
 @node Case
 @section Case Conversion Commands
 @cindex case conversion
diff --git a/etc/NEWS b/etc/NEWS
index d3eafaadf19..15afedde8e1 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -59,6 +59,13 @@ This allows the user to customize the prompt that is appended by
 \f
 * Editing Changes in Emacs 30.1
 
+** Filling can now try to break lines evenly.
+The new 'balanced-fill-mode' minor mode can be enabled to make filling
+try to optimally choose a set of line breaks to make a paragraph
+look tidier by considering the entire paragraph at a time.  This is
+inspired by the Knuth-Plass algorithm.  See the customization group
+'balanced-fill' for options for fine-tuning to taste.
+
 ---
 ** On X, Emacs now supports input methods which perform "string conversion".
 This means an input method can now ask Emacs to delete text
diff --git a/lisp/textmodes/fill.el b/lisp/textmodes/fill.el
index 2fde2ff6c4d..a7949652dd3 100644
--- a/lisp/textmodes/fill.el
+++ b/lisp/textmodes/fill.el
@@ -42,12 +42,14 @@ fill-individual-varying-indent
 That mode can handle paragraphs with extra indentation on the first line,
 but it requires separator lines between paragraphs.
 A value of nil means that any change in indentation starts a new paragraph."
-  :type 'boolean)
+  :type 'boolean
+  :group 'fill)
 
 (defcustom colon-double-space nil
   "Non-nil means put two spaces after a colon when filling."
   :type 'boolean
-  :safe #'booleanp)
+  :safe #'booleanp
+  :group 'fill)
 
 (defcustom fill-separate-heterogeneous-words-with-space nil
   "Non-nil means to use a space to separate words of a different kind.
@@ -58,7 +60,8 @@ fill-separate-heterogeneous-words-with-space
 in `fill-nospace-between-words-table' for the characters before and
 after the newline."
   :type 'boolean
-  :version "26.1")
+  :version "26.1"
+  :group 'fill)
 
 (defvar fill-paragraph-function nil
   "Mode-specific function to fill a paragraph, or nil if there is none.
@@ -76,7 +79,8 @@ enable-kinsoku
 Kinsoku processing is designed to prevent certain characters from being
 placed at the beginning or end of a line by filling.
 See the documentation of `kinsoku' for more information."
-  :type 'boolean)
+  :type 'boolean
+  :group 'fill)
 
 (defun set-fill-prefix ()
   "Set the fill prefix to the current line up to point.
@@ -96,7 +100,8 @@ set-fill-prefix
 
 (defcustom adaptive-fill-mode t
   "Non-nil means determine a paragraph's fill prefix from its text."
-  :type 'boolean)
+  :type 'boolean
+  :group 'fill)
 
 (defcustom adaptive-fill-regexp
   ;; Added `!' for doxygen comments starting with `//!' or `/*!'.
@@ -112,7 +117,8 @@ adaptive-fill-regexp
 If the paragraph has just one line, the indentation is taken from that
 line, but in that case `adaptive-fill-first-line-regexp' also plays
 a role."
-  :type 'regexp)
+  :type 'regexp
+  :group 'fill)
 
 (defcustom adaptive-fill-first-line-regexp (purecopy "\\`[ \t]*\\'")
   "Regexp specifying whether to set fill prefix from a one-line paragraph.
@@ -124,13 +130,95 @@ adaptive-fill-first-line-regexp
 
 However, we never use a prefix from a one-line paragraph
 if it would act as a paragraph-starter on the second line."
-  :type 'regexp)
+  :type 'regexp
+  :group 'fill)
 
 (defcustom adaptive-fill-function #'ignore
   "Function to call to choose a fill prefix for a paragraph.
 A nil return value means the function has not determined the fill prefix."
   :version "27.1"
-  :type 'function)
+  :type 'function
+  :group 'fill)
+
+(defgroup balanced-fill nil
+  "Breaking lines to look more even and neat."
+  :link '(custom-manual "(emacs) Balanced Fill")
+  :group 'fill)
+
+(define-minor-mode balanced-fill-mode
+  "Toggle whether filling should try to neatly balance line lengths.
+
+When enabled, filling will try to optimally choose a set of line
+breaks to make a paragraph look tidier by considering the entire
+paragraph at a time.  This may place line breaks sooner than
+necessary if it improves later lines.  When disabled, filling
+uses the traditional greedy line breaking algorithm.
+
+See Info node `(emacs) Balanced Fill' for more details."
+  :global t
+  :version "30.1"
+  :type 'boolean
+  :group 'balanced-fill)
+
+(defcustom balanced-fill-word-limit 500
+  "Largest paragraph, in words, to apply balanced filling to.
+If the paragraph exceeds this limit, balanced filling will fall
+back to the standard greedy filling to keep filling quick."
+  :version "30.1"
+  :type 'integer
+  :group 'balanced-fill)
+
+(defcustom balanced-fill-margin-width 3
+  "Extra margin, in columns, for balanced filling to try to break lines at.
+Targetting a line length slightly short of the `fill-column' but
+being allowed to go all the way up to the `fill-column' can help
+balanced filling to make the lines more even in length."
+  :version "30.1"
+  :type 'integer
+  :group 'balanced-fill)
+
+(defcustom balanced-fill-runt-width 10
+  "Minimum width, in columns, allowed for the last lines of paragraphs.
+If this is set to a high number, then the balanced filling will
+try to make the last lines as full as all the others."
+  :version "30.1"
+  :type 'integer
+  :group 'balanced-fill)
+
+(defcustom balanced-fill-length-penalty 3
+  "Main tuning option for the balanced fill algorithm.
+Higher values cause balanced filling to prioritize (relative
+to other penalties) making each line exactly match the target
+length.  Typically this is set between 2 and 3."
+:version "30.1"
+  :type 'integer
+  :group 'balanced-fill)
+
+(defcustom balanced-fill-raggedness-penalty 40
+  "Penalty added for difference in length between adjacent lines.
+Higher values make balanced filling prioritize (relative to
+other penalties) reducing raggedness and keeping all lines as
+even as possible in length."
+  :version "30.1"
+  :type 'integer
+  :group 'balanced-fill)
+
+(defcustom balanced-fill-single-penalty 150
+  "Penalty added for leaving a word of a sentence alone on a line.
+Higher values make balanced filling prioritize (relative to
+other penalties) avoiding line breaks right after the first
+word of a sentence or before the last word of a sentence."
+  :version "30.1"
+  :type 'integer
+  :group 'balanced-fill)
+
+(defcustom balanced-fill-break-penalty 50
+  "Penalty added for each line break inserted.
+Higher values make balanced filling prioritize (relative to other
+penalties) making a paragraph fit in as few lines as possible."
+  :version "30.1"
+  :type 'integer
+  :group 'balanced-fill)
 
 (defvar fill-indent-according-to-mode nil ;Screws up CC-mode's filling tricks.
   "Whether or not filling should try to use the major mode's indentation.")
@@ -374,11 +462,13 @@ fill-nobreak-predicate
 the line there."
   :type 'hook
   :options '(fill-french-nobreak-p fill-single-word-nobreak-p
-             fill-single-char-nobreak-p))
+             fill-single-char-nobreak-p)
+  :group 'fill)
 
 (defcustom fill-nobreak-invisible nil
   "Non-nil means that fill commands do not break lines in invisible text."
-  :type 'boolean)
+  :type 'boolean
+  :group 'fill)
 
 (defun fill-nobreak-p ()
   "Return nil if breaking the line at point is allowed.
@@ -644,6 +734,137 @@ fill-indent-to-left-margin
     (indent-line-to (current-left-margin))
     (put-text-property beg (point) 'face 'default)))
 
+(defun balanced-fill--break-lines (from to justify)
+  ;; (Line breaking implementation inspired by Knuth-Plass.)
+
+  ;; Abort (and fall back to greedy algorithm) if we have too many words.
+  ;; The algorithm below is worst-case quadratic (though usually not).
+  ;; Asymptotically faster algorithms exist, but are more complicated.
+  (if (> (count-words-region from to) balanced-fill-word-limit)
+      nil
+
+    ;; Build a table of visible word widths, with and without any preceding
+    ;; spaces, along with whether the word starts a new sentence.  We go by
+    ;; columns and not chars to handle invisible text (especially invisible
+    ;; spaces), etc.
+    (let ((words '())
+          (count 0)
+          (sentence-regexp (sentence-end)))
+      (goto-char to)
+      (while (> (point) from)
+        (let* ((previous (point))
+               (end (current-column))
+               (next (progn (fill-move-to-break-point from)
+                            ;; Needed for moving past some Org-mode links
+                            (skip-chars-backward " \t")
+                            (point)))
+               (with-space (current-column))
+               (without-space (progn (skip-chars-forward " \t")
+                                     (current-column)))
+               (new-sentence (looking-back sentence-regexp from)))
+          (goto-char next)
+          ;; If point didn't move, we're after the first word of the line.
+          (when (<= previous next)
+            (goto-char from)
+            (setq with-space (current-column)
+                  without-space (current-column)))
+          ;; Merge into previous entry when moving past invisible text.
+          (if (= end without-space)
+              (setq end (+ end (or (car (pop words)) 0))))
+          (push (list (- end with-space) (- end without-space) new-sentence)
+                words)))
+      (setq words (vconcat words))
+      (setq count (length words))
+
+      ;; Consider each word as a candidate to start a line, and build a
+      ;; table of which word the previous line would be best to start on
+      ;; and the width of the previous line.  Use dynamic programming to
+      ;; minimize a cost function.
+      (let ((starts (make-vector (1+ count) 0))
+            (widths (make-vector (1+ count) 0)))
+        (let* ((start 0)
+               (room (- fill-column (current-column)))
+               (costs (make-vector (1+ count) most-positive-fixnum))
+               (rags (make-vector (1+ count) balanced-fill-margin-width)))
+          (aset costs 0 0)
+          (while (< start count)
+            ;; The room for the first line may be different than the rest.
+            (if (= start 1)
+                (setq room (- fill-column
+                              (string-width (or fill-prefix "")))))
+            ;; Consider each word this new line might end on.  Don't test
+            ;; width against room yet; we want at least one word per line,
+            ;; so we need at least one iteration.
+            (let ((end start)
+                  (width 0))
+              (while (< end count)
+                ;; Don't add the space before the first word to the width.
+                (setq width (+ width (nth (if (= end start) 1 0)
+                                          (aref words end))))
+                ;; Our cost function is the sum of the best total cost for
+                ;; all the lines preceding the start of this one,
+                (let ((cost (+ (aref costs start)
+                               ;; plus the distance from the margin
+                               (expt (abs (- room balanced-fill-margin-width
+                                             width))
+                                     ;; exponentiated if not the last line
+                                     (if (or (/= (1+ end) count)
+                                             ;; or if we'd leave a runt,
+                                             (< width balanced-fill-runt-width))
+                                         balanced-fill-length-penalty 1))
+                               ;; plus a penalty for an uneven right side
+                               (* (abs (- room width
+                                          (aref rags start)))
+                                  ;; if not the last line
+                                  (if (or (/= (1+ end) count)
+                                          ;; unless the last line is longer,
+                                          (< (- room width)
+                                             (aref rags start)))
+                                      balanced-fill-raggedness-penalty 0))
+                               ;; plus a penalty if ending on the start of
+                               ;; a new sentence,
+                               (if (nth 2 (aref words end))
+                                   balanced-fill-single-penalty 0)
+                               ;; and a penalty if starting with a single
+                               ;; word and then a new sentence,
+                               (if (and (< (1+ start) count)
+                                        (nth 2 (aref words (1+ start))))
+                                   balanced-fill-single-penalty 0)
+                               ;; and a penalty for each break.
+                               balanced-fill-break-penalty)))
+                  ;; For ending after here, is this a better starting place?
+                  (when (and (<= cost (aref costs (1+ end)))
+                             (or (<= width room) (= end start)))
+                    (aset costs (1+ end) cost)
+                    (aset starts (1+ end) start)
+                    (aset widths (1+ end) width)
+                    (aset rags (1+ end) (- room width))))
+                ;; Break the inner (end) loop if we're out of room now.
+                (if (>= width room)
+                    (setq end count))
+                (setq end (1+ end))))
+            (setq start (1+ start))))
+
+        ;; Walk backwards from the end of the table to reconstruct a list
+        ;; of the optimal widths for each line.
+        (let (chosen)
+          (let ((end count))
+            (while (> end 0)
+              (if (< end count)
+                  (push (aref widths end) chosen))
+              (setq end (aref starts end))))
+
+          ;; Then insert those line breaks and justify each line.
+          (mapc (lambda (width)
+                  (move-to-column (+ (current-column) width))
+                  (fill-newline)
+                  (if justify
+                      (save-excursion
+                        (forward-line -1)
+                        (justify-current-line justify nil t))))
+                chosen)
+          (if justify (justify-current-line justify t t)))))))
+
 (defun fill-region-as-paragraph (from to &optional justify
 				      nosqueeze squeeze-after)
   "Fill the region as if it were a single paragraph.
@@ -762,6 +983,11 @@ fill-region-as-paragraph
 
 	;; This is the actual filling loop.
 	(goto-char from)
+        ;; Attempt to break into balanced lines if desired.
+        (when balanced-fill-mode
+          (balanced-fill--break-lines from to justify))
+        ;; Otherwise (if point is still at from), fall back to the standard
+        ;; greedy line breaking loop.
 	(let (linebeg)
           (while (< (point) to)
 	    (setq linebeg (point))
@@ -1116,7 +1342,8 @@ default-justification
 		 (const full)
 		 (const center)
 		 (const none))
-  :safe 'symbolp)
+  :safe 'symbolp
+  :group 'fill)
 (make-variable-buffer-local 'default-justification)
 
 (defun current-justification ()
diff --git a/test/lisp/textmodes/fill-tests.el b/test/lisp/textmodes/fill-tests.el
index ef822ba805b..18519eb69a7 100644
--- a/test/lisp/textmodes/fill-tests.el
+++ b/test/lisp/textmodes/fill-tests.el
@@ -121,6 +121,27 @@ test-fill-haskell
   ;; w
 ")))
 
+(ert-deftest fill-test-balanced-fill-mode nil
+  "Basic tests of the `balanced-fill-mode' option."
+  (with-temp-buffer
+    (insert "Ccc ccc a bb dddd bb bb a ccc a jjjjjjjjjj a eeeee a hhhhhhhh bb dddd.")
+    (setq fill-column 15)
+    (setq-local balanced-fill-mode nil)
+    (fill-paragraph)
+    (should (string= (buffer-string) "Ccc ccc a bb\ndddd bb bb a\nccc a\njjjjjjjjjj a\neeeee a\nhhhhhhhh bb\ndddd.")))
+  (with-temp-buffer
+    (insert "Ccc ccc a bb dddd bb bb a ccc a jjjjjjjjjj a eeeee a hhhhhhhh bb dddd.")
+    (setq fill-column 15)
+    (setq-local balanced-fill-mode t)
+    (fill-paragraph)
+    (should (string= (buffer-string) "Ccc ccc a\nbb dddd bb\nbb a ccc a\njjjjjjjjjj\na eeeee a\nhhhhhhhh\nbb dddd.")))
+  (with-temp-buffer
+    (insert "Ccc ccc a bb dddd bb bb a ccc a jjjjjjjjjj a eeeee a hhhhhhhh bb dddd.")
+    (setq fill-column 15)
+    (setq-local balanced-fill-mode t)
+    (fill-paragraph 'full)
+    (should (string= (buffer-string) "Ccc    ccc    a\nbb    dddd   bb\nbb   a  ccc   a\njjjjjjjjjj\na    eeeee    a\nhhhhhhhh\nbb dddd."))))
+
 (provide 'fill-tests)
 
 ;;; fill-tests.el ends here
-- 
2.17.1


[-- Attachment #3: benchmark-fill.el --]
[-- Type: text/x-emacs-lisp, Size: 954 bytes --]

(defun benchmark-fill ()
  (interactive)
  (dotimes (reps 30)
    (with-temp-buffer
      (dotimes (rep reps)
        (insert "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed ")
        (insert "do eiusmod tempor incididunt ut labore et dolore magna ")
        (insert "aliqua.  Ut enim ad minim veniam, quis nostrud exercitation ")
        (insert "ullamco laboris nisi ut aliquip ex ea commodo consequat.  ")
        (insert "Duis aute irure dolor in reprehenderit in voluptate velit ")
        (insert "esse cillum dolore eu fugiat nulla pariatur.  Excepteur sint ")
        (insert "occaecat cupidatat non proident, sunt in culpa qui officia ")
        (insert "deserunt mollit anim id est laborum.  "))
      (let ((words (count-words-region (point-min) (point-max)))
            (time (current-time)))
        (fill-paragraph nil t)
        (message "Words: %4d, Time: %0.6f"
                 words (float-time (time-since time)))))))

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-02-13  8:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-23  9:40 bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode Andrew Kensler
2023-01-23 15:34 ` Eli Zaretskii
2023-02-13  8:27   ` Andrew Kensler

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).