From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#61028: 30.0.50; [PATCH] [FEATURE] Balanced fill mode Date: Mon, 23 Jan 2023 17:34:48 +0200 Message-ID: <83mt69i7s7.fsf@gnu.org> References: <9ec521a6-b324-3f09-7495-c55638b890f8@eastfarthing.com> Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="18766"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 61028@debbugs.gnu.org To: Andrew Kensler Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Mon Jan 23 16:36:21 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pJyrs-0004dJ-9s for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 23 Jan 2023 16:36:20 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pJyrd-0004Ps-W5; Mon, 23 Jan 2023 10:36:06 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pJyrb-0004PS-Hf for bug-gnu-emacs@gnu.org; Mon, 23 Jan 2023 10:36:03 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pJyra-0003qo-NM for bug-gnu-emacs@gnu.org; Mon, 23 Jan 2023 10:36:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1pJyra-0004Tp-Ab for bug-gnu-emacs@gnu.org; Mon, 23 Jan 2023 10:36:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 23 Jan 2023 15:36:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 61028 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 61028-submit@debbugs.gnu.org id=B61028.167448810517137 (code B ref 61028); Mon, 23 Jan 2023 15:36:02 +0000 Original-Received: (at 61028) by debbugs.gnu.org; 23 Jan 2023 15:35:05 +0000 Original-Received: from localhost ([127.0.0.1]:55115 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pJyqe-0004SJ-Jc for submit@debbugs.gnu.org; Mon, 23 Jan 2023 10:35:05 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:37020) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pJyqc-0004Rj-9D for 61028@debbugs.gnu.org; Mon, 23 Jan 2023 10:35:03 -0500 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pJyqW-0003Qh-73; Mon, 23 Jan 2023 10:34:56 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=mCtumc0axHnQy3sRZ/G5hKkWLvcGCaY/SvSytEPXEEo=; b=N033ukrq4DjL YZHm7MxgZLpXGrOXx4vGoUsY9wC7z6OhtI8Lx79Masig9aAwBRbiwB8MUUIvnewkY8/+bWgzGDapJ R5KJ7RiWH40k2XKj77w8cFbq4pVapd+j+fjs3ZzcDVH60umLuLGdyUPo+2NPU2VGVe4KX+afR5jcm yUvtaLgxxancLc/6GKJ4pKyqag+RfVriJLChOfgoHasFg2/RM60qm+yEHb3/EeAqeiU5peqFAajHI cPlZFfCktBYewOqEviKAwgRD3V9oLgrEY00M3OfEK+8VxxZ55iwHeYhRMVK6LTonLA2yZ3GNA8DHp ajWGcQmgvCU9PiUNvm265g==; Original-Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pJyqH-0003HV-4c; Mon, 23 Jan 2023 10:34:55 -0500 In-Reply-To: <9ec521a6-b324-3f09-7495-c55638b890f8@eastfarthing.com> (message from Andrew Kensler on Mon, 23 Jan 2023 01:40:40 -0800) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:254006 Archived-At: > Date: Mon, 23 Jan 2023 01:40:40 -0800 > From: Andrew Kensler > > The proposed patch attached adds a new minor balanced-fill-mode with an alternate line breaking algorithm > for paragraph filling. When enabled, it will try to neatly balance line lengths to reduce raggedness, avoid > widows, avoid starting a new sentence on the last word of a line, avoid ending a sentence on the first word > of a line, and so forth. It is heavily inspired by the Knuth-Plass line breaking algorithm and uses dynamic > programming to try to choose line breaks to minimize a cost function. > > For example, consider the following mock paragraph as filled by the current greedy algorithm with the > fill-column set to 15: > > Ccc ccc a bb > dddd bb bb a > ccc a > jjjjjjjjjj a > eeeee a > hhhhhhhh bb > dddd. > > With the new minor mode enabled, it would instead be filled much more nicely as: > > Ccc ccc a > bb dddd bb > bb a ccc a > jjjjjjjjjj > a eeeee a > hhhhhhhh > bb dddd. > > Often, the result is similar to simply having used a particular slightly narrower fill-column with the current > greedy algorithm. The advantage, however, is that it figures out the correct column automatically and > per-paragraph. > > The main piece of implementation is in the new (balanced-fill-break-lines) function in fill.el, with a modification > to (fill-region-as-paragraph) to optionally call this before its current line breaking loop. If it runs successfully, > then point will be set to the end of the paragraph and that loop skipped. > > Note that (fill-region-as-paragraph) has no hooks and is too monolithic to advise for this kind of thing, hence > my hoping to upstream this change. > > This is my first time contributing anything significant to Emacs or writing here. I believe that the proposed > patch covers all the major needs: the code itself, commit message, info documentation, announcement in > NEWS, and a basic ERT test. If there's anything I've missed or suggestions for improvements, please let > me know. (And I hope this is the correct mailing list and message format, too.) I'll be happy to sign the > copyright assignment paperwork if this looks like something you'd like to accept. Thanks, I think this is a welcome feature. Please see a few comments below. I will send you the form for copyright assignment off-list. > +@node Balanced Fill > +@subsection Balanced Filling I find this subsection too detailed and lengthy. I'm not sure we want to describe all the customizable options that users can customize, just the main ones. Assuming the default values are chosen well, chances are that most users will not need to customize too many of them, and therefore the manual doesn't have to describe them; we just need to mention that several more options are available, and perhaps have a special group for them for easier discoverability. WDYT? My other concern about the documentation is that it seems to describe the feature in a way that is too technical and uses terminology from the field of optimizations. I'm afraid that users without background in optimizations will have difficulty understanding some of the options. Can you try describing this in a less formal manner, so as to make it easier to understand? > + For example, if @code{fill-column} is 60 and balanced filling is > +disabled then the greedy algorithm will fill the following paragraph > +like so: > + > +@smallexample > +"It's not too far-fetched to say that the best programs are > +the ones written when the programmer is supposed to be > +working on something else... Very good things happen when > +management is enlightened enough to appreciate the > +importance of allowing programmers some free time for > +projects of this sort." --Melinda Varian > +@end smallexample > + > +@noindent > +while enabling the balanced filling will instead produce the following > +with slightly shorter but more even line lengths: > + > +@smallexample > +"It's not too far-fetched to say that the best programs > +are the ones written when the programmer is supposed to > +be working on something else... Very good things happen > +when management is enlightened enough to appreciate the > +importance of allowing programmers some free time for > +projects of this sort." -- Melinda Varian > +@end smallexample This example is too long. I suggest to find a shorter one. I think an example with 3 lines should be enough to explain the feature. > --- a/etc/NEWS > +++ b/etc/NEWS > @@ -53,6 +53,12 @@ trash when deleting. Default is nil. > > * Editing Changes in Emacs 30.1 > > +** Filling can now try to break lines evenly. > +The new user option 'balanced-fill-mode' can be set to non-nil to > +make filling consider the entire paragraph at a time and try to > +place line breaks optimally to look more neat and even, according > +to a cost function. This is inspired by the Knuth-Plass algorithm. This should mention at least a few main options that control the feature. > +(define-minor-mode balanced-fill-mode > + "Toggle whether filling should try to neatly balance line lengths. > + > +When enabled, filling will consider the entire paragraph and > +try to optimally choose a set of line breaks to minimize a > +cost function that penalizes untidy paragraphs. This may > +place line breaks sooner than necessary if it improves later > +lines. When disabled, filling uses the traditional greedy line > +breaking algorithm. Likewise here: this doc string is too abstract and thus hard to grok. Talking about minimization of a cost function that penalizes something only helps if one has some background in that domain. I'd instead try to say something like "fills the entire paragraph avoiding too short lines" or something similar. > +(defcustom balanced-fill-maximum-words 500 > + "Maximum limit on the number of words that the balanced fill > +algorithm will attempt to process in a single paragraph. If The first line of a doc string should be a single complete sentence (here and elsewhere in the patch). This is important because the various "apropos" commands show only the first line of the doc string. > +(defcustom balanced-fill-length-exponent 3 > + "Controls the penalty for lines that are shorter or longer than > +the target length to the margin. The difference between the > +actual and target length will be raised to this power when > +calculating the cost of a potential line break. This is the > +main control for the cost function." > + :version "30.1" > + :type 'integer > + :group 'fill) > + > +(defcustom balanced-fill-raggedness-penalty 40 > + "Additional penalty for each column of difference in length > +relative to the previous line, unless this is the last line > +and longer than second to last. Higher numbers make it try > +harder to keep all lines as even as possible in length at the > +expense of other factors." > + :version "30.1" > + :type 'integer > + :group 'fill) > + > +(defcustom balanced-fill-single-penalty 150 > + "Additional penalty either for starting a new sentence with a > +single word right at the end of a line, or else for ending a > +sentence with a single word left at the start of a line." > + :version "30.1" > + :type 'integer > + :group 'fill) > + > +(defcustom balanced-fill-break-penalty 50 > + "Additional penalty for each line break added. The larger this > +is, the more the algorithm will try to minimize the number of > +lines despite the other penalties." > + :version "30.1" > + :type 'integer > + :group 'fill) These options are probably related, in that changing one needs a suitable change in others to make sense. If this is indeed so, please mention the relations in the doc strings. You say "Additional", but that is meaningless when each doc string is read separately (additional to what?) > +(defun balanced-fill-break-lines (from to justify) > + ;; Build a table of visible word widths, with and without any preceding > + ;; spaces, along with whether the word starts a new sentence. We go by > + ;; columns and not chars to handle invisible text (especially invisible > + ;; spaces), etc. If this is an internal function, please use our convention of naming it with double dash, as in balanced-fill--break-lines. If this is supposed to be a public function, it should have a doc string. Last, but not least: what about performance? Is this performant enough to apply to large enough paragraphs, including via auto-fill-mode? Can you provide some measurements? Thanks.