unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* [PATCH] 5x speedup of flyspell-buffer
@ 2023-04-12 13:50 Ilya Zakharevich
  2023-04-13  6:54 ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Ilya Zakharevich @ 2023-04-12 13:50 UTC (permalink / raw)
  To: emacs-devel

With the included patch, on my system flyspell-buffer speeds up 5
times.  For my typical file sizes, this is 20sec -> 4sec on a 600K
LaTeX file with 10,000 words detected by `aspell´ — which makes
using flyspell feasible.

  The version is not the most recent, but IIUC, flyspell did not change.

(The number 1.05 below is chosen to lower overhead of messages 50 times,
from 5x to 0.1x.  It is related to log(10000)/0.05 ∼ 10000/50.)

Enjoy,
Ilya

--- 28.0.91/lisp/textmodes/flyspell.el-orig	2022-01-22 12:43:04.000000000 -0800
+++ 28.0.91/lisp/textmodes/flyspell.el	2023-04-12 06:31:39.836769300 -0700
@@ -1437,7 +1437,9 @@ The buffer to mark them in is `flyspell-
                              (if ispell-many-otherchars-p
                                  "*" "?")))
          (buffer-scan-pos flyspell-large-region-beg)
-         case-fold-search)
+         case-fold-search
+         (countw 0)
+         (countw-rep 0))
     (with-current-buffer flyspell-external-ispell-buffer
       (goto-char (point-min))
       ;; Loop over incorrect words, in the order they were reported,
@@ -1450,10 +1452,13 @@ The buffer to mark them in is `flyspell-
 	  ;; identical words, and the loop below would search for that many.
 	  ;; That code seemed to be incorrect, and on principle, should
 	  ;; be unnecessary too. -- rms.
-	  (if flyspell-issue-message-flag
-	      (message "Spell Checking...%d%% [%s]"
-		       (floor (* 100.0 (point)) (point-max))
-		       word))
+	  (setq countw (1+ countw))
+	  (when (and flyspell-issue-message-flag
+		     (> countw (* 1.05 countw-rep)))
+	    (setq countw-rep countw)
+	    (message "Spell Checking...%d%% [%s]"
+		     (floor (* 100.0 (point)) (point-max))
+		     word))
 	  (with-current-buffer flyspell-large-region-buffer
 	    (goto-char buffer-scan-pos)
 	    (let ((keep t))



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 5x speedup of flyspell-buffer
  2023-04-12 13:50 [PATCH] 5x speedup of flyspell-buffer Ilya Zakharevich
@ 2023-04-13  6:54 ` Eli Zaretskii
  2023-04-13 10:23   ` Ilya Zakharevich
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2023-04-13  6:54 UTC (permalink / raw)
  To: Ilya Zakharevich; +Cc: emacs-devel

> Date: Wed, 12 Apr 2023 06:50:30 -0700
> From: Ilya Zakharevich <ilya@ilyaz.org>
> 
> With the included patch, on my system flyspell-buffer speeds up 5
> times.  For my typical file sizes, this is 20sec -> 4sec on a 600K
> LaTeX file with 10,000 words detected by `aspell´ — which makes
> using flyspell feasible.
> 
>   The version is not the most recent, but IIUC, flyspell did not change.
> 
> (The number 1.05 below is chosen to lower overhead of messages 50 times,
> from 5x to 0.1x.  It is related to log(10000)/0.05 ∼ 10000/50.)

Thanks, but does it indeed make sense to print these progress message
with logarithmically-increasing intervals?  Why not simply decimate
the messages by printing them every N words or so?  N could be
computed at the beginning of the command as function of the size of
the region to be spell-checked, assuming some suitable value of
average length of a word.  WDYT?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 5x speedup of flyspell-buffer
  2023-04-13  6:54 ` Eli Zaretskii
@ 2023-04-13 10:23   ` Ilya Zakharevich
  2023-04-13 10:32     ` Eli Zaretskii
  2023-04-15  1:51     ` interacting during (sit-for 0). Was: " Ilya Zakharevich
  0 siblings, 2 replies; 10+ messages in thread
From: Ilya Zakharevich @ 2023-04-13 10:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Thu, Apr 13, 2023 at 09:54:59AM +0300, Eli Zaretskii wrote:
> > Date: Wed, 12 Apr 2023 06:50:30 -0700
> > From: Ilya Zakharevich <ilya@ilyaz.org>
> > 
> > With the included patch, on my system flyspell-buffer speeds up 5
> > times.  For my typical file sizes, this is 20sec -> 4sec on a 600K
> > LaTeX file with 10,000 words detected by `aspell´ — which makes
> > using flyspell feasible.
> > 
> >   The version is not the most recent, but IIUC, flyspell did not change.
> > 
> > (The number 1.05 below is chosen to lower overhead of messages 50 times,
> > from 5x to 0.1x.  It is related to log(10000)/0.05 ∼ 10000/50.)
> 
> Thanks, but does it indeed make sense to print these progress message
> with logarithmically-increasing intervals?  Why not simply decimate
> the messages by printing them every N words or so?  N could be
> computed at the beginning of the command as function of the size of
> the region to be spell-checked, assuming some suitable value of
> average length of a word.  WDYT?

Without profiling, how to find out “how much overhead is tolerable to
the user”?!

Going logarithmic avoids all these complications.  The REAL purpose
of these messages is “the feedback to the user why the UI is locked now”.

Thanks,
Ilya

P.S.  BTW, there is very little need to lock the UI.  On my (very old
      and slow) system, these 10,000 misspellings are reported in 300ms
      (even though aspell is cygwin — read “slow” — one).  After this
      list of misspellings is swallowed, the rest of marking can be
      done asynchronously, via timers.

      I may try to find more time to investigate (and maybe implement) this…



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 5x speedup of flyspell-buffer
  2023-04-13 10:23   ` Ilya Zakharevich
@ 2023-04-13 10:32     ` Eli Zaretskii
  2023-04-13 20:29       ` Ilya Zakharevich
  2023-04-15  1:51     ` interacting during (sit-for 0). Was: " Ilya Zakharevich
  1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2023-04-13 10:32 UTC (permalink / raw)
  To: Ilya Zakharevich; +Cc: emacs-devel

> Date: Thu, 13 Apr 2023 03:23:18 -0700
> From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
> Cc: emacs-devel@gnu.org
> 
> > > (The number 1.05 below is chosen to lower overhead of messages 50 times,
> > > from 5x to 0.1x.  It is related to log(10000)/0.05 ∼ 10000/50.)
> > 
> > Thanks, but does it indeed make sense to print these progress message
> > with logarithmically-increasing intervals?  Why not simply decimate
> > the messages by printing them every N words or so?  N could be
> > computed at the beginning of the command as function of the size of
> > the region to be spell-checked, assuming some suitable value of
> > average length of a word.  WDYT?
> 
> Without profiling, how to find out “how much overhead is tolerable to
> the user”?!

Since you say the change you propose speeds up by a factor of 5, just
setting N = 5 should do approximately the same, no?  I'd actually go
with N = 10 for a good measure.

> Going logarithmic avoids all these complications.

But, as you say, the 1.05 factor was chosen by some kind of profiling
as well, no?

> P.S.  BTW, there is very little need to lock the UI.  On my (very old
>       and slow) system, these 10,000 misspellings are reported in 300ms
>       (even though aspell is cygwin — read “slow” — one).  After this
>       list of misspellings is swallowed, the rest of marking can be
>       done asynchronously, via timers.

If you intend to do this asynchronously, I think a relevant question
is why would you need to do that for the entire buffer?  When doing
this synchronously, the answer is clear, but doing it asynchronously
means you don't really care when will it end.  So in that case,
perhaps a better idea would be to spell-check only the stuff in the
window, and update that as the window is scrolled to expose more of
the text, like jit-lock.el does for fontifications?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 5x speedup of flyspell-buffer
  2023-04-13 10:32     ` Eli Zaretskii
@ 2023-04-13 20:29       ` Ilya Zakharevich
  2023-04-14  5:47         ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Ilya Zakharevich @ 2023-04-13 20:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Thu, Apr 13, 2023 at 01:32:11PM +0300, Eli Zaretskii wrote:
> > Without profiling, how to find out “how much overhead is tolerable to
> > the user”?!
> 
> Since you say the change you propose speeds up by a factor of 5, just
> setting N = 5 should do approximately the same, no?  I'd actually go
> with N = 10 for a good measure.

Nope.  My solution decreases THE OVERHEAD 50 times.  This SPEEDS UP
the whole brouhaha ∼5x (since the overhead was 4x). 

> > Going logarithmic avoids all these complications.
> 
> But, as you say, the 1.05 factor was chosen by some kind of profiling
> as well, no?

Sure — but the logarithmic solution is not sensitive to the
system/operator’s specifics. 

> > P.S.  BTW, there is very little need to lock the UI.  On my (very old
> >       and slow) system, these 10,000 misspellings are reported in 300ms
> >       (even though aspell is cygwin — read “slow” — one).  After this
> >       list of misspellings is swallowed, the rest of marking can be
> >       done asynchronously, via timers.
> 
> If you intend to do this asynchronously, I think a relevant question
> is why would you need to do that for the entire buffer?  When doing
> this synchronously, the answer is clear, but doing it asynchronously
> means you don't really care when will it end.  So in that case,
> perhaps a better idea would be to spell-check only the stuff in the
> window, and update that as the window is scrolled to expose more of
> the text, like jit-lock.el does for fontifications?

The simple answer: this does not work.

Observe “how well implemented” it is now — after decades of many
people trying!  (I have seen something like tens of discussions of
people attempting something like this.  The latest attempt I have seen
uses POLLING!)

Ilya

P.S.  This seems (probably) very similar to jit-lock vs lazy-lock.  In my
      experiments, jit-lock is/was “an order of magnitude worse”.  (Do
      not remember the details)

      So I need to do through a lot of fracas to upgrade to each new
      version of Emacs…

P.P.S.  On my system, the minimal overhead of flyspell-large-region (for
	small regions) is ~0.6sec.  Not suitable for every update of a
	buffer.

	(This is ON TOP of “other problems of implementation”.)



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 5x speedup of flyspell-buffer
  2023-04-13 20:29       ` Ilya Zakharevich
@ 2023-04-14  5:47         ` Eli Zaretskii
  2023-04-14  6:39           ` Dr. Arne Babenhauserheide
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2023-04-14  5:47 UTC (permalink / raw)
  To: Ilya Zakharevich; +Cc: emacs-devel

> Date: Thu, 13 Apr 2023 13:29:33 -0700
> From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
> Cc: emacs-devel@gnu.org
> 
> On Thu, Apr 13, 2023 at 01:32:11PM +0300, Eli Zaretskii wrote:
> > > Without profiling, how to find out “how much overhead is tolerable to
> > > the user”?!
> > 
> > Since you say the change you propose speeds up by a factor of 5, just
> > setting N = 5 should do approximately the same, no?  I'd actually go
> > with N = 10 for a good measure.
> 
> Nope.  My solution decreases THE OVERHEAD 50 times.  This SPEEDS UP
> the whole brouhaha ∼5x (since the overhead was 4x). 

OK, then N = 50 should do the same trick.

> > If you intend to do this asynchronously, I think a relevant question
> > is why would you need to do that for the entire buffer?  When doing
> > this synchronously, the answer is clear, but doing it asynchronously
> > means you don't really care when will it end.  So in that case,
> > perhaps a better idea would be to spell-check only the stuff in the
> > window, and update that as the window is scrolled to expose more of
> > the text, like jit-lock.el does for fontifications?
> 
> The simple answer: this does not work.
> 
> Observe “how well implemented” it is now — after decades of many
> people trying!  (I have seen something like tens of discussions of
> people attempting something like this.  The latest attempt I have seen
> uses POLLING!)

It does work here and elsewhere.  But suit yourself.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 5x speedup of flyspell-buffer
  2023-04-14  5:47         ` Eli Zaretskii
@ 2023-04-14  6:39           ` Dr. Arne Babenhauserheide
  2023-04-14  6:54             ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Dr. Arne Babenhauserheide @ 2023-04-14  6:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ilya Zakharevich, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1282 bytes --]


Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Thu, 13 Apr 2023 13:29:33 -0700
>> From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
>> Cc: emacs-devel@gnu.org
>> 
>> On Thu, Apr 13, 2023 at 01:32:11PM +0300, Eli Zaretskii wrote:
>> > > Without profiling, how to find out “how much overhead is tolerable to
>> > > the user”?!
>> > 
>> > Since you say the change you propose speeds up by a factor of 5, just
>> > setting N = 5 should do approximately the same, no?  I'd actually go
>> > with N = 10 for a good measure.
>> 
>> Nope.  My solution decreases THE OVERHEAD 50 times.  This SPEEDS UP
>> the whole brouhaha ∼5x (since the overhead was 4x). 
>
> OK, then N = 50 should do the same trick.

Wouldn’t that then show too few messages for small buffers?

I quite like the logarithmic message showing, because it ensures that
the user is informed of progress while avoiding a constant spew of
messages.

The only thing I worry about is that there may at some point be too few
messages, so a maximum might be useful to ensure that there is some
progress notification at least once every few seconds. Otherwise users
could think that it stalled.

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein,
ohne es zu merken.
draketo.de

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1125 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] 5x speedup of flyspell-buffer
  2023-04-14  6:39           ` Dr. Arne Babenhauserheide
@ 2023-04-14  6:54             ` Eli Zaretskii
  0 siblings, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2023-04-14  6:54 UTC (permalink / raw)
  To: Dr. Arne Babenhauserheide; +Cc: nospam-abuse, emacs-devel

> From: "Dr. Arne Babenhauserheide" <arne_bab@web.de>
> Cc: Ilya Zakharevich <nospam-abuse@ilyaz.org>, emacs-devel@gnu.org
> Date: Fri, 14 Apr 2023 08:39:10 +0200
> 
> > OK, then N = 50 should do the same trick.
> 
> Wouldn’t that then show too few messages for small buffers?

There's no sense of showing messages for jobs that finish so fast.

> I quite like the logarithmic message showing, because it ensures that
> the user is informed of progress while avoiding a constant spew of
> messages.

The need for informing the user about the progress only exists when
the operation takes long enough time.

> The only thing I worry about is that there may at some point be too few
> messages, so a maximum might be useful to ensure that there is some
> progress notification at least once every few seconds. Otherwise users
> could think that it stalled.

How much over-engineering are we willing to put into this nit?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: interacting during (sit-for 0). Was: [PATCH] 5x speedup of flyspell-buffer
  2023-04-13 10:23   ` Ilya Zakharevich
  2023-04-13 10:32     ` Eli Zaretskii
@ 2023-04-15  1:51     ` Ilya Zakharevich
  2023-04-15  2:01       ` Ilya Zakharevich
  1 sibling, 1 reply; 10+ messages in thread
From: Ilya Zakharevich @ 2023-04-15  1:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Thu, Apr 13, 2023 at 03:23:18AM -0700, Ilya Zakharevich wrote:
> P.S.  BTW, there is very little need to lock the UI.  On my (very old
>       and slow) system, these 10,000 misspellings are reported in 300ms
>       (even though aspell is cygwin — read “slow” — one).  After this
>       list of misspellings is swallowed, the rest of marking can be
>       done asynchronously, via timers.
> 
>       I may try to find more time to investigate (and maybe implement) this…

In fact, I think there is a much better way to work around this.

Suppose that a variable `recursive-edit-exit-on-idle' is supported by
the even loop.  Would then the following code implement
  (sit-for-0-interactively 'READ-ONLY)
correctly?

	(if (or unread-command-events unread-input-method-events unread-post-input-method-events)
;;;	  (with-current-buffer flyspell-large-region-buffer
	    (let ((buffer-read-only t) (inhibit-read-only nil)
		  (recursive-edit-exit-on-idle t))
	      (condition-case nil
		  (recursive-edit)
		(error nil))));;;)

Thanks,
Ilya



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: interacting during (sit-for 0). Was: [PATCH] 5x speedup of flyspell-buffer
  2023-04-15  1:51     ` interacting during (sit-for 0). Was: " Ilya Zakharevich
@ 2023-04-15  2:01       ` Ilya Zakharevich
  0 siblings, 0 replies; 10+ messages in thread
From: Ilya Zakharevich @ 2023-04-15  2:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On Fri, Apr 14, 2023 at 06:51:31PM -0700, Ilya Zakharevich wrote:
> Suppose that a variable `recursive-edit-exit-on-idle' is supported by
> the even loop.  Would then the following code implement
>   (sit-for-0-interactively 'READ-ONLY)
> correctly?
> 
> 	(if (or unread-command-events unread-input-method-events unread-post-input-method-events)
> ;;;	  (with-current-buffer flyspell-large-region-buffer
> 	    (let ((buffer-read-only t) (inhibit-read-only nil)
> 		  (recursive-edit-exit-on-idle t))
> 	      (condition-case nil
> 		  (recursive-edit)
> 		(error nil))));;;)

Thinking on this more: 'quit should be propagated (since the user
should not know about an extra level of recursion).

Moreover, the implementation of `recursive-edit-exit-on-idle' should
disallow entering EXTRA LEVELS of recursive edit when this variable is set.


> Thanks,
> Ilya



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-04-15  2:01 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-12 13:50 [PATCH] 5x speedup of flyspell-buffer Ilya Zakharevich
2023-04-13  6:54 ` Eli Zaretskii
2023-04-13 10:23   ` Ilya Zakharevich
2023-04-13 10:32     ` Eli Zaretskii
2023-04-13 20:29       ` Ilya Zakharevich
2023-04-14  5:47         ` Eli Zaretskii
2023-04-14  6:39           ` Dr. Arne Babenhauserheide
2023-04-14  6:54             ` Eli Zaretskii
2023-04-15  1:51     ` interacting during (sit-for 0). Was: " Ilya Zakharevich
2023-04-15  2:01       ` Ilya Zakharevich

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).