unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#48841: fido-mode is slower than ido-mode with similar settings
@ 2021-06-05  1:39 Dmitry Gutov
  2021-06-05  9:35 ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-05  1:39 UTC (permalink / raw)
  To: 48841

I'm comparing

   ido-mode
   with ido-ubiquitous-mode (for support for arbitrary completion 
tables), available at 
https://github.com/DarwinAwardWinner/ido-completing-read-plus
   with (setq ido-enable-flex-matching t), of course

versus

  fido-mode
  with
    (setq icomplete-compute-delay 0)
    (setq icomplete-show-matches-on-no-input t)
    (setq icomplete-max-delay-chars 0)

The values are chosen for behavior maximally close to ido.

Try something like:

  - Start a session with personal config and a number of loaded packages 
(so that there are a lot of functions defined in obarray)
  - Type 'C-h f'
  - Type 'a', then type 'b'.
  - Delete 'b', type it again, see how quickly you can make the 
completions update.

With ido, the updates seem instant (probably due to some magic in 
ido-completing-read-plus); with fido, there is some lag. Not huge, but 
easy enough to notice.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-05  1:39 bug#48841: fido-mode is slower than ido-mode with similar settings Dmitry Gutov
@ 2021-06-05  9:35 ` João Távora
  2021-06-05 23:02   ` Dmitry Gutov
  0 siblings, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-05  9:35 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

> I'm comparing
>
>   ido-mode
>   with ido-ubiquitous-mode (for support for arbitrary completion
>   tables), available at
>   https://github.com/DarwinAwardWinner/ido-completing-read-plus
>   with (setq ido-enable-flex-matching t), of course
>
> versus
>
>  fido-mode
>  with
>    (setq icomplete-compute-delay 0)
>    (setq icomplete-show-matches-on-no-input t)
>    (setq icomplete-max-delay-chars 0)
>
> The values chosen for behavior maximally close to ido.
>
> Try something like:
>
>  - Start a session with personal config and a number of loaded
>    packages (so that there are a lot of functions defined in obarray)
>  - Type 'C-h f'
>  - Type 'a', then type 'b'.
>  - Delete 'b', type it again, see how quickly you can make the
>    completions update.
>
> With ido, the updates seem instant (probably due to some magic in
> ido-completing-read-plus); with fido, there is some lag. Not huge, but
> easy enough to notice.

Thanks for the report.  Before I try reproducing, can you try with
fido-vertical-mode and tell us if that changes anything?  I think I
remember that skipping some suffix-calculation logic saved on a few
traversals of the big list of symbol completions.

João







* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-05  9:35 ` João Távora
@ 2021-06-05 23:02   ` Dmitry Gutov
  2021-06-05 23:20     ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-05 23:02 UTC (permalink / raw)
  To: João Távora; +Cc: 48841

On 05.06.2021 12:35, João Távora wrote:
> Thanks for the report.  Before I try reproducing, can you try with
> fido-vertical-mode and tell us it if that changes anything?  I think I
> remember that skipping some suffix-calculation logic saved on a few
> traversals of the big list of symbol completions.

Why, yes it is. fido-vertical-mode is definitely snappier with such 
settings.

Maybe still not on the level of ido-mode, but at least halfway there, 
compared to the "horizontal" version.






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-05 23:02   ` Dmitry Gutov
@ 2021-06-05 23:20     ` João Távora
  2021-06-05 23:42       ` Dmitry Gutov
                         ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: João Távora @ 2021-06-05 23:20 UTC (permalink / raw)
  To: Dmitry Gutov, monnier; +Cc: 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 05.06.2021 12:35, João Távora wrote:
>> Thanks for the report.  Before I try reproducing, can you try with
>> fido-vertical-mode and tell us it if that changes anything?  I think I
>> remember that skipping some suffix-calculation logic saved on a few
>> traversals of the big list of symbol completions.
>
> Why, yes it is. fido-vertical-mode is definitely snappier with such
> settings.
>
> Maybe still not on the level of ido-mode, but at least halfway there,
> compared to the "horizontal" version.

Yes, that is also my assessment after trying your recipe.  fido-mode +
fido-vertical-mode is not quite as snappy as ido-ubiquitous-mode, but
decently close.

My bet is that the remaining lag is due to sorting.  In a dumb but
illustrative example, when given the pattern 'fmcro' flex-enabled
ido-mode pops 'flymake--backend-state-p--cmacro' to the top, while fido
mode selects the much more reasonable 'defmacro'.

Now, what I called here the "suffix-calculation logic" is what I also
called the "[mplete] dance" back in the emacs-devel thread.  Truth is,
it's always annoyed me in icomplete partially because I don't understand
what it does exactly and how it is supposed to help me.  I suppose
Stefan knows best here.  Regardless of its use, it seems to require
another try-completion call in all the filtered candidates (which might
be very big) so that's probably where the extra lag comes from.

So, in summary, to speed this up for whoever is _not_ using
fido-vertical-mode, either we manage to speed up that part of
icomplete.el, or we get rid of it completely (at least for fido-mode).
For reference, it lives in an "else" branch of one of the "if"s in
icomplete-completions.
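For what it's worth, one way to pinpoint where that time goes is the built-in CPU profiler; a minimal sketch (the middle comment marks where to reproduce the recipe above):

```elisp
;; Sketch: profile one laggy minibuffer interaction, then inspect
;; which icomplete functions dominate the sample tree.
(require 'profiler)
(profiler-start 'cpu)
;; ...reproduce the recipe: C-h f, type a few characters, delete one...
(profiler-stop)
(profiler-report)  ; shows CPU samples per function, as a tree
```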

João






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-05 23:20     ` João Távora
@ 2021-06-05 23:42       ` Dmitry Gutov
  2021-06-06  0:25       ` Dmitry Gutov
  2021-06-06  2:34       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 0 replies; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-05 23:42 UTC (permalink / raw)
  To: João Távora, monnier; +Cc: 48841

On 06.06.2021 02:20, João Távora wrote:
> Now, what I called here the "suffix-calculation logic" is what I also
> called the "[mplete] dance" back in the emacs-devel thread.  Truth is,
> it's always annoyed me in icomplete partially because I don't understand
> what it does exactly and how it is supposed to help me.  I suppose
> Stefan knows best here.  Regardless of its use, it seems to require
> another try-completion call in all the filtered candidates (which might
> be very big) so that's probably where the extra lag comes from.

Shouldn't this logic (whether it's used) be governed by the variable 
icomplete-hide-common-prefix?

Which icomplete--fido-mode-setup sets to nil (appropriately, given that 
ido-mode does not have this behavior). And looking at its behavior, it 
only does the "[mplete] dance" when there is only one match remaining.

Whether to make icomplete-hide-common-prefix affect even the "only one 
match" case is a matter of taste: I could personally find an argument 
for either choice. But we're seeing performance degradation when there 
are many matches, not one.






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-05 23:20     ` João Távora
  2021-06-05 23:42       ` Dmitry Gutov
@ 2021-06-06  0:25       ` Dmitry Gutov
  2021-06-06  6:54         ` João Távora
  2021-06-06  2:34       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-06  0:25 UTC (permalink / raw)
  To: João Távora, monnier; +Cc: 48841

On 06.06.2021 02:20, João Távora wrote:
> My bet is that the remaining lag is due to sorting.  In a dumb but
> illustrative example, when given the pattern 'fmcro' flex-enabled
> ido-mode pops 'flymake--backend-state-p--cmacro' to the top, while fido
> mode selects the much more reasonable 'defmacro'.

Perhaps not sorting exactly, but the scoring part? Lowering the 
implementation into C might help; we discussed something like that in 
the past.

And/or pick a different algorithm, e.g. Jaro-Winkler, which apparently 
is used in a lot of "fuzzy matching" implementations out there; it's 
pretty fast.
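For reference, here is a rough, illustrative Elisp sketch of Jaro-Winkler (the jw/ names are made up, and this is untuned; not what completion-styles does):

```elisp
;; Sketch: Jaro-Winkler similarity.  Jaro counts characters matching
;; within a sliding window plus out-of-order matches (transpositions);
;; Winkler then boosts scores for a shared prefix.
(defun jw/jaro (s1 s2)
  "Jaro similarity of S1 and S2, a float in [0, 1]."
  (let* ((l1 (length s1)) (l2 (length s2))
         (window (max 0 (1- (/ (max l1 l2) 2))))
         (m1 (make-bool-vector (max l1 1) nil))
         (m2 (make-bool-vector (max l2 1) nil))
         (matches 0) (half-transpositions 0))
    (if (or (zerop l1) (zerop l2))
        0.0
      ;; Pass 1: count characters of S1 matched in S2 within the window.
      (dotimes (i l1)
        (catch 'matched
          (let ((j (max 0 (- i window))))
            (while (< j (min l2 (+ i window 1)))
              (when (and (not (aref m2 j))
                         (eq (aref s1 i) (aref s2 j)))
                (aset m1 i t)
                (aset m2 j t)
                (setq matches (1+ matches))
                (throw 'matched nil))
              (setq j (1+ j))))))
      (if (zerop matches)
          0.0
        ;; Pass 2: count matched characters that are out of order.
        (let ((k 0))
          (dotimes (i l1)
            (when (aref m1 i)
              (while (not (aref m2 k)) (setq k (1+ k)))
              (unless (eq (aref s1 i) (aref s2 k))
                (setq half-transpositions (1+ half-transpositions)))
              (setq k (1+ k)))))
        (/ (+ (/ (float matches) l1)
              (/ (float matches) l2)
              (/ (- matches (/ half-transpositions 2.0)) matches))
           3.0)))))

(defun jw/jaro-winkler (s1 s2)
  "Jaro-Winkler: boost the Jaro score for a common prefix, up to 4 chars."
  (let ((j (jw/jaro s1 s2)) (prefix 0))
    (while (and (< prefix (min 4 (length s1) (length s2)))
                (eq (aref s1 prefix) (aref s2 prefix)))
      (setq prefix (1+ prefix)))
    (+ j (* prefix 0.1 (- 1.0 j)))))

;; (jw/jaro-winkler "MARTHA" "MARHTA")  ; the textbook pair, ~0.961
```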

I took an example like

   (setq s (all-completions "" obarray))
   (setq ss (cl-delete-if-not (lambda (s) (string-match-p "a" s)) s))

then

   (benchmark 1 '(completion-all-completions "a" ss nil 1))

prints 0.180s here, whereas a "pure Ruby" implementation of Jaro-Winkler 
takes about 0.060s on the exact same set of strings. But perhaps Ruby is 
just faster than Elisp, I don't have a good comparison.

(The only J-W implementation in Elisp I have found so far -- 
https://github.com/rdiankov/emacs-config/blob/master/.emacs-lisp/auto-complete-1.3.1/fuzzy.el#L70 
-- is slower than the current scoring algo).






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-05 23:20     ` João Távora
  2021-06-05 23:42       ` Dmitry Gutov
  2021-06-06  0:25       ` Dmitry Gutov
@ 2021-06-06  2:34       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-06-06  6:59         ` João Távora
  2 siblings, 1 reply; 33+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-06-06  2:34 UTC (permalink / raw)
  To: João Távora; +Cc: 48841, Dmitry Gutov

> Stefan knows best here.  Regardless of its use, it seems to require
> another try-completion call in all the filtered candidates (which might
> be very big) so that's probably where the extra lag comes from.

IIRC the `try-completion` call is performed on the list of possible
completions rather than on the original completion table, so it should
be quite fast.  I'd be surprised if it is a significant portion of the
overall time.


        Stefan







* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06  0:25       ` Dmitry Gutov
@ 2021-06-06  6:54         ` João Távora
  2021-06-06 22:20           ` Dmitry Gutov
  0 siblings, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-06  6:54 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: monnier, 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 06.06.2021 02:20, João Távora wrote:
>> My bet is that the remaining lag is due to sorting.  In a dumb but
>> illustrative example, when given the pattern 'fmcro' flex-enabled
>> ido-mode pops 'flymake--backend-state-p--cmacro' to the top, while fido
>> mode selects the much more reasonable 'defmacro'.
>
> Perhaps not sorting exactly, but the scoring part? Lowering the
> implementation into C might help, we discussed something like that in
> the past.

Perhaps, all could be measured.  But I also remember explaining that
scoring is basically free.  The flex search algorithm is already
greedy; it doesn't backtrack.  Given a pattern 'foo' against
'fabrobazo', it takes 9 steps to see that it matches.  I can't see any
other way to improve that, short of something like a totally different
string implementation.  The scoring's numeric calculations at each step
are trivial.
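To illustrate, a minimal sketch of that greedy subsequence check (a hypothetical helper, not the actual flex code):

```elisp
;; Sketch: greedy subsequence match -- one left-to-right pass over the
;; candidate, no backtracking.  'foo' vs 'fabrobazo' takes 9 steps.
(defun my/flex-match-p (pattern candidate)
  (let ((i 0) (n (length pattern)))
    (dotimes (j (length candidate))
      (when (and (< i n) (eq (aref pattern i) (aref candidate j)))
        (setq i (1+ i))))
    (= i n)))

(my/flex-match-p "foo" "fabrobazo")  ; => t
```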

One way to verify this is to do the scoring, but simply disregard it for
sorting purposes.

> And/or pick a different algorithm. E.g. Jaro-Winkler, which apparently
> is used in a lot of "fuzzy matching" implementations out there, it's
> pretty fast.

That may be useful, but for other purposes.  If I understand correctly,
Jaro-Winkler is for finding the distance between two arbitrary strings,
where the first is not a subsequence of the second.  I bet Google uses
stuff like that when you accidentally transpose characters.  Flex just
gives up.  Those other algos still catch the match (and Google
then probably NL-scours your most intimate fears and checks with your
local dictator before showing you typing suggestions).

> I took an example like
>
>   (setq s (all-completions "" obarray))
>   (setq ss (cl-delete-if-not (lambda (s) (string-match-p "a" s)) s))
>
> then
>
>   (benchmark 1 '(completion-all-completions "a" ss nil 1))
>
> prints 0.180s here, whereas a "pure Ruby" implementation of
> Jaro-Winkler takes about 0.060s on the exact same set of strings. But
> perhaps Ruby is just faster than Elisp, I don't have a good
> comparison.

Go ahead and kill the scoring calculation altogether in
completion-pcm--hilit-commonality.  I bet it won't make a difference.
In fact, for that experiment, try a simple substring search.  I bet
you're just seeing an inferior GC at work, or a string implementation
that's optimized for other stuff, like propertization, that Ruby's
isn't.  Try making Ruby strings that mimic Elisp's if you've time
to spare...

> (The only J-W implementation in Elisp I have found yet --
> https://github.com/rdiankov/emacs-config/blob/master/.emacs-lisp/auto-complete-1.3.1/fuzzy.el#L70
> -- is slower than the current scoring algo).

There you have it.

João






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06  2:34       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-06-06  6:59         ` João Távora
  2021-06-06 16:54           ` Dmitry Gutov
  2021-06-06 17:55           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 33+ messages in thread
From: João Távora @ 2021-06-06  6:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 48841, Dmitry Gutov

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Stefan knows best here.  Regardless of its use, it seems to require
>> another try-completion call in all the filtered candidates (which might
>> be very big) so that's probably where the extra lag comes from.
>
> IIRC the `try-completion` call is performed on the list of possible
> completions rather than on the original completion table, so it should
> be quite fast.  I'd be surprised if it is a significant portion of the
> overall time.

Very true, but here's the surprise: In the flex style, there are a _lot_
of "possible completions" for the null or very short patterns.  So those
calculations -- which were more than certainly thought up for prefix-ish
styles -- are quite slow (and also quite useless for flex).  At least
that's my theory.

João






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06  6:59         ` João Távora
@ 2021-06-06 16:54           ` Dmitry Gutov
  2021-06-06 18:37             ` João Távora
  2021-06-06 17:55           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-06 16:54 UTC (permalink / raw)
  To: João Távora, Stefan Monnier; +Cc: 48841

On 06.06.2021 09:59, João Távora wrote:
> Very true, but here's the suprise: In the flex style, there are a _lot_
> of "possible completions" for the null or very short patterns.  So those
> calculations -- which were more than certainly thought up for prefix-ish
> styles -- are quite slow (and also quite useless for flex).  At least
> that's my theory.

try-completion doesn't trigger any completion style machinery; only 
completion-try-completion does.

And are we talking about the 'try-completion' call which is guarded with 
(when icomplete-hide-common-prefix ...)?

icomplete--fido-mode-setup sets that variable to nil.






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06  6:59         ` João Távora
  2021-06-06 16:54           ` Dmitry Gutov
@ 2021-06-06 17:55           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-06-06 21:33             ` João Távora
  1 sibling, 1 reply; 33+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-06-06 17:55 UTC (permalink / raw)
  To: João Távora; +Cc: 48841, Dmitry Gutov

> Very true, but here's the suprise: In the flex style, there are a _lot_
> of "possible completions" for the null or very short patterns.  So those
> calculations -- which were more than certainly thought up for prefix-ish
> styles -- are quite slow (and also quite useless for flex).  At least
> that's my theory.

In the very worst possible case, `try-completion` will be just as slow
as the original computation of the set of possible completions.  So at
most it will double the total time (and this assumes we do basically
nothing else than a single call to `all-completions` to get the set of
candidates and then display them).
In practice I'd be surprised if it ever reaches the 20% mark of the time spent.
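One way to sanity-check that estimate (a sketch; the sets and timings will vary by session):

```elisp
;; Sketch: compare the cost of filtering the candidate set against the
;; cost of an extra `try-completion' over the already-filtered list,
;; which is what icomplete performs.
(let* ((cands (all-completions "" obarray))   ; big list of symbol names
       (subset (all-completions "a" cands)))  ; pre-filtered subset
  (list (benchmark-run 10 (all-completions "a" cands))
        (benchmark-run 10 (try-completion "a" subset))))
;; Each result is a list (total-seconds gc-count gc-seconds).
```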


        Stefan







* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06 16:54           ` Dmitry Gutov
@ 2021-06-06 18:37             ` João Távora
  2021-06-06 22:21               ` Dmitry Gutov
  0 siblings, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-06 18:37 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841


On Sun, Jun 6, 2021, 17:55 Dmitry Gutov <dgutov@yandex.ru> wrote:

> On 06.06.2021 09:59, João Távora wrote:
> > Very true, but here's the suprise: In the flex style, there are a _lot_
> > of "possible completions" for the null or very short patterns.  So those
> > calculations -- which were more than certainly thought up for prefix-ish
> > styles -- are quite slow (and also quite useless for flex).  At least
> > that's my theory.
>
> try-completion doesn't trigger any completion style machinery; only
> completion-try-completion does.
>

I have no idea if completion style stuff is related. Just that else
branch is there to calculate some 'determ' thing, and a cursory look
revealed try-completion calls being passed 'comps' or 'completions'.
Presumably lots of data, given short flex style patterns. No idea what it
accomplishes, as I said.

Bottom line is that something (TM) happened to speed up the whole thing
when I skipped over that whole part. I had vertical mode basically visually
equivalent to vertical, but quite slower.  After skipping that part they
became practically equivalent. And you yourself witnessed this when
switching to vertical mode, which is when the skip is made.

I'll check later in the week; I'm away from my computer now.

> And are we talking about the 'try-completion' call which is guarded with
> (when icomplete-hide-common-prefix ...)?

No idea.

> icomplete--fido-mode-setup sets that variable to nil.

Maybe.

João



* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06 17:55           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-06-06 21:33             ` João Távora
  0 siblings, 0 replies; 33+ messages in thread
From: João Távora @ 2021-06-06 21:33 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 48841, Dmitry Gutov


On Sun, Jun 6, 2021, 18:55 Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> In practice I'd be surprised if it ever reaches the 20% mark of the time
> spent.


Personally, I'm usually surprised if less than 80% of my estimates aren't
totally off. :) Anyway, if not try-completion like I theorized, it should
be reasonably easy to pinpoint: something non-fido-essential in that else
branch is causing a real slowdown.

João




* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06  6:54         ` João Távora
@ 2021-06-06 22:20           ` Dmitry Gutov
  2021-06-06 23:49             ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-06 22:20 UTC (permalink / raw)
  To: João Távora; +Cc: monnier, 48841

On 06.06.2021 09:54, João Távora wrote:

>> Perhaps not sorting exactly, but the scoring part? Lowering the
>> implementation into C might help, we discussed something like that in
>> the past.
> 
> Perhaps, all could be measured.  But I also remember explaining that
> scoring is basically free.  The flex algorithm search algorithm is
> greedy already, it doesn't backtrack.  Given a pattern 'foo' against
> 'fabrobazo', it takes 9 steps to see that it matches.  I can't see any
> other way to improve that, short of a something like a tottaly different
> string implementation.  The scoring's numeric calculations at each step
> are trivial.

Thanks, if it's indeed that fast, I might have a use for it as well, in 
a different context. ;-)

>> And/or pick a different algorithm. E.g. Jaro-Winkler, which apparently
>> is used in a lot of "fuzzy matching" implementations out there, it's
>> pretty fast.
> 
> That may be useful, but for other purposes.  If I understand correctly,
> Jaro-Winkler is for finding the distante between two arbitrary strings,
> where the first in not a subsequence of the second.  I bet google uses
> stuff like that when you accitendally transpose characters.  Flex just
> gives up.  Those other others algos still catch the match (and Google
> than probably NL-scours your most intimate fears and checks with your
> local dictator before showing you typing suggestions)

I'm not crazy about shuffling completion, but some people did indeed ask 
for it. The filtering step has to be more complex, though (you can't 
just construct a straightforward regexp to filter with).

>> I took an example like
>>
>>    (setq s (all-completions "" obarray))
>>    (setq ss (cl-delete-if-not (lambda (s) (string-match-p "a" s)) s))
>>
>> then
>>
>>    (benchmark 1 '(completion-all-completions "a" ss nil 1))
>>
>> prints 0.180s here, whereas a "pure Ruby" implementation of
>> Jaro-Winkler takes about 0.060s on the exact same set of strings. But
>> perhaps Ruby is just faster than Elisp, I don't have a good
>> comparison.
> 
> Go ahead and kill the scoring calculationg altogether in
> completion-pcm--hilit-commonality.  I bet it won't make a difference.
> If fact, for that experiment, try a simple substring search.

Same result, indeed. We should note, though, that 
completion-pcm--hilit-commonality has some steps that were added 
together with the 'flex' style, for it to work.

> I bet
> you're just seeing an inferior GC at work, or a string implementation
> that's made optimized for other stuff that Ruby's can't, like
> propertization.  Try making Ruby strings that mimic Elips if you've time
> to spare...

I did some instrumenting, replacing (completion-pcm--hilit-commonality 
pattern all) inside completion-flex-all-completions with 
(benchmark-progn (completion-pcm--hilit-commonality pattern all)). 
Recompiled between each change (interpreted mode gives very different 
numbers).

Unmodified, the call takes ~85ms:

   Elapsed time: 0.085520s (0.068406s in 4 GCs)

If I comment everything inside its first lambda except the returned 
value (making the function the same as #'identity), the time goes down 
to <1ms.

Uncomment the 'copy-sequence' and 'string-match' calls (which one might 
suspect of garbage generation):

   Elapsed time: 0.006380s

Tried binding gc-cons-threshold to a high value, and even galling 
garbage-collect-maybe (or not): that speeds it up, but adds some 
unpredictable GC pauses later (though it would be nice to be able to 
consolidate the pauses into one collection pass).
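That binding can be sketched like this (the 64 MB threshold is an arbitrary value for illustration):

```elisp
;; Sketch: suppress GC during the hot call, then collect once
;; afterwards, so the pauses are consolidated into a single pass
;; instead of being sprinkled through typing.
(let* ((ss (all-completions "" obarray))
       (result (let ((gc-cons-threshold (* 64 1024 1024)))  ; arbitrary bump
                 (benchmark-progn
                   (completion-all-completions "a" ss nil 1)))))
  (garbage-collect)  ; pay the deferred GC cost at a chosen moment
  result)
```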

Long story short, the patch I just installed, to reuse the match data, 
brings the runtime down to

   Elapsed time: 0.066388s (0.050087s in 3 GCs)

Tried other things like moving the update-score-and-face lambda out of 
the mapcar loop - that didn't move the needle. Someone more familiar with 
the code might get further. But perhaps it's just the cost of executing 
this logic in Lisp 12000 times, and doing some of that in C would be the 
next step.

And a weird part: replacing all repeated (length str) calls with a 
reference to an existing local binding makes it *slower* (back to the 
original performance). Check this out:

diff --git a/lisp/minibuffer.el b/lisp/minibuffer.el
index d5a0118b7c..d7102245a2 100644
--- a/lisp/minibuffer.el
+++ b/lisp/minibuffer.el
@@ -3544,7 +3544,7 @@ completion-pcm--hilit-commonality
                      score-numerator   (+ score-numerator (- b a)))
                     (unless (or (= a last-b)
                                 (zerop last-b)
-                               (= a (length str)))
+                               (= a end))
                       (setq
                        score-denominator (+ score-denominator
                                             1
@@ -3562,12 +3562,12 @@ completion-pcm--hilit-commonality
             ;; for that extra bit of match (bug#42149).
             (unless (= from match-end)
               (funcall update-score-and-face from match-end))
-           (if (> (length str) pos)
+           (if (> end pos)
                 (add-face-text-property
                  pos (1+ pos)
                  'completions-first-difference
                  nil str))
-           (unless (zerop (length str))
+           (unless (zerop end)
               (put-text-property
                0 1 'completion-score
               (/ score-numerator (* end (1+ score-denominator)) 1.0)
               str)))
@@ -3980,7 +3980,7 @@ completion-flex-all-completions
                    string table pred point
                    #'completion-flex--make-flex-pattern)))
        (when all
-        (nconc (completion-pcm--hilit-commonality pattern all)
+        (nconc (benchmark-progn (completion-pcm--hilit-commonality pattern all))
                 (length prefix))))))

  ;; Initials completion






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06 18:37             ` João Távora
@ 2021-06-06 22:21               ` Dmitry Gutov
  2021-06-06 23:27                 ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-06 22:21 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

On 06.06.2021 21:37, João Távora wrote:
> Bottom line is that something (TM) happened to speed up the whole thing 
> when I skipped over that whole part. I had vertical mode basically 
> visually equivalent to vertical, but quite slower.  After skipping that 
> part they became practically equivalent. And you yourself witnessed this 
> when switching yo vertical mode, which is when the skip is made.

Yep.

>   I'll check later in the week, away from my computer now.

Looking forward to it.






* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06 22:21               ` Dmitry Gutov
@ 2021-06-06 23:27                 ` João Távora
  0 siblings, 0 replies; 33+ messages in thread
From: João Távora @ 2021-06-06 23:27 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841


On Sun, Jun 6, 2021, 23:21 Dmitry Gutov <dgutov@yandex.ru> wrote:

> On 06.06.2021 21:37, João Távora wrote:
> > Bottom line is that something (TM) happened to speed up the whole thing
> > when I skipped over that whole part. I had vertical mode basically
> > visually equivalent to vertical, but quite slower.  After skipping that
> > part they became practically equivalent. And you yourself witnessed this
> > when switching yo vertical mode, which is when the skip is made.
>
> Yep.
>

By the way, earlier I meant to write:

"I had vertical mode basically visually equivalent to vertico.el, but quite
slower.  After skipping that part they became practically equivalent. "

Autocorrect on my phone had other ideas...

João



* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06 22:20           ` Dmitry Gutov
@ 2021-06-06 23:49             ` João Távora
  2021-06-07  0:11               ` Dmitry Gutov
  0 siblings, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-06 23:49 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841


On Sun, Jun 6, 2021, 23:20 Dmitry Gutov <dgutov@yandex.ru> wrote:

>
> >> And/or pick a different algorithm. E.g. Jaro-Winkler, which apparently
> >> is used in a lot of "fuzzy matching" implementations out there, it's
> >> pretty fast.
> >
> > That may be useful, but for other purposes.  If I understand correctly,
> > Jaro-Winkler is for finding the distante between two arbitrary strings,
> > where the first in not a subsequence of the second.  I bet google uses
> > stuff like that when you accitendally transpose characters.  Flex just
> > gives up.  Those other others algos still catch the match (and Google
> > than probably NL-scours your most intimate fears and checks with your

> local dictator before showing you typing suggestions)
>

Meant to write ML as in machine learning, not NL.

> I'm not crazy about shuffling completion, but some people did indeed ask
> for it. The filtering step has to be more complex, though (you can't
> just construct a straightforward regexp to filter with).

I think you calculate the distance using one of these fancy multi-surname
algorithms, then do a cutoff somewhere.
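A self-contained sketch of that cutoff idea, using the built-in `string-distance' (Levenshtein) as a stand-in for a fancier metric; the helper name and threshold are made up:

```elisp
;; Sketch: score candidates by edit distance to the pattern, drop
;; everything past a cutoff, and return the closest matches first.
(require 'cl-lib)

(defun my/fuzzy-filter (pattern candidates cutoff)
  "Return CANDIDATES within Levenshtein distance CUTOFF of PATTERN, closest first."
  (let ((close (cl-remove-if-not
                (lambda (c) (<= (string-distance pattern c) cutoff))
                candidates)))
    (sort close (lambda (a b)
                  (< (string-distance pattern a)
                     (string-distance pattern b))))))

(my/fuzzy-filter "fmcro" '("defmacro" "find-file" "cl-defmacro") 5)
;; "defmacro" (distance 3) survives the cutoff of 5; the others don't.
```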

> Same result, indeed. We should note, though, that
> completion-pcm--hilit-commonality has some steps that were added
> together with the 'flex' style, for it to work.

But nothing algorithmically aberrant, I think.

> I did some instrumenting, replacing (completion-pcm--hilit-commonality
> pattern all) inside completion-flex-all-completions with
> (benchmark-progn (completion-pcm--hilit-commonality pattern all)).
> Recompiled between each change (interpreted mode gives very different
> numbers).
>
> Unmodified, the call takes ~85ms:
>
>    Elapsed time: 0.085520s (0.068406s in 4 GCs)
>


By the way, I think you should be running benchmarks multiple times to get
times in the seconds range, and reduce noise. Multiple levels of CPU cache
and other factors like thermal throttling may skew results when running just
one time.
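For instance (a sketch; the iteration count is arbitrary):

```elisp
;; Sketch: run the measured form many times so cache effects and GC
;; noise average out; the compiled variant avoids interpreter overhead.
(let ((ss (all-completions "" obarray)))
  (benchmark-run-compiled 100
    (completion-all-completions "a" ss nil 1)))
;; => (total-seconds gc-count gc-seconds)
```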

> If I comment everything inside its first lambda except the returned
> value (making the function the same as #'identity), the time goes down
> to <1ms.
>
> Uncomment the 'copy-sequence' and 'string-match' calls (which one might
> suspect of garbage generation):
>
>    Elapsed time: 0.006380s
>
> Tried binding gc-cons-threshold to a high value, and even calling
> garbage-collect-maybe (or not): that speeds it up, but adds some
> unpredictable GC pauses later (though it would be nice to be able to
> consolidate the pauses into one collection pass).
>

Maybe in Elisp that's a good idea; in other Lisps and other languages,
second-guessing the GC is a bad idea. I hear ours is so basic that it
might indeed be reasonable.

> Long story short, the patch I just installed, to reuse the match data,
> brings the runtime down to
>
>    Elapsed time: 0.066388s (0.050087s in 3 GCs)
>

That's nice! But are you sure you're not seeing noise, too?

Tried other things like moving the update-score-and-face lambda out of
> the mapcar loop - that didn't move the needle.


If a lambda doesn't capture anything from inside the loop, only one copy
of it is ever made, I think. So it doesn't surprise me.
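
A rough illustration: under lexical binding the byte compiler can treat a
non-capturing lambda as a constant, while a capturing one allocates a
closure at the point where it is evaluated.

```elisp
;; With lexical-binding, the first lambda captures nothing and compiles
;; to a single constant function object; the second captures N, so a
;; fresh closure is allocated each time `make-adder' is called.
(defun plus-one-fn ()
  (lambda (x) (1+ x)))          ; non-capturing: shared when compiled

(defun make-adder (n)
  (lambda (x) (+ x n)))         ; captures N: new closure per call

;; (eq (plus-one-fn) (plus-one-fn))   ; t when byte-compiled
;; (eq (make-adder 1) (make-adder 1)) ; nil
```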

And a weird part: replacing all repeated (length str) calls with a
> reference to an existing local binding makes it *slower* (back to the
> original performance).


Might be noise, or you might be thrashing the CPU caches, who knows? If the
string length is on the same cache line as the contents of the string
you're reading, then evicting that to go read the value of a boxed integer
somewhere radically different is slow. Just speculation of course. Might
just be noise or something else entirely.

João

[-- Attachment #2: Type: text/html, Size: 6158 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-06 23:49             ` João Távora
@ 2021-06-07  0:11               ` Dmitry Gutov
  2021-06-07  8:52                 ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-07  0:11 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

On 07.06.2021 02:49, João Távora wrote:

>     Same result, indeed. We should note, though, that
>     completion-pcm--hilit-commonality has some steps that were added
>     together with the 'flex' style, for it to work.
> 
> 
> But nothing algorithmically aberrant, I think.

Just some stuff adding work to GC, I think.

> By the way, I think you should be running benchmarks multiple times to 
> get times in the seconds range, and reduce noise. Multiple levels of CPU 
> cache and other factors like thermal throttling may skew results when 
> running just one time.

Yeah, I repeat the action with each version a few dozen times, until I 
see the numbers stabilize, or just take the average.

>     Tried binding gc-cons-threshold to a high value, and even calling
>     garbage-collect-maybe (or not): that speeds it up, but adds some
>     unpredictable GC pauses later (though it would be nice to be able to
>     consolidate the pauses into one collection pass).
> 
> 
> Maybe in Elisp that's a good idea, in other lisps and other languages, 
> second-guessing the GC is a bad idea. I hear ours is so basic that 
> indeed it might be reasonable.

I never get good results with that.

>     Long story short, the patch I just installed, to reuse the match data,
>     brings the runtime down to
> 
>         Elapsed time: 0.066388s (0.050087s in 3 GCs)
> 
> 
> That's nice! but are you sure you're not seeing noise, too?

Pretty sure.

>     Tried other things like moving the update-score-and-face lambda out of
>     the mapcar loop - that didn't move the needle.
> 
> 
> If a lambda is non capturing of stuff inside the loop, only one copy of 
> it is ever made, I think. So it doesn't surprise me.

update-score-and-face references both variables in its enclosing binding 
form (score-numerator, score-denominator) and the parameter of its 
containing lambda (str).

Maybe moving all of them to parameters and return values (making it a 
static function and having the caller manage state) would help, I 
haven't tried that exactly.

>     And a weird part: replacing all repeated (length str) calls with a
>     reference to an existing local binding makes it *slower* (back to the
>     original performance).
> 
> 
> Might be noise, or you might be thrashing the CPU caches, who knows? If 
> the string length is on the same cache line as the contents of the 
> string you're reading, then evicting that to go read the value of a 
> boxed integer somewhere radically different is slow.

But the string value is boxed as well, right? So the code needs to 
follow one indirection either way. Perhaps there's also overhead in 
looking up the lexical scope.

I also tried using the new-and-shiny length> and length=. This simply 
made no measurable difference.
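
For the record, those predicates mainly help with lists, where they can
stop traversal early; a string's length is stored in the object header,
so no difference is expected there:

```elisp
;; `length=' / `length>' (Emacs 28+) can short-circuit list traversal,
;; but for strings the length is O(1) either way:
(length= "abc" 3)                  ; => t
(length> "abc" 2)                  ; => t
(= (length "abc") 3)               ; => t, equivalent for strings
(length> (make-list 1000000 0) 3)  ; stops after walking 4 cons cells
```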

> Just speculation of 
> course. Might just be noise or something else entirely.

This is highly reproducible. On my machine, at least.

Given how weird it is, I wouldn't just write about it without 
recompiling, restarting Emacs and measuring the scenario several times, 
with different versions of code.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-07  0:11               ` Dmitry Gutov
@ 2021-06-07  8:52                 ` João Távora
  2021-06-11  2:19                   ` Dmitry Gutov
  0 siblings, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-07  8:52 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841

[-- Attachment #1: Type: text/plain, Size: 1908 bytes --]

On Mon, Jun 7, 2021, 01:11 Dmitry Gutov <dgutov@yandex.ru> wrote:

> On 07.06.2021 02:49, João Távora wrote:
>
> >     Same result, indeed. We should note, though, that
> >     completion-pcm--hilit-commonality has some steps that were added
> >     together with the 'flex' style, for it to work.
> >
> >
> > But nothing algorithmically aberrant, I think.
>
> Just some stuff adding work to GC, I think.
>

The O(n) stuff being a property and a number added to each string, small
in comparison to the string itself.

Maybe moving all of them to parameters and return values (making it a
> static function and having the caller manage state) would help, I
> haven't tried that exactly.
>

Normally, in those adventures you end up with the same allocations
somewhere else, and uglier code. But you can try.

> Might be noise, or you might be thrashing the CPU caches, who knows? If
> > the string length is on the same cache line as the contents of the
> > string you're reading, then evicting that to go read the value of a
> > boxed integer somewhere radically different is slow.
>
> But the string value is boxed as well, right?


The key is locality. If the string length and data happen to live nearby in
memory (in the same box, so to speak), there's a decent chance that reading
one brings the other into the cache, and you get a nice hit depending on
your subsequent operation.

Here I'm just speculating, as I said. In managed languages such as Lisps,
it's somewhat unpredictable. It's also always hardware dependent. Though
given C/C++, a known processor and the right application, this will make a
world of difference, and will yield truly "weird" results (which aren't
weird at all after you understand the logic). Like, for example, a vector
being much better at sorted insertion than a linked list. (!) Look it up.
Bjarne Stroustrup has one of those talks.

João

[-- Attachment #2: Type: text/html, Size: 2915 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-07  8:52                 ` João Távora
@ 2021-06-11  2:19                   ` Dmitry Gutov
  2021-06-11 17:09                     ` João Távora
  2021-06-11 23:24                     ` João Távora
  0 siblings, 2 replies; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-11  2:19 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

[-- Attachment #1: Type: text/plain, Size: 3655 bytes --]

On 07.06.2021 11:52, João Távora wrote:

>     Maybe moving all of them to parameters and return values (making it a
>     static function and having the caller manage state) would help, I
>     haven't tried that exactly.
> 
> 
> Normally, in those adventures you end up with the same allocations 
> somewhere else, and uglier code. But you can try.

I gave it a try, with little success (patch attached, for posterity). 
Since there are no multiple-value returns in Elisp, I had to define a 
container for the three values.

The performance is basically the same, which seems to indicate that 
either Elisp has to allocate very little for a compiled lambda, or the 
allocation is optimized out (which would also make sense: the only thing 
necessary for it is a new container for the current scope).

>      > Might be noise, or you might be thrashing the CPU caches, who
>     knows? If
>      > the string length is on the same cache line as the contents of the
>      > string you're reading, then evicting that to go read the value of a
>      > boxed integer somewhere radically different is slow.
> 
>     But the string value is boxed as well, right? 
> 
> 
> The key is locality. If the string length and data happen to live nearby 
> in memory (in the same box, so to speak), there's a decent chance that 
> reading one brings the other into the cache, and you get a nice hit 
> depending on your subsequent operation.
> 
> Here I'm just speculating, as I said. In managed languages such as 
> Lisps, it's somewhat unpredictable. It's also always hardware dependent.
> Though given C/C++, a known processor and the right application, this 
> will make a world of difference, and will yield truly "weird" results 
> (which aren't weird at all after you understand the logic). Like, for 
> example a vector being much better at sorted insertion than a linked 
> list. (!) Look it up. Bjarne Stroustrup has one of those talks.

When you have to do some work, better memory locality can indeed change 
a lot. But in this case we have an already computed value vs. something 
the code still needs to compute, however fast that is.

Accessing function arguments must currently be much faster than looking 
up variables in the current scope defined with 'let'.

Anyway, looking at what else could be removed, now that the extra 
allocation in 'match-data' is gone, what really speeds it up 2x-11x 
(depending on whether GC kicks in, but it more often doesn't), is 
commenting out the line:

   (setq str (copy-sequence str))

So if it were possible to rearrange completion-pcm--hilit-commonality 
not to have to modify the strings (probably removing the function 
altogether?), that would improve the potential performance of c-a-p-f 
quite a bit, for fido-mode and other frontends (depending on how much 
overhead the other layers add).

Ultimately, the scoring information doesn't have to live in the text 
properties. For sorting, the frontend could allocate a hash table, then 
ask the [backends? styles?] for completion scores on each item and sort 
based on that. Since faces are needed only for the completions that are 
currently displayed, even having to repeat the regexp matching stuff for 
each of them later would be no big deal performance-wise, compared to 
the current approach.
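
That frontend-side bookkeeping might look something like this sketch
(hypothetical names; `score-fn` stands in for whatever scoring interface
the style would expose):

```elisp
;; Hypothetical sketch: keep scores in a hash table owned by the
;; frontend, instead of writing a `completion-score' text property
;; onto a copy of every candidate string.
(defun my-sort-by-score (candidates score-fn)
  "Return CANDIDATES sorted by SCORE-FN, highest score first."
  (let ((scores (make-hash-table :test #'equal)))
    (dolist (cand candidates)
      (puthash cand (funcall score-fn cand) scores))
    (sort (copy-sequence candidates)
          (lambda (a b) (> (gethash a scores)
                           (gethash b scores))))))
```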

Anyway, these are musings for the much-discussed future iteration of the 
API. With the current version, and tied by backward compatibility, it 
might be possible to wring 10ms of improvement by consolidating text 
property changes somehow, but likely no more than that.

Looking forward to your analysis of fido-vertical-mode's performance 
improvement over the "normal" one.

[-- Attachment #2: completion-pcm-score-struct.diff --]
[-- Type: text/x-patch, Size: 4525 bytes --]

diff --git a/lisp/minibuffer.el b/lisp/minibuffer.el
index d5a0118b7c..0cb247fc19 100644
--- a/lisp/minibuffer.el
+++ b/lisp/minibuffer.el
@@ -3485,6 +3485,7 @@ completion-pcm--hilit-commonality
     (let* ((re (completion-pcm--pattern->regex pattern 'group))
            (point-idx (completion-pcm--pattern-point-idx pattern))
            (case-fold-search completion-ignore-case)
+           (score (make-vector 3 0))
            last-md)
       (mapcar
        (lambda (str)
@@ -3531,37 +3532,17 @@ completion-pcm--hilit-commonality
                 ;;    (SUM_across_i(hole_i_contrib) + 1) * len
                 ;;
                 ;; , where "len" is the string's length.
-                (score-numerator 0)
-                (score-denominator 0)
-                (last-b 0)
-                (update-score-and-face
-                 (lambda (a b)
-                   "Update score and face given match range (A B)."
-                   (add-face-text-property a b
-                                           'completions-common-part
-                                           nil str)
-                   (setq
-                    score-numerator   (+ score-numerator (- b a)))
-                   (unless (or (= a last-b)
-                               (zerop last-b)
-                               (= a (length str)))
-                     (setq
-                      score-denominator (+ score-denominator
-                                           1
-                                           (expt (- a last-b 1)
-                                                 (/ 1.0
-                                                    flex-score-match-tightness)))))
-                   (setq
-                    last-b              b))))
+                )
+           (fillarray score 0)
            (while md
-             (funcall update-score-and-face from (pop md))
+             (completion-pcm--update-score-and-face str from (pop md) score)
              (setq from (pop md)))
            ;; If `pattern' doesn't have an explicit trailing any, the
            ;; regex `re' won't produce match data representing the
            ;; region after the match.  We need to account to account
            ;; for that extra bit of match (bug#42149).
            (unless (= from match-end)
-             (funcall update-score-and-face from match-end))
+             (completion-pcm--update-score-and-face str from match-end score))
            (if (> (length str) pos)
                (add-face-text-property
                 pos (1+ pos)
@@ -3570,10 +3551,35 @@ completion-pcm--hilit-commonality
            (unless (zerop (length str))
              (put-text-property
               0 1 'completion-score
-              (/ score-numerator (* end (1+ score-denominator)) 1.0) str)))
+              (/ (completion-pcm--score-numerator score)
+                 (* end (1+ (completion-pcm--score-denominator score)))
+                 1.0)
+              str)))
          str)
        completions))))
 
+(cl-defstruct (completion-pcm--score (:type vector))
+  (numerator 0) (denominator 0) (last-b 0))
+
+(defun completion-pcm--update-score-and-face (str a b score)
+  "Update score and face in STR given match range (A B)."
+  (add-face-text-property a b
+                          'completions-common-part
+                          nil str)
+  (let ((last-b (completion-pcm--score-last-b score)))
+    (setf (completion-pcm--score-numerator score)
+          (+ (completion-pcm--score-numerator score) (- b a)))
+    (unless (or (= a last-b)
+                (zerop last-b)
+                (= a (length str)))
+      (setf (completion-pcm--score-denominator score)
+            (+ (completion-pcm--score-denominator score)
+               1
+               (expt (- a last-b 1)
+                     (/ 1.0
+                        flex-score-match-tightness)))))
+    (setf (completion-pcm--score-last-b score) b)))
+
 (defun completion-pcm--find-all-completions (string table pred point
                                                     &optional filter)
   "Find all completions for STRING at POINT in TABLE, satisfying PRED.
@@ -3980,7 +3986,7 @@ completion-flex-all-completions
                   string table pred point
                   #'completion-flex--make-flex-pattern)))
       (when all
-        (nconc (completion-pcm--hilit-commonality pattern all)
+        (nconc (benchmark-progn (completion-pcm--hilit-commonality pattern all))
                (length prefix))))))
 
 ;; Initials completion

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-11  2:19                   ` Dmitry Gutov
@ 2021-06-11 17:09                     ` João Távora
  2021-06-11 22:34                       ` Dmitry Gutov
  2021-06-11 23:24                     ` João Távora
  1 sibling, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-11 17:09 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 07.06.2021 11:52, João Távora wrote:
>
>>     Maybe moving all of them to parameters and return values (making it a
>>     static function and having the caller manage state) would help, I
>>     haven't tried that exactly.
>> Normally, in those adventures you end up with the same allocations
>> somewhere else, and uglier code. But you can try.
>
> I have it a try, with little success (patch attached, for
> posterity). Since there's no multiple value returns in Lisp, I had to
> define a container for the three values.

And if there were multiple values, you can bet the container for them
wouldn't be free ;-)

> The performance is basically the same, which seems to indicate that
> either Elisp has to allocate very little for a compiled lambda code,
> or it's optimized out (which would also make sense: the only thing
> necessary for it is a new container for the current scope).

Which lambda are we talking about?  Is it `update-score-and-face`?  If
so, I would guess that the capture of `score-denominator` is what takes
space, and that space is no bigger than another variable in that let
scope.

>> Though given C/C++, a known processor and the right application,
>> this will make a world of difference, and will yield truly "weird"
>> results (which aren't weird at all after you understand the
>> logic). Like, for example a vector being much better at sorted
>> insertion than a linked list. (!) Look it up. Bjarne Stroustrup has
>> one of those talks.
> When you have to do some work, better memory locality can indeed
> change a lot. But in this case we have an already computed value
> vs. something the code still needs to compute, however fast that is.

But `length` of a string, in any sane string implementation, _is_
accessing "an already computed value".  Which likely lives just beside
the data.  In Emacs, it seems to be two pointers (8 bytes) apart from
the data.  In a system with 64-byte L1/2/3 cache lines it still
theoretically makes up to 52 bytes come in "for free" after you read the
length.  But to be honest I tried a bit and haven't come up with
benchmarks to help confirm -- or help dispel -- this theory.  Maybe you
can distill your "weird" experiment down to a code snippet?

> Accessing function arguments must be currently much faster than
> looking up the current scope defined with 'let'.

In a compiled CL system, I would expect the former to use the stack, and
the latter to use the heap, but it wouldn't make any difference in reading
the variable's value, I think.  But Elisp is byte-compiled, not natively
compiled (except for that new thing, haven't tried it), and I don't
understand how the byte-compiler chooses byte-codes, so all bets are off.
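
One way to take some of the guesswork out of this is to look at the
emitted byte-code directly:

```elisp
;; `disassemble' pops up a buffer showing the byte-ops, so one can see
;; whether a variable access compiles to a cheap `stack-ref' or to a
;; dynamic `varref'.
(disassemble (byte-compile (lambda (str) (length str))))
```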

> Anyway, looking at what else could be removed, now that the extra
> allocation in 'match-data' is gone, what really speeds it up 2x-11x
> (depending on whether GC kicks in, but it more often doesn't), is
> commenting out the line:
>
>   (setq str (copy-sequence str))
>
> So if it were possible to rearrange completion-pcm--hilit-commonality
> not to have to modify the strings (probably removing the function
> altogether?), that would improve the potential performance of c-a-p-f
> quite a bit, for fido-mode and other frontends (depending on how much
> overhead the other layers add).

Very interesting.  I don't know what the matter is with modifying the
string itself.  Is it because we want to protect its 'face' property?
Maybe, but what's the harm in changing it?  Maybe Stefan knows.  Stefan,
are you reading this far?

If we do want to protect the shared 'face' property -- and only 'face'
-- then we could very well add some other face-related property that the
frontend could read "just in time" before it itself makes a copy of the
string to display to the user.  
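
Concretely, the idea might look like this sketch (the property name and
both function names are made up for illustration):

```elisp
;; Hypothetical sketch: the style records match ranges under a private
;; text property instead of touching `face'; the frontend copies the
;; string and applies `completions-common-part' only to the candidates
;; it actually displays.
(defun my-record-matches (str ranges)
  "Attach RANGES, a list of (BEG . END) conses, to STR."
  (put-text-property 0 1 'my-match-ranges ranges str)
  str)

(defun my-fontify-for-display (str)
  "Return a face-propertized copy of STR, built at display time."
  (let ((copy (copy-sequence str)))
    (pcase-dolist (`(,beg . ,end)
                   (get-text-property 0 'my-match-ranges copy))
      (add-face-text-property beg end 'completions-common-part nil copy))
    copy))
```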

This technique appears to be slightly simpler than using the hash-table
indirection you propose (we would need something like that if, for some
reason, we absolutely could not touch the string's property list.)

> Anyway, these are musing for the much-discussed future iteration of
> the API. With the current version, and tied by backward compatibility,

Maybe I'm missing something, but I don't see why my above idea requires
changing _that_ much of the API (a bit would change yes).  It's a matter
of letting frontends opt-out of the current readily-available
face-propertized completions and opt-into a display-time facility that
does this propertization. 

But if the speedup is big, I'd revisit the rationale for requiring those
copies to be performed in the first place.  In my (very brief) testing
it doesn't hurt a bit to remove it.

> Looking forward to your analysis of fido-vertical-mode's performance
> improvement over the "normal" one.

Will take a look now.

João





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-11 17:09                     ` João Távora
@ 2021-06-11 22:34                       ` Dmitry Gutov
  2021-06-11 22:41                         ` Dmitry Gutov
  2021-06-13 14:55                         ` João Távora
  0 siblings, 2 replies; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-11 22:34 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

On 11.06.2021 20:09, João Távora wrote:

>> The performance is basically the same, which seems to indicate that
>> either Elisp has to allocate very little for a compiled lambda code,
>> or it's optimized out (which would also make sense: the only thing
>> necessary for it is a new container for the current scope).
> 
> Which lambda are we talking about?  Is it ` update-score-and-face`?  If
> so, I would guess that the capture of `score-denominator` is what takes
> space, and that space is no bigger than another variable in that let
> scope.

Yes, that one.

> But `length` of a string, in any sane string implementation, _is_
> accessing "an already computed value".  Which likely lives just besides
> the data.  In Emacs, it seems to be two pointers (8 bytes) apart from
> the data.  In a system with 64bytes of L1/2/3 cache it still
> theoretically makes up to 52 bytes come in "for free" after you read the
> length.  But to be honest I tried a bit and haven't come up with
> benchmarks to help confirm -- or help dispel -- this theory.  Maybe you
> can distill your "weird" experiment down to a code snippet?

I've tried reproducing the effect with a small snippet, and failed. :-(

>> Anyway, looking at what else could be removed, now that the extra
>> allocation in 'match-data' is gone, what really speeds it up 2x-11x
>> (depending on whether GC kicks in, but it more often doesn't), is
>> commenting out the line:
>>
>>    (setq str (copy-sequence str))
>>
>> So if it were possible to rearrange completion-pcm--hilit-commonality
>> not to have to modify the strings (probably removing the function
>> altogether?), that would improve the potential performance of c-a-p-f
>> quite a bit, for fido-mode and other frontends (depending on how much
>> overhead the other layers add).
> 
> Very interesting.  I don't know what the matter is with modifying the
> string itself.  Is it because we want to protect its 'face' property?
> Maybe, but what's the harm in changing it?

I imagine it's just a "correctness" thing. Text properties are part of 
a string's identity. We add text properties, so we make a copy because 
we don't own the original strings (they might be saved to some constant 
and also used for, I don't know, Imenu items?)

> If we do want to protect the shared 'face' property -- and only 'face'
> -- then we could very well add some other property about face that the
> frontend could read "just in time" before it itself makes a copy of the
> string to display to the user.

Yes, it's an option (though a less elegant one): apply some namespaced 
text properties with the necessary data. And then we'd also be able to 
fontify "just in time".

Do we have any "frozen strings" in Emacs, which absolutely could not be 
modified? Do we plan to?

> This technique appears to be slightly simpler than using the hash-table
> indirection you propose (we would need something like that if, for some
> reason, we absolutely could not touch the string's property list.)

I disagree it's a simpler technique, but it would indeed be a simpler 
change, based on the current implementation.

>> Anyway, these are musing for the much-discussed future iteration of
>> the API. With the current version, and tied by backward compatibility,
> 
> Maybe I'm missing something, but I don't see why my above idea requires
> changing _that_ much of the API (a bit would change yes).  It's a matter
> of letting frontends opt-out of the current readily-available
> face-propertized completions and opt-into a display-time facility that
> does this propertization.

Even your version is a breaking enough change to be a pain, but possibly 
not beneficial enough to bother all consumers with, until we also add 
some more awaited features, I guess.

But I don't mind it myself, and happy to update Company. Either way it's 
a step forward.

> But if the speedup is big, I'd revisit the rationale for requiring those
> copies to be performed in the first place.

With fido-vertical-mode, and with that particular input, it's

   Elapsed time: 0.130773s (0.031547s in 1 GCs)

without copy-sequence, and

   Elapsed time: 0.169842s (0.069740s in 4 GCs)

with it. Not game changing, but definitely measurable.

> In my (very brief) testing
> it doesn't hurt a bit to remove it.

Same. But it likely depends on where the strings came from. In the most 
usual case, of course, they are created at runtime.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-11 22:34                       ` Dmitry Gutov
@ 2021-06-11 22:41                         ` Dmitry Gutov
  2021-06-13 14:55                         ` João Távora
  1 sibling, 0 replies; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-11 22:41 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

On 12.06.2021 01:34, Dmitry Gutov wrote:
> With fido-vertical-mode, and with that particular input, it's
> 
>    Elapsed time: 0.130773s (0.031547s in 1 GCs)
> 
> without copy-sequence, and
> 
>    Elapsed time: 0.169842s (0.069740s in 4 GCs)
> 
> with it. Not game changing, but definitely measurable.

it = icomplete-completions' runtime





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-11  2:19                   ` Dmitry Gutov
  2021-06-11 17:09                     ` João Távora
@ 2021-06-11 23:24                     ` João Távora
  2021-06-12  0:43                       ` Dmitry Gutov
  1 sibling, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-11 23:24 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

> Looking forward to your analysis of fido-vertical-mode's performance
> improvement over the "normal" one.

So, I benchmarked before and after this patch to icomplete.el:

    diff --git a/lisp/icomplete.el b/lisp/icomplete.el
    index 08b4ef2030..3561ebfa04 100644
    --- a/lisp/icomplete.el
    +++ b/lisp/icomplete.el
    @@ -858,16 +858,8 @@ icomplete-completions
                    ;; removing making `comps' a proper list.
                    (base-size (prog1 (cdr last)
                                 (if last (setcdr last nil))))
    -               (most-try
    -                (if (and base-size (> base-size 0))
    -                    (completion-try-completion
    -                     name candidates predicate (length name) md)
    -                  ;; If the `comps' are 0-based, the result should be
    -                  ;; the same with `comps'.
    -                  (completion-try-completion
    -                   name comps nil (length name) md)))
    -               (most (if (consp most-try) (car most-try)
    -                       (if most-try (car comps) "")))
    +               (most-try nil)
    +               (most "")
                    ;; Compare name and most, so we can determine if name is
                    ;; a prefix of most, or something else.
                    (compare (compare-strings name nil nil

The patch itself nullifies the calculation of the 'determ' thing that I
and presumably some other users don't value that much.  It doesn't
affect fido-mode's basic functionality.

How did I benchmark?  Well, to measure the delay the user experiences
until all completions are presented I had to take out the
`while-no-input` in icomplete-exhibit so that this test would work:

    ;; After the form, type C-u C-x C-e C-m in quick succession
    (benchmark-run (completing-read "bla" obarray))

If I don't remove this `while-no-input`, icomplete will not lose time
showing all the completions and will instead select just the first one.
That's a very nice feature for actual use, but for this benchmark that
is not what I want: I want to measure the time to show all the
completions.
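
(For context: `while-no-input` aborts its body and returns t the moment
input arrives, which is exactly why it had to go for the measurement.)

```elisp
;; `while-no-input' returns BODY's value only if no input arrives;
;; typing during the sit-for below aborts the body and returns t.  So
;; a quick C-m would skip the expensive rendering being timed.
(while-no-input
  (sit-for 2)   ; stands in for the expensive completion display work
  'finished)
```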

Then, the times presented by benchmark-run are the same ones the user
sees if he waits to see the completions.  Now the values:

Before my patch:

    (1.802209488 5 1.3678843490000077)
    (1.609066281 4 1.1170432569999775)
    (1.878972079 5 1.3725165670000479)
    (1.901952581 5 1.3979494059999524)
    (1.820800064 5 1.3283940110000003)

After the patch:

    (0.552051921 1 0.3079724459999511)
    (0.58396499 1 0.3038616050000087)
    (0.861106587 2 0.6046198220000178)
    (0.611551175 1 0.30275532399997473)
    (0.62500199 1 0.3160454470000218)

Without the patch but with icomplete-vertical-mode:

    (0.645366711 1 0.3412892389999911)
    (0.6256968110000001 1 0.3234302760000105)
    (0.9716317630000001 2 0.6676939319999633)
    (0.6414442749999999 1 0.3325084230000357)
    (0.627684562 1 0.32241421699995954)

João





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-11 23:24                     ` João Távora
@ 2021-06-12  0:43                       ` Dmitry Gutov
  2021-06-13 14:29                         ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-12  0:43 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

On 12.06.2021 02:24, João Távora wrote:
> Dmitry Gutov <dgutov@yandex.ru> writes:
> 
>> Looking forward for your analysis of fido-vertical-mode's performance
>> improvement over the "normal" one.
> 
> So, I benchmarked before and after this patch to icomplete.el:
> 
>      diff --git a/lisp/icomplete.el b/lisp/icomplete.el
>      index 08b4ef2030..3561ebfa04 100644
>      --- a/lisp/icomplete.el
>      +++ b/lisp/icomplete.el
>      @@ -858,16 +858,8 @@ icomplete-completions
>                      ;; removing making `comps' a proper list.
>                      (base-size (prog1 (cdr last)
>                                   (if last (setcdr last nil))))
>      -               (most-try
>      -                (if (and base-size (> base-size 0))
>      -                    (completion-try-completion
>      -                     name candidates predicate (length name) md)
>      -                  ;; If the `comps' are 0-based, the result should be
>      -                  ;; the same with `comps'.
>      -                  (completion-try-completion
>      -                   name comps nil (length name) md)))
>      -               (most (if (consp most-try) (car most-try)
>      -                       (if most-try (car comps) "")))
>      +               (most-try nil)
>      +               (most "")
>                      ;; Compare name and most, so we can determine if name is
>                      ;; a prefix of most, or something else.
>                      (compare (compare-strings name nil nil

All right, so this is not about try-completion, it's about 
completion-try-completion. That makes sense.

> The patch itself nullifies the calculation of the 'determ' thing that I
> and presumably some other users don't value that much.  It doesn't
> affect fido-mode's basic functionality.
> 
> How did I benchmark?  Well, to measure the delay the user experiences
> until all completions are presented I had to take out the
> `while-no-input` in icomplete-exhibit so that this test would work:
> 
>      ;; After the form, type C-u C-x C-e C-m in quick succession
>      (benchmark-run (completing-read "bla" obarray))
> 
> If I don't remove this `while-no-input`, icomplete will not lose time
> showing all the completions and will instead select just the first one.
> That's a very nice feature for actual use, but for this benchmark that
> is not what I want: I want to measure the time to show all the
> completions.

Did the same, can repro.

> Then, the times presented by benchmark-run are the same that the user
> sees if he waits to see the completions.  Now the values:
> 
> Before my patch:
> 
>      (1.802209488 5 1.3678843490000077)
>      (1.609066281 4 1.1170432569999775)
>      (1.878972079 5 1.3725165670000479)
>      (1.901952581 5 1.3979494059999524)
>      (1.820800064 5 1.3283940110000003)
> 
> After the patch:
> 
>      (0.552051921 1 0.3079724459999511)
>      (0.58396499 1 0.3038616050000087)
>      (0.861106587 2 0.6046198220000178)
>      (0.611551175 1 0.30275532399997473)
>      (0.62500199 1 0.3160454470000218)

I get

(0.377195885 10 0.24448539800000013)

before and

(0.245218061 6 0.1390041310000001)

after. A solid improvement.
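For anyone comparing these numbers: `benchmark-run` returns a list of 
the form (ELAPSED-SECONDS GC-RUNS GC-SECONDS), so in both of our runs 
the bulk of the elapsed time is garbage collection. A quick way to see 
the shape of the output (exact numbers will of course vary by machine):

```elisp
;; `benchmark-run' returns (ELAPSED-SECONDS GC-RUNS GC-SECONDS).
;; Allocating many small strings makes the GC component dominate:
(benchmark-run 1
  (let (acc)
    (dotimes (i 500000)
      (push (number-to-string i) acc))))
;; Result is something like (0.4 3 0.25) -- numbers vary by machine.
```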

BTW, if I just stick benchmark-progn around icomplete-completions like

diff --git a/lisp/icomplete.el b/lisp/icomplete.el
index 08b4ef2030..b9fe3e1836 100644
--- a/lisp/icomplete.el
+++ b/lisp/icomplete.el
@@ -678,12 +678,13 @@ icomplete-exhibit
                   ;; seems to trigger it fairly often!
                   (while-no-input-ignore-events '(selection-request))
                   (text (while-no-input
-                         (icomplete-completions
-                          field-string
-                          (icomplete--completion-table)
-                          (icomplete--completion-predicate)
-                          (if (window-minibuffer-p)
-                              (eq minibuffer--require-match t)))))
+                         (benchmark-progn
+                           (icomplete-completions
+                            field-string
+                            (icomplete--completion-table)
+                            (icomplete--completion-predicate)
+                            (if (window-minibuffer-p)
+                                (eq minibuffer--require-match t))))))
                   (buffer-undo-list t)
                   deactivate-mark)
              ;; Do nothing if while-no-input was aborted.

...it reports

Elapsed time: 0.329006s (0.246073s in 10 GCs)

vs

Elapsed time: 0.169200s (0.113762s in 5 GCs)

I suppose the 40-70ms difference is due to delay in typing.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-12  0:43                       ` Dmitry Gutov
@ 2021-06-13 14:29                         ` João Távora
  2021-06-14  0:08                           ` Dmitry Gutov
  0 siblings, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-13 14:29 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 12.06.2021 02:24, João Távora wrote:
>> Dmitry Gutov <dgutov@yandex.ru> writes:
>> 
>>> Looking forward for your analysis of fido-vertical-mode's performance
>>> improvement over the "normal" one.
>> So, I benchmarked before and after this patch to icomplete.el:
>>      diff --git a/lisp/icomplete.el b/lisp/icomplete.el
>>      index 08b4ef2030..3561ebfa04 100644
>>      --- a/lisp/icomplete.el
>>      +++ b/lisp/icomplete.el
>>      @@ -858,16 +858,8 @@ icomplete-completions
>>                      ;; removing making `comps' a proper list.
>>                      (base-size (prog1 (cdr last)
>>                                   (if last (setcdr last nil))))
>>      -               (most-try
>>      -                (if (and base-size (> base-size 0))
>>      -                    (completion-try-completion
>>      -                     name candidates predicate (length name) md)
>>      -                  ;; If the `comps' are 0-based, the result should be
>>      -                  ;; the same with `comps'.
>>      -                  (completion-try-completion
>>      -                   name comps nil (length name) md)))
>>      -               (most (if (consp most-try) (car most-try)
>>      -                       (if most-try (car comps) "")))
>>      +               (most-try nil)
>>      +               (most "")
>>                      ;; Compare name and most, so we can determine if name is
>>                      ;; a prefix of most, or something else.
>>                      (compare (compare-strings name nil nil
>
> All right, so this is not about try-completion, it's about
> completion-try-completion. That makes sense.

Yeah, to be honest, once I'm done actually using these functions I
immediately evict the differences between try-completion,
completion-try-completion, try-try-completion-completion, or any of
these yoda-speak variations from my mental cache.  

What I meant here is that there was something apparently useless and
slow (to fido-mode, at least) going on in that else branch.

> Elapsed time: 0.329006s (0.246073s in 10 GCs)
>
> vs
>
> Elapsed time: 0.169200s (0.113762s in 5 GCs)
>
> I suppose the 40-70ms difference is due to delay in typing.

No idea.  In my (slower?) system, I typed C-u C-x C-e C-m pretty fast.
Presumably the C-m goes in before pp-eval-last-sexp has a chance to read
more input so I wouldn't think it's a delay in typing.  I could
investigate, but since your measurements confirm the same tendency
anyway, I think this simple patch is what's needed to close this issue.

diff --git a/lisp/icomplete.el b/lisp/icomplete.el
index 08b4ef2030..5d37f47e7d 100644
--- a/lisp/icomplete.el
+++ b/lisp/icomplete.el
@@ -859,13 +859,14 @@ icomplete-completions
                (base-size (prog1 (cdr last)
                             (if last (setcdr last nil))))
                (most-try
-                (if (and base-size (> base-size 0))
-                    (completion-try-completion
-                     name candidates predicate (length name) md)
-                  ;; If the `comps' are 0-based, the result should be
-                  ;; the same with `comps'.
-                  (completion-try-completion
-                   name comps nil (length name) md)))
+                (and (not fido-mode) ; Fido avoids these expensive calculations.
+                     (if (and base-size (> base-size 0))
+                         (completion-try-completion
+                          name candidates predicate (length name) md)
+                       ;; If the `comps' are 0-based, the result should be
+                       ;; the same with `comps'.
+                       (completion-try-completion
+                        name comps nil (length name) md))))
                (most (if (consp most-try) (car most-try)
                        (if most-try (car comps) "")))
                ;; Compare name and most, so we can determine if name is

João





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-11 22:34                       ` Dmitry Gutov
  2021-06-11 22:41                         ` Dmitry Gutov
@ 2021-06-13 14:55                         ` João Távora
  2021-06-17  2:36                           ` Dmitry Gutov
  1 sibling, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-13 14:55 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

>> Very interesting.  I don't know what the matter is with modifying
>> the
>> string itself.  Is it because we want to protect its 'face' property?
>> Maybe, but what's the harm in changing it?
>
> I imagine it's just a "correctness" thing. Text properties are part of
> the string's identity. We add text properties, so we make a copy
> because we don't own the original list (it might be saved to some
> constant and also used for, I don't know, IMenu items?)

I just confirmed it's not for correctness.  If one foregoes the copy, in
fido-mode a C-h f followed by some input followed by C-g and an M-x and
no input (empty pattern) will show interesting results, i.e. a list of
propertized strings when nothing should be propertized.

But maybe that's a question of removing the propertization when the
pattern is empty?  I'll try later.

>> If we do want to protect the shared 'face' property -- and only 'face'
>> -- then we could very well add some other property about face that the
>> frontend could read "just in time" before it itself makes a copy of the
>> string to display to the user.
>
> Yes, it's an option (though a less elegant one): apply some namespaced
> text properties with the necessary data. And then we'd also be able to
> fontify "just in time".
>
> Do we have any "frozen strings" in Emacs, which absolutely could not
> be modified? Do we plan to?

Immutable strings? And error when one tries to?  Or just ignore the
modification?  And how would that help here?

> I disagree it's a simpler technique, but it would indeed be a simpler
> change, based on the current implementation.

simpler means simpler in my book :-)

> But I don't mind it myself, and happy to update Company. Either way
> it's a step forward.

If Company and fido-mode and a couple more outside the core/Elpa are all
that's needed, it's probably warranted.  But there are so many frontends
right now, I don't know...  We'd need some way to "opt into the
optimization", I think.

>> But if the speedup is big, I'd revisit the rationale for requiring those
>> copies to be performed in the first place.
>
> With fido-vertical-mode, and with that particular input, it's
>
>   Elapsed time: 0.130773s (0.031547s in 1 GCs)
>
> without copy-sequence, and
>
>   Elapsed time: 0.169842s (0.069740s in 4 GCs)
>
> with it. Not game changing, but definitely measurable.

I don't have these results.  I tried the previous experiment:

   ;; C-u C-x C-e C-m in quick succession
   (benchmark-run (completing-read "bla" obarray))

And turned icomplete.el's while-no-input into a progn.

In an Emacs -Q where (length (all-completions "" obarray)) is about
20000.

   ;; with copy
   (0.39647753 6 0.22811240199999983)
   (0.431950471 7 0.263988651)
   (0.451116177 6 0.2249686070000001)
    
   ;; without copy, small but measurable speedup
   (0.29890632 2 0.08419541699999966)
   (0.293501099 2 0.08622194699999985)
   (0.306566633 3 0.0853211100000002)

In a loaded Emacs where (length (all-completions "" obarray)) is 64554

   ;; with copy
   (2.869362171 6 2.3882547280000495)
   (2.909661303 6 2.4209153659999743)
   (2.845522439 6 2.3638140250000106)
    
   ;; without copy.  Huge speedup.
   (0.79817337 1 0.4526993239999797)
   (0.8231736510000001 1 0.4752496449999626)
   (0.719004478 1 0.4016016420000028)

João





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-13 14:29                         ` João Távora
@ 2021-06-14  0:08                           ` Dmitry Gutov
  2021-06-14  0:16                             ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-14  0:08 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

On 13.06.2021 17:29, João Távora wrote:

> Yeah, to be honest, once I'm done actually using these functions I
> immediately evict the differences between try-completion,
> completion-try-completion, try-try-completion-completion, or any of
> these yoda-speak variations from my mental cache.
> 
> What I meant here is that there was something apparently useless and
> slow (to fido-mode, at least) going on in that else branch.

And there it was!

>> Elapsed time: 0.329006s (0.246073s in 10 GCs)
>>
>> vs
>>
>> Elapsed time: 0.169200s (0.113762s in 5 GCs)
>>
>> I suppose the 40-70ms difference is due to delay in typing.
> 
> No idea.  In my (slower?) system, I typed C-u C-x C-e C-m pretty fast.
> Presumably the C-m goes in before pp-eval-last-sexp has a chance to read
> more input so I wouldn't think it's a delay in typing.

Some input latency must be there; of course, it depends on how fast Emacs 
is handling the previous events, how fast the machine is, etc.

Anyway, I was just describing an easier way to benchmark the same effect 
(without having to hit the key sequence very quickly). Hope you or 
someone else finds it useful.

> I could
> investigate, but since your measurements confirm the same tendency
> anyway, I think this simple patch is what's needed to close this issue.

Haven't tested it myself, but it looks like it will almost certainly work.

> diff --git a/lisp/icomplete.el b/lisp/icomplete.el
> index 08b4ef2030..5d37f47e7d 100644
> --- a/lisp/icomplete.el
> +++ b/lisp/icomplete.el
> @@ -859,13 +859,14 @@ icomplete-completions
>                  (base-size (prog1 (cdr last)
>                               (if last (setcdr last nil))))
>                  (most-try
> -                (if (and base-size (> base-size 0))
> -                    (completion-try-completion
> -                     name candidates predicate (length name) md)
> -                  ;; If the `comps' are 0-based, the result should be
> -                  ;; the same with `comps'.
> -                  (completion-try-completion
> -                   name comps nil (length name) md)))
> +                (and (not fido-mode) ; Fido avoids these expensive calculations.

Perhaps predicate it on the value of icomplete-hide-common-prefix instead?

fido-mode sets it to nil, and this way we retain a better level of 
abstraction, and better backward compatibility for vanilla 
icomplete-mode users.

Next step might be switching this var's default value to nil.
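In the meantime, here's the sketch for users who want to opt in by hand 
(this only buys the speedup once the patch is made conditional on the 
variable, of course):

```elisp
;; What fido-mode already sets, and what vanilla icomplete-mode users
;; would put in their init file to opt into the cheaper code path,
;; assuming the patch above keys off this variable:
(setq icomplete-hide-common-prefix nil)
```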





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-14  0:08                           ` Dmitry Gutov
@ 2021-06-14  0:16                             ` João Távora
  2021-06-17  2:23                               ` Dmitry Gutov
  0 siblings, 1 reply; 33+ messages in thread
From: João Távora @ 2021-06-14  0:16 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841

On Mon, Jun 14, 2021 at 1:08 AM Dmitry Gutov <dgutov@yandex.ru> wrote:

> Anyway, I was just describing an easier way to benchmark the same effect
> (without having to hit the key sequence very quickly). Hope you or
> someone else find it useful.

Yes.  It is useful.  I didn't know about benchmark-progn.
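For reference, it can be wrapped around any suspect form in place; it 
evaluates the body normally, returns its value, and just messages the 
timing:

```elisp
;; `benchmark-progn' returns the value of its body and prints a line
;; like "Elapsed time: 0.012000s (0.002000s in 1 GCs)" to *Messages*.
(benchmark-progn
  (length (all-completions "" obarray)))
```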

> > I could
> > investigate, but since your measurements confirm the same tendency
> > anyway, I think this simple patch is what's needed to close this issue.
>
> Haven't tested it myself, but it looks like it will almost certainly work.
>
> > diff --git a/lisp/icomplete.el b/lisp/icomplete.el
> > index 08b4ef2030..5d37f47e7d 100644
> > --- a/lisp/icomplete.el
> > +++ b/lisp/icomplete.el
> > @@ -859,13 +859,14 @@ icomplete-completions
> >                  (base-size (prog1 (cdr last)
> >                               (if last (setcdr last nil))))
> >                  (most-try
> > -                (if (and base-size (> base-size 0))
> > -                    (completion-try-completion
> > -                     name candidates predicate (length name) md)
> > -                  ;; If the `comps' are 0-based, the result should be
> > -                  ;; the same with `comps'.
> > -                  (completion-try-completion
> > -                   name comps nil (length name) md)))
> > +                (and (not fido-mode) ; Fido avoids these expensive calculations.
>
> Perhaps predicate it on the value of icomplete-hide-common-prefix instead?
>
> fido-mode sets it to nil, and this way we retain a better level of
> abstraction, and better backward compatibility for vanilla
> icomplete-mode users.

This is a good idea, the level of abstraction.  But what is this
"common prefix" anyway?  Is it the same as the "determ"
thing, or the "[mplete...] dance" as I called it earlier?  Shouldn't
fido-mode then _hide_ it?

I'm confused, but if you're not, go ahead and make that more
abstract change instead of relying on fido-mode.

> Next step might be switching this var's default value to nil.

Also confused, but no objections.

João





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-14  0:16                             ` João Távora
@ 2021-06-17  2:23                               ` Dmitry Gutov
  2021-06-17 21:29                                 ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-17  2:23 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

On 14.06.2021 03:16, João Távora wrote:
>> Perhaps predicate it on the value of icomplete-hide-common-prefix instead?
>>
>> fido-mode sets it to nil, and this way we retain a better level of
>> abstraction, and better backward compatibility for vanilla
>> icomplete-mode users.
> This is a good idea, the level of abstraction.  But what is this
>> "common prefix" anyway?  Is it the same as the "determ"
>> thing, or the "[mplete...] dance" as I called it earlier?  Shouldn't
>> fido-mode then _hide_ it?
> 
> I'm confused, but if you're not, go ahead and make that more
> abstract change instead of relying on fido-mode.

So... it's a bit more complex than that. The 'most' value holds the 
biggest "fuzzy" match (taking completion styles into account), and the 
resulting display of the "single match" is based on it.

Before your patch the output could look like:

   starfshe|(...t-file-process-shell-command) [Matched]

with the patch it's much less informative:

   starfshe| [Matched]

...so it has value, whether the variable I mentioned above is t or nil.

It seems there are two ways to proceed from here:

- Just alter the printing logic in the "single match" case to print the 
match text in full if it's not equal to the input string. I haven't 
puzzled out the logic doing that yet.

- Try to keep the current behavior while avoiding the duplicate work.

About the latter option: the result of that most-try stuff is only 
useful when there is only one match, right? But that work is performed 
unconditionally.

Unless I'm missing something and the value does see some use in the 
multiple-matches situations, the patch below both keeps the current 
behavior and gives the same performance improvement:

diff --git a/lisp/icomplete.el b/lisp/icomplete.el
index 08b4ef2030..fc88e2a3e0 100644
--- a/lisp/icomplete.el
+++ b/lisp/icomplete.el
@@ -859,13 +859,14 @@ icomplete-completions
                 (base-size (prog1 (cdr last)
                              (if last (setcdr last nil))))
                 (most-try
-                (if (and base-size (> base-size 0))
+                (unless (cdr comps)
+                  (if (and base-size (> base-size 0))
+                      (completion-try-completion
+                       name candidates predicate (length name) md)
+                    ;; If the `comps' are 0-based, the result should be
+                    ;; the same with `comps'.
                      (completion-try-completion
-                     name candidates predicate (length name) md)
-                  ;; If the `comps' are 0-based, the result should be
-                  ;; the same with `comps'.
-                  (completion-try-completion
-                   name comps nil (length name) md)))
+                     name comps nil (length name) md))))
                 (most (if (consp most-try) (car most-try)
                         (if most-try (car comps) "")))
                 ;; Compare name and most, so we can determine if name is





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-13 14:55                         ` João Távora
@ 2021-06-17  2:36                           ` Dmitry Gutov
  2021-06-17 21:21                             ` João Távora
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Gutov @ 2021-06-17  2:36 UTC (permalink / raw)
  To: João Távora; +Cc: Stefan Monnier, 48841

On 13.06.2021 17:55, João Távora wrote:

> I just confirmed it's not for correctness.  If one foregoes the copy, in
> fido-mode a C-h f followed by some input followed by C-g and an M-x and
> no input (empty pattern) will show interesting results, i.e. a list of
> propertized strings when nothing should be propertized.

That's also an example of a correctness problem, just a more obvious 
kind. It probably happens in a number of situations, e.g. all (?) uses 
of completion-table-with-cache.

> But maybe that's a question of removing the propertization when the
> pattern is empty?  I'll try later.

That must add some performance penalty as well, for the extra mutation. 
And it wouldn't cover other potential uses of the same set of strings. 
In Imenu, maybe?

The latter is hypothetical, of course.

>> Do we have any "frozen strings" in Emacs, which absolutely could not
>> be modified? Do we plan to?
> 
> Immutable strings? And error when one tries to?  Or just ignore the
> modification?  And how would that help here?

It wouldn't help. It would do the opposite: basically forbid the 
approach you suggest, mutation without copying.

>> I disagree it's a simpler technique, but it would indeed be a simpler
>> change, based on the current implementation.
> 
> simpler means simpler in my book :-)

One is a simpler diff, the other is simpler resulting code. Both have 
their upsides.

>> But I don't mind it myself, and happy to update Company. Either way
>> it's a step forward.
> 
> If Company and fido-mode and a couple more outside the core/Elpa are all
> that's needed, it's probably warranted.  But there are so many frontends
> right now, I don't know...  We'd need some "opt into the optimization",
> I think."

Since all other users are third-party (and thus have short release 
cycles), it shouldn't be too much of a problem. Some highlighting code 
would start to fail, but probably without disastrous results. And then 
people will issue updates to look for some new property when the old 
expected ones are all missing.
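To make the idea concrete, here's a hedged sketch of that opt-in scheme 
(all names below are made up for illustration; none of this is an 
existing API):

```elisp
;; Hypothetical: the completion style records match positions under a
;; private, namespaced text property, without copying the candidate...
(defun example-record-positions (cand positions)
  "Attach POSITIONS to CAND under a namespaced property, no copy."
  (put-text-property 0 (length cand) 'example--match-positions
                     positions cand)
  cand)

;; ...and each frontend copies the string just before display,
;; translating the private property into `face' highlights.
(defun example-display-candidate (cand)
  "Return a display copy of CAND with match positions highlighted."
  (let ((copy (copy-sequence cand)))
    (dolist (pos (get-text-property 0 'example--match-positions copy))
      (put-text-property pos (1+ pos) 'face
                         'completions-common-part copy))
    copy))
```

That way only the frontends that actually display a candidate pay for a 
copy, and the shared strings never get a bare `face' property.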

>>> But if the speedup is big, I'd revisit the rationale for requiring those
>>> copies to be performed in the first place.
>>
>> With fido-vertical-mode, and with that particular input, it's
>>
>>    Elapsed time: 0.130773s (0.031547s in 1 GCs)
>>
>> without copy-sequence, and
>>
>>    Elapsed time: 0.169842s (0.069740s in 4 GCs)
>>
>> with it. Not game changing, but definitely measurable.
> 
> I don't have these results.  I tried the previous experiment:
> 
>     ;; C-u C-x C-e C-m in quick succession
>     (benchmark-run (completing-read "bla" obarray))
> 
> And turned icomplete.el's while-no-input into a progn.
> 
> In an Emacs -Q where (length (all-completions "" obarray)) is about
> 20000.
> 
>     ;; with copy
>     (0.39647753 6 0.22811240199999983)
>     (0.431950471 7 0.263988651)
>     (0.451116177 6 0.2249686070000001)
>      
>     ;; without copy, small but measurable speedup
>     (0.29890632 2 0.08419541699999966)
>     (0.293501099 2 0.08622194699999985)
>     (0.306566633 3 0.0853211100000002)
> 
> In a loaded Emacs where (length (all-completions "" obarray)) is 64554
> 
>     ;; with copy
>     (2.869362171 6 2.3882547280000495)
>     (2.909661303 6 2.4209153659999743)
>     (2.845522439 6 2.3638140250000106)
>      
>     ;; without copy.  Huge speedup.
>     (0.79817337 1 0.4526993239999797)
>     (0.8231736510000001 1 0.4752496449999626)
>     (0.719004478 1 0.4016016420000028)

Even better.

My current session has 37559 symbols, so it's somewhere in the middle.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-17  2:36                           ` Dmitry Gutov
@ 2021-06-17 21:21                             ` João Távora
  0 siblings, 0 replies; 33+ messages in thread
From: João Távora @ 2021-06-17 21:21 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

>>> I disagree it's a simpler technique, but it would indeed be a simpler
>>> change, based on the current implementation.
>> simpler means simpler in my book :-)
>
> One is a simpler diff, the other is simpler resulting code. Both have
> their upsides.

Oh, you meant The Big Redesign?  I'm a fan of that too, not only
here but constantly, everywhere...  That indeed means simpler resulting
code in the abstract.  Problem is that it also means different resulting
code to different people.  But it is definitely doable.

>>> But I don't mind it myself, and happy to update Company. Either way
>>> it's a step forward.
>> If Company and fido-mode and a couple more outside the core/Elpa are
>> all
>> that's needed, it's probably warranted.  But there are so many frontends
>> right now, I don't know...  We'd need some "opt into the optimization",
>> I think."
>
> Since all other users are third-party (and thus have short release
> cycles), it shouldn't be too much of a problem. Some highlighting code
> would start to fail, but probably without disastrous results. And then
> people will issue updates to look for some new property when the old
> expected ones are all missing.

OK.  I can live with that rationale.  So what are the places to touch
that "we" control?

- icomplete.el? for fido-mode & friends
- minibuffer.el, for the *Completions* buffer
- company.el
- Any notable others?

>>     ;; with copy
>>     (2.869362171 6 2.3882547280000495)
>>     (2.909661303 6 2.4209153659999743)
>>     (2.845522439 6 2.3638140250000106)
>>          ;; without copy.  Huge speedup.
>>     (0.79817337 1 0.4526993239999797)
>>     (0.8231736510000001 1 0.4752496449999626)
>>     (0.719004478 1 0.4016016420000028)
>
> Even better.
>
> My current session has 37559 symbols, so it's somewhere in the middle.

Yes, this is a big performance bottleneck.  But I wonder if tweaking GC
parameters would help here.  I know nothing of Emacs GC parameters.
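For what it's worth, the main knob is `gc-cons-threshold` (bytes 
allocated between collections, 800kB by default); raising it around a 
benchmark shows how much of the cost is GC pressure rather than the 
computation itself:

```elisp
;; Effectively disable GC for the duration of the measured form, then
;; compare against a run with the default threshold.  The GC count in
;; the result should drop to 0.
(let ((gc-cons-threshold most-positive-fixnum))
  (benchmark-run (length (all-completions "" obarray))))
```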

João





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#48841: fido-mode is slower than ido-mode with similar settings
  2021-06-17  2:23                               ` Dmitry Gutov
@ 2021-06-17 21:29                                 ` João Távora
  0 siblings, 0 replies; 33+ messages in thread
From: João Távora @ 2021-06-17 21:29 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Stefan Monnier, 48841

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 14.06.2021 03:16, João Távora wrote:
>>> Perhaps predicate it on the value of icomplete-hide-common-prefix instead?
>>>
>>> fido-mode sets it to nil, and this way we retain a better level of
>>> abstraction, and better backward compatibility for vanilla
>>> icomplete-mode users.
>> This is a good idea, the level of abstraction.  But what is this
>> "common prefix" anyway?  Is it the the same as the "determ"
>> thing,  or the "[mplete...] dance" as I called it earlier.  Shouldn't
>> fido-mode then_hide_  it?
>> I'm confused, but if you're not, go ahead and make that more
>> abstract change instead of relying on fido-mode.
>
> So... it's a bit more complex than that.

Yes, my patch broke the things you mentioned.

> It seems there are two ways to proceed from here:
>
> - Just alter the printing logic in the "single match" case to print
>   the match text in full if it's not equal to the input string. I
>   haven't puzzled out the logic doing that yet.
>   
> - Try to keep the current behavior while avoiding the duplicate work.

Both sound absolutely fine to me.

> About the latter option: the result of that most-try stuff is only
> useful when there is only one match, right?

No idea, but may be.

> Unless I'm missing something and the value does see some use in the
> multiple-matches situations, the patch below both keeps the current
> behavior and gives the same performance improvement:

That'd be fantastic, but I doubt you'd be keeping the exact same
behaviour.  I never understood it -- that's the thing here -- but I
think that completion-try-completion is doing more stuff when multiple
candidates matched by a pattern happen to share the same prefix or
suffix or something like that.  I might be completely wrong, tho.

But really if you make this patch conditional to fido-mode or that other
var that you think is more abstract, I think it's fine and it's a very
clear win. I really doubt that the tiny number of fido-mode users care
about that behaviour anyway, but I'm sure they'll appreciate the
considerable speedup.

João





^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-06-17 21:29 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-05  1:39 bug#48841: fido-mode is slower than ido-mode with similar settings Dmitry Gutov
2021-06-05  9:35 ` João Távora
2021-06-05 23:02   ` Dmitry Gutov
2021-06-05 23:20     ` João Távora
2021-06-05 23:42       ` Dmitry Gutov
2021-06-06  0:25       ` Dmitry Gutov
2021-06-06  6:54         ` João Távora
2021-06-06 22:20           ` Dmitry Gutov
2021-06-06 23:49             ` João Távora
2021-06-07  0:11               ` Dmitry Gutov
2021-06-07  8:52                 ` João Távora
2021-06-11  2:19                   ` Dmitry Gutov
2021-06-11 17:09                     ` João Távora
2021-06-11 22:34                       ` Dmitry Gutov
2021-06-11 22:41                         ` Dmitry Gutov
2021-06-13 14:55                         ` João Távora
2021-06-17  2:36                           ` Dmitry Gutov
2021-06-17 21:21                             ` João Távora
2021-06-11 23:24                     ` João Távora
2021-06-12  0:43                       ` Dmitry Gutov
2021-06-13 14:29                         ` João Távora
2021-06-14  0:08                           ` Dmitry Gutov
2021-06-14  0:16                             ` João Távora
2021-06-17  2:23                               ` Dmitry Gutov
2021-06-17 21:29                                 ` João Távora
2021-06-06  2:34       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-06-06  6:59         ` João Távora
2021-06-06 16:54           ` Dmitry Gutov
2021-06-06 18:37             ` João Távora
2021-06-06 22:21               ` Dmitry Gutov
2021-06-06 23:27                 ` João Távora
2021-06-06 17:55           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-06-06 21:33             ` João Távora

unofficial mirror of bug-gnu-emacs@gnu.org 

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://yhetil.org/emacs-bugs/0 emacs-bugs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 emacs-bugs emacs-bugs/ https://yhetil.org/emacs-bugs \
		bug-gnu-emacs@gnu.org
	public-inbox-index emacs-bugs

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.yhetil.org/yhetil.emacs.bugs
	nntp://news.gmane.io/gmane.emacs.bugs


code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs.git

AGPL code for this site: git clone http://ou63pmih66umazou.onion/public-inbox.git