Hi all!  Time flies, doesn't it?

On 14/08/2021 10:16, Eli Zaretskii wrote:
>>> If one removes these lines, the process becomes much faster, but
>>> there is a problem with highlighting.  My idea is indeed to defer
>>> highlighting by not setting the 'face property directly on that
>>> shared string, but some other property that is read later from the
>>> shared string by compliant frontends.
>>
>> I haven't done any direct benchmarking, but I'm pretty sure that this
>> approach cannot, by definition, be as fast as the non-mutating one.
>
> Daniel seems to think otherwise, AFAIU.
>
>> Because you go through the whole list and mutate all of its elements,
>> you have to perform a certain bit of work W x N times, where N is the
>> size of the whole list.
>>
>> Whereas the deferred-mutation approach will only have to do its bit
>> (which is likely more work, like, WWW) only 20 times, or 100 times, or
>> however many completions are displayed.  And this is usually negligible.
>>
>> However big the difference is going to be, I can't say in advance, of
>> course, or whether it's going to be shadowed by some other performance
>> costs.  But the non-mutating approach should have the best optimization
>> potential when the list is long.
>
> So I guess the suggestion to have a benchmark is still useful, because
> the estimations of which approach has better performance vary between
> you three.  So maybe producing such benchmarks would be a good step?

To cross this out from my TODO, I spent most of the day rebasing both of
the proposed patches (one of them longer than the other) -- one from an
attachment here, the other from a commit inside the
scratch/icomplete-lazy-highlight-attempt-2 branch -- porting icomplete to
Daniel's new completion-filter-completions API (*), and benchmarking.

(*) Included in the attached patch: it needed changing just two lines
inside icomplete, but also a new variable completion-all-sorted-highlight
and updates to completion--cache-all-sorted-completions and
completion-all-sorted-completions.

Both rebased patches are attached to this email for your convenience.

AFAICT, the results confirmed my expectations quoted above.

Using Joao's benchmark, with setup:

  (defmacro lotsoffunctions ()
    `(progn
       ,@(cl-loop
          repeat 150000
          collect `(defun ,(intern (symbol-name (gensym "heyhey"))) ()
                     42))))
  (lotsoffunctions)

I ran the comparisons for empty and non-empty inputs.

With no characters typed:

  (benchmark-run 1
    (let ((completion-styles '(flex))
          (completion-lazy-hilit (cl-gensym))) ; might not be defined
      ;; Uncomment one of the lines below, depending on the patch used.
      ;; (completion-all-completions "" obarray 'fboundp 0 nil)
      ;; (completion-filter-completions "" obarray 'fboundp 0 nil)
      ))

  master           => 0.066
  lazy-hilit       => 0.045
  filter-and-defer => 0.041 (but more often ~0.110 including GC, somehow)

With one character typed:

  (benchmark-run 1
    (let ((completion-styles '(flex))
          (completion-lazy-hilit (cl-gensym))) ; might not be defined
      ;; Uncomment one of the lines below, depending on the patch used.
      ;; (completion-all-completions "h" obarray 'fboundp 1 nil)
      ;; (completion-filter-completions "h" obarray 'fboundp 1 nil)
      ))

  master           => 0.824
  lazy-hilit       => 0.395
  filter-and-defer => 0.125 (!)

This more or less translates into the improvement in speed of
fido-vertical-mode, according to my benchmark-progn call inside
icomplete-exhibit (included in both attached patches for convenience).
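For anyone skimming the thread, the deferred-highlighting idea quoted at
the top amounts to roughly the sketch below.  To be clear, this is not
code from either patch: the property and function names
(my-defer--positions, my-defer-record-match, my-defer-highlight) are
invented for the example.

  ;; Illustrative sketch only, not code from either patch.
  (defun my-defer-record-match (string positions)
    "Record match POSITIONS on STRING without touching its faces.
  Assumes STRING is non-empty.  The face mutation is postponed
  until display time."
    (put-text-property 0 1 'my-defer--positions positions string)
    string)

  (defun my-defer-highlight (string)
    "Apply faces to STRING from its recorded match positions.
  A frontend calls this only for the few candidates it actually draws,
  so the per-candidate face work scales with the display, not with N."
    (dolist (pos (get-text-property 0 'my-defer--positions string))
      (add-face-text-property pos (1+ pos) 'completions-common-part
                              nil string))
    string)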
For non-empty inputs ("h", "hh" or "hhe", to match the generated
functions), filter-and-defer is about 1.5x faster than lazy-hilit: around
450ms vs 640ms.

lazy-hilit is slightly faster than filter-and-defer with the empty input
(around 380ms vs 420ms), and I'm not yet sure why, but that is the
scenario with no highlighting at all (and so no flex scoring/sorting).
Perhaps some short-circuiting can be added somewhere to reach parity, or
it's the cost of extra branching somewhere for backward compatibility
(which could be removed in the future).  Worth additional study.
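By "short-circuiting" I mean something along these lines.  Again, this is
an invented sketch rather than anything in either patch;
my-score-and-hilit just stands in for whatever function walks the
candidates to score/highlight them.

  ;; Invented sketch; not the actual code path in either patch.
  (defun my-score-and-hilit (input candidates)
    "Return CANDIDATES scored/highlighted against INPUT.
  With empty INPUT there is nothing to match or highlight, so skip
  the per-candidate pass entirely instead of walking the whole list."
    (if (equal input "")
        candidates
      (mapcar (lambda (cand)
                ;; ... expensive per-candidate matching work here ...
                cand)
              candidates)))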