From: João Távora
Newsgroups: gmane.emacs.bugs
Subject: bug#48841: fido-mode is slower than ido-mode with similar settings
Date: Mon, 7 Jun 2021 00:49:51 +0100
To: Dmitry Gutov
Cc: Stefan Monnier, 48841@debbugs.gnu.org
In-Reply-To: <35be6652-9c8d-ee21-e9eb-9598ad6777eb@yandex.ru>

On Sun, Jun 6, 2021, 23:20 Dmitry Gutov <dgutov@yandex.ru> wrote:

> >> And/or pick a different algorithm. E.g. Jaro-Winkler, which apparently
> >> is used in a lot of "fuzzy matching" implementations out there, it's
> >> pretty fast.
> >
> > That may be useful, but for other purposes.  If I understand correctly,
> > Jaro-Winkler is for finding the distance between two arbitrary strings,
> > where the first is not a subsequence of the second.  I bet Google uses
> > stuff like that when you accidentally transpose characters.  Flex just
> > gives up.  Those other algos still catch the match (and Google then
> > probably NL-scours your most intimate fears and checks with your
> > local dictator before showing you typing suggestions)

Meant to write ML as in machine learning, not NL.

> I'm not crazy about shuffling completion, but some people did indeed ask
> for it. The filtering step has to be more complex, though (you can't
> just construct a straightforward regexp to filter with).

I think you calculate the distance using one of these fancy multi-surname
algorithms, then do a cutoff somewhere.
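
Something like this, very roughly (my/fuzzy-filter is a name I'm making
up, and the built-in string-distance, i.e. plain Levenshtein, stands in
for Jaro-Winkler, which Emacs doesn't ship):

  (require 'seq)

  ;; Score each candidate by its edit distance to the input and keep
  ;; only the ones within CUTOFF, closest first.
  (defun my/fuzzy-filter (input candidates cutoff)
    "Return CANDIDATES within CUTOFF edit distance of INPUT, closest first."
    (seq-sort-by (lambda (cand) (string-distance input cand)) #'<
                 (seq-filter (lambda (cand)
                               (<= (string-distance input cand) cutoff))
                             candidates)))

  ;; (my/fuzzy-filter "fiel" '("file" "field" "dired") 2)
  ;;   => ("field" "file")

Not what minibuffer.el does today, of course, just the shape of the idea.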

> Same result, indeed. We should note, though, that
> completion-pcm--hilit-commonality has some steps that were added
> together with the 'flex' style, for it to work.

But nothing algorithmically aberrant, I think.

> I did some instrumenting, replacing (completion-pcm--hilit-commonality
> pattern all) inside completion-flex-all-completions with
> (benchmark-progn (completion-pcm--hilit-commonality pattern all)).
> Recompiled between each change (interpreted mode gives very different
> numbers).
>
> Unmodified, the call takes ~85ms:
>
>    Elapsed time: 0.085520s (0.068406s in 4 GCs)


By the way, I think you should be running benchmarks multiple times, to
get times in the seconds range and reduce noise.  Multiple levels of CPU
cache and other factors like thermal throttling may skew results when
running just once.
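
Concretely, I mean something along these lines, where the candidate set,
the pattern and the repeat count are arbitrary stand-ins (benchmark-run
returns total elapsed time, number of GCs, and time spent in GC):

  (require 'benchmark)

  ;; Repeat the interesting call enough times that the total lands in
  ;; the seconds range.  completion-all-completions with the flex style
  ;; goes through completion-pcm--hilit-commonality internally.
  (let ((completion-styles '(flex))
        (candidates (mapcar #'symbol-name (apropos-internal "^c"))))
    (benchmark-run 100
      (completion-all-completions "cfl" candidates nil 3)))
  ;; => (ELAPSED-SECONDS NUMBER-OF-GCS GC-SECONDS)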

> If I comment everything inside its first lambda except the returned
> value (making the function the same as #'identity), the time goes down
> to <1ms.
>
> Uncomment the 'copy-sequence' and 'string-match' calls (which one might
> suspect of garbage generation):
>
>    Elapsed time: 0.006380s
>
> Tried binding gc-cons-threshold to a high value, and even calling
> garbage-collect-maybe (or not): that speeds it up, but adds some
> unpredictable GC pauses later (though it would be nice to be able to
> consolidate the pauses into one collection pass).

Maybe in Elisp that's a good idea; in other Lisps and other languages,
second-guessing the GC is a bad idea.  I hear ours is so basic that it
might indeed be reasonable.
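
For reference, the kind of second-guessing we're talking about is
roughly this (my/without-gc is a made-up helper; it calls plain
garbage-collect rather than garbage-collect-maybe, and whether the
deferred collection actually pays off is exactly the open question):

  ;; Raise the allocation threshold around the hot section so GC can't
  ;; run in the middle of it, then pay for one collection afterwards.
  (defmacro my/without-gc (&rest body)
    "Run BODY with GC effectively disabled, then collect once."
    `(prog1 (let ((gc-cons-threshold most-positive-fixnum))
              ,@body)
       (garbage-collect)))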

> Long story short, the patch I just installed, to reuse the match data,
> brings the runtime down to
>
>    Elapsed time: 0.066388s (0.050087s in 3 GCs)

That's nice!  But are you sure you're not seeing noise, too?

> Tried other things like moving the update-score-and-face lambda out of
> the mapcar loop - that didn't move a needle.

If a lambda doesn't capture anything from inside the loop, only one copy
of it is ever made, I think.  So it doesn't surprise me.
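
A toy illustration of what I mean, with made-up names and assuming
lexical binding:

  (defvar my/capturing nil)
  (defvar my/constant nil)

  ;; Captures X, so a fresh closure is consed on every iteration:
  (dotimes (x 3)
    (push (lambda () x) my/capturing))

  ;; Captures nothing, so the byte compiler can emit one shared
  ;; constant function object:
  (dotimes (_ 3)
    (push (lambda () 42) my/constant))

  ;; (eq (nth 0 my/constant) (nth 1 my/constant))     ; => t when byte-compiled
  ;; (eq (nth 0 my/capturing) (nth 1 my/capturing))   ; => nil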

> And a weird part: replacing all repeated (length str) calls with a
> reference to an existing local binding makes it *slower* (back to the
> original performance).

Might be noise, or you might be thrashing CPU caches, who knows?  If the
string length is on the same cache line as the contents of the string
you're reading, then evicting that to go read the value of a boxed
integer somewhere radically different is slow.  Just speculation, of
course.  Might just be noise or something else entirely.

João