all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Drew Adams <drew.adams@oracle.com>
To: "Daniel Pittman" <slippycheeze@google.com>,
	"João Távora" <joaotavora@gmail.com>
Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
	emacs-devel <emacs-devel@gnu.org>
Subject: RE: new-flex-completion-style
Date: Thu, 14 Feb 2019 16:34:38 +0000 (UTC)	[thread overview]
Message-ID: <1f4513ab-cd39-4543-9b1a-743e1307dd54@default> (raw)
In-Reply-To: <CAC45yQsV5r7ECEGaHpDu6ESqB=qsagkU=VBNKZq1r-9k2fq5eA@mail.gmail.com>

> The Levenshtein distance and close variants generally do
> a reasonable job of approximating human expectations, yes.
> Close variants (eg: only delete/insert, no mutate) also
> work well.
>
> The cost of building the Levenshtein edit distance matrix
> for picking or ordering candidates is quite high, though; ...
> gives a good summary, and you really don't want to use the
> most naive option if you can avoid it.  Computation costs
> go through the roof very fast; the cost is roughly
> proportional to the length of the candidate cubed, using the
> brute force approach.
>
> The other shortfall of using only edit distance is that most
> of the time people expect to front-weight the search: you
> may have a closer edit distance match later in the string,
> but people usually want a less exact match, closer to the
> start, intuitively.
>
> > I've been experimenting with a simpler function that just
> > counts the number of "holes" and the length of those holes
> > separately in the denominator. The numerator is the same
> > and a perfect match is still a 1. It seems to fare better
> > for your cases. For foo
>
> I'd very, very strongly encourage y'all to take a look at
> the existing work on the problem, not least because of the
> algorithmic complexity costs when you apply this to a large
> corpus of candidates.  Consider, for example, completion
> over the set of all defined symbols in Emacs, which is
> ~ 55,000 candidates, and IIRC I calculated the average
> length to 9 characters, so brute force matching would be
> be on the order of 40 **million** comparisons for the first
> pattern character, though thankfully dropping fast as you
> eliminate non-matching candidates entirely.  (Raw comparisons,
> accounting for all possible matches in all possible candidate,
> of that pattern, so each candidate gets
> $length / $pattern_length comparisons performed.)
>
> Anyway, the point is: this is an area where serious study
> has been invested, and I strongly urge you to take advantage
> of that, rather than trying to reinvent this surprisingly
> complex wheel.  This isn't simple to do, and it definitely
> isn't simple to do in a high performance way.
>
> PS: whatever else you do, make it trivial to configure what
> function is used for ordering the candidate results.  That
> matters far more that the details of the default match /
> order choice, because it allows easy innovation over time.
>
> PPS: the best modern tools seem to be derived from the
> problem of genome sequence matching, but heuristics are
> definitely helpful too.  Again, fzf is the best version of
> that I have found.

All good info; thanks.  +1 for (1) emphasizing the cost
in general/naive cases and for (2) saying "make it trivial
to configure" the sort order (function).

João's (first) reply to you, saying that an interactive
Emacs context (e.g. completion) has needs that can be a
bit different, is also relevant.

FWIW, here is how Icicles uses (2 kinds of) Levenshtein
matching:

https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles#LevenshteinCompletion

And here is how it uses a Jaro-Winkler matching:

https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles#JaroWinklerMatchCompletion



  parent reply	other threads:[~2019-02-14 16:34 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190202232827.27331.87300@vcs0.savannah.gnu.org>
     [not found] ` <20190202232828.4AE452159A@vcs0.savannah.gnu.org>
2019-02-06  3:11   ` [Emacs-diffs] scratch/new-flex-completion-style 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness Dmitry Gutov
2019-02-06 10:09     ` João Távora
2019-02-06 18:54       ` Dmitry Gutov
2019-02-06 19:47         ` João Távora
2019-02-12  0:25           ` Dmitry Gutov
2019-02-12 13:19             ` Stefan Monnier
2019-02-12 22:55               ` Dmitry Gutov
2019-02-13 16:00                 ` Stefan Monnier
2019-02-14  1:33                   ` Dmitry Gutov
2019-02-19 16:10                     ` Stefan Monnier
2019-02-24  0:03                       ` Dmitry Gutov
2019-02-27 17:12                         ` Stefan Monnier
2019-03-11  0:17                           ` Dmitry Gutov
2019-03-11  1:15                             ` Stefan Monnier
2019-03-11 22:54                               ` Dmitry Gutov
2019-03-12  1:10                                 ` Drew Adams
2019-03-12 22:25                                   ` Dmitry Gutov
2019-03-12 23:12                                     ` Drew Adams
2019-03-11  8:47                             ` João Távora
2019-03-11 22:57                               ` Dmitry Gutov
2019-02-12 17:21             ` João Távora
2019-02-12 23:47               ` Dmitry Gutov
2019-02-11 21:10   ` new-flex-completion-style (was: [Emacs-diffs] scratch/ 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness) Stefan Monnier
2019-02-11 22:16     ` new-flex-completion-style João Távora
2019-02-11 23:02       ` new-flex-completion-style Dmitry Gutov
2019-02-11 23:11         ` new-flex-completion-style João Távora
2019-02-12  0:10           ` new-flex-completion-style Dmitry Gutov
2019-02-12  0:16       ` new-flex-completion-style Óscar Fuentes
2019-02-12 22:04         ` new-flex-completion-style João Távora
2019-02-13  0:28           ` new-flex-completion-style Óscar Fuentes
2019-02-13 11:20             ` new-flex-completion-style João Távora
2019-02-13 14:23               ` new-flex-completion-style Óscar Fuentes
2019-02-13 14:38                 ` new-flex-completion-style Drew Adams
2019-02-13 15:24               ` new-flex-completion-style Stefan Monnier
2019-02-13 15:33                 ` new-flex-completion-style Drew Adams
2019-02-13 15:40                 ` new-flex-completion-style Óscar Fuentes
2019-02-13 17:34                   ` new-flex-completion-style Daniel Pittman
2019-02-12 14:08       ` new-flex-completion-style Stefan Monnier
2019-02-12 22:17         ` new-flex-completion-style João Távora
2019-02-13 17:29           ` new-flex-completion-style João Távora
2019-02-13 18:54             ` new-flex-completion-style Stefan Monnier
2019-02-13 19:13               ` new-flex-completion-style João Távora
2019-02-14 13:36                 ` new-flex-completion-style Stefan Monnier
2019-02-14 13:55                   ` new-flex-completion-style João Távora
2019-02-14 14:59                     ` new-flex-completion-style João Távora
2019-02-14 15:28                       ` new-flex-completion-style Óscar Fuentes
2019-02-14 15:44                         ` new-flex-completion-style Drew Adams
2019-02-14 16:21                         ` new-flex-completion-style João Távora
2019-02-14 15:35                       ` new-flex-completion-style Daniel Pittman
2019-02-14 16:12                         ` new-flex-completion-style João Távora
2019-02-14 16:16                           ` new-flex-completion-style João Távora
2019-02-14 16:34                         ` Drew Adams [this message]
2019-02-14 17:03                           ` new-flex-completion-style João Távora
2019-02-14 17:49                             ` new-flex-completion-style Drew Adams
2019-02-14 18:30                               ` new-flex-completion-style João Távora
2019-02-14 19:20                                 ` new-flex-completion-style Drew Adams
2019-02-14 20:54                                   ` new-flex-completion-style João Távora
2019-02-14 22:03                                     ` new-flex-completion-style Drew Adams
2019-02-14 22:06                                       ` new-flex-completion-style João Távora
2019-02-14 22:22                                       ` new-flex-completion-style Stefan Monnier
2019-02-15  0:54                                         ` new-flex-completion-style Drew Adams
2019-02-15  4:50                                           ` new-flex-completion-style Stefan Monnier
2019-02-15  5:52                                             ` new-flex-completion-style Dmitry Gutov
2019-02-15  6:32                                             ` new-flex-completion-style Drew Adams
2019-02-18 20:46         ` new-flex-completion-style João Távora
2019-02-18 23:35           ` new-flex-completion-style Stefan Monnier
2019-02-19  9:16             ` new-flex-completion-style João Távora
2019-02-19 12:54               ` new-flex-completion-style Stefan Monnier
2019-02-19 13:01                 ` new-flex-completion-style João Távora
2019-02-19 13:32                   ` new-flex-completion-style Stefan Monnier
2019-02-11 22:57     ` new-flex-completion-style (was: [Emacs-diffs] scratch/ 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness) Drew Adams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1f4513ab-cd39-4543-9b1a-743e1307dd54@default \
    --to=drew.adams@oracle.com \
    --cc=emacs-devel@gnu.org \
    --cc=joaotavora@gmail.com \
    --cc=monnier@iro.umontreal.ca \
    --cc=slippycheeze@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.