From: Drew Adams <drew.adams@oracle.com>
To: "Daniel Pittman" <slippycheeze@google.com>,
"João Távora" <joaotavora@gmail.com>
Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
emacs-devel <emacs-devel@gnu.org>
Subject: RE: new-flex-completion-style
Date: Thu, 14 Feb 2019 16:34:38 +0000 (UTC) [thread overview]
Message-ID: <1f4513ab-cd39-4543-9b1a-743e1307dd54@default> (raw)
In-Reply-To: <CAC45yQsV5r7ECEGaHpDu6ESqB=qsagkU=VBNKZq1r-9k2fq5eA@mail.gmail.com>
> The Levenshtein distance and close variants generally do
> a reasonable job of approximating human expectations, yes.
> Close variants (eg: only delete/insert, no mutate) also
> work well.
>
> The cost of building the Levenshtein edit distance matrix
> for picking or ordering candidates is quite high, though; ...
> gives a good summary, and you really don't want to use the
> most naive option if you can avoid it. Computation costs
> go through the roof very fast; the cost is roughly
> proportional to the length of the candidate cubed, using the
> brute force approach.
>
> The other shortfall of using only edit distance is that most
> of the time people expect to front-weight the search: you
> may have a closer edit distance match later in the string,
> but people usually want a less exact match, closer to the
> start, intuitively.
>
> > I've been experimenting with a simpler function that just
> > counts the number of "holes" and the length of those holes
> > separately in the denominator. The numerator is the same
> > and a perfect match is still a 1. It seems to fare better
> > for your cases. For foo
>
> I'd very, very strongly encourage y'all to take a look at
> the existing work on the problem, not least because of the
> algorithmic complexity costs when you apply this to a large
> corpus of candidates. Consider, for example, completion
> over the set of all defined symbols in Emacs, which is
> ~ 55,000 candidates, and IIRC I calculated the average
> length to 9 characters, so brute force matching would be
> be on the order of 40 **million** comparisons for the first
> pattern character, though thankfully dropping fast as you
> eliminate non-matching candidates entirely. (Raw comparisons,
> accounting for all possible matches in all possible candidate,
> of that pattern, so each candidate gets
> $length / $pattern_length comparisons performed.)
>
> Anyway, the point is: this is an area where serious study
> has been invested, and I strongly urge you to take advantage
> of that, rather than trying to reinvent this surprisingly
> complex wheel. This isn't simple to do, and it definitely
> isn't simple to do in a high performance way.
>
> PS: whatever else you do, make it trivial to configure what
> function is used for ordering the candidate results. That
> matters far more that the details of the default match /
> order choice, because it allows easy innovation over time.
>
> PPS: the best modern tools seem to be derived from the
> problem of genome sequence matching, but heuristics are
> definitely helpful too. Again, fzf is the best version of
> that I have found.
All good info; thanks. +1 for (1) emphasizing the cost
in general/naive cases and for (2) saying "make it trivial
to configure" the sort order (function).
João's (first) reply to you, saying that an interactive
Emacs context (e.g. completion) has needs that can be a
bit different, is also relevant.
FWIW, here is how Icicles uses (2 kinds of) Levenshtein
matching:
https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles#LevenshteinCompletion
And here is how it uses a Jaro-Winkler matching:
https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles#JaroWinklerMatchCompletion
next prev parent reply other threads:[~2019-02-14 16:34 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20190202232827.27331.87300@vcs0.savannah.gnu.org>
[not found] ` <20190202232828.4AE452159A@vcs0.savannah.gnu.org>
2019-02-06 3:11 ` [Emacs-diffs] scratch/new-flex-completion-style 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness Dmitry Gutov
2019-02-06 10:09 ` João Távora
2019-02-06 18:54 ` Dmitry Gutov
2019-02-06 19:47 ` João Távora
2019-02-12 0:25 ` Dmitry Gutov
2019-02-12 13:19 ` Stefan Monnier
2019-02-12 22:55 ` Dmitry Gutov
2019-02-13 16:00 ` Stefan Monnier
2019-02-14 1:33 ` Dmitry Gutov
2019-02-19 16:10 ` Stefan Monnier
2019-02-24 0:03 ` Dmitry Gutov
2019-02-27 17:12 ` Stefan Monnier
2019-03-11 0:17 ` Dmitry Gutov
2019-03-11 1:15 ` Stefan Monnier
2019-03-11 22:54 ` Dmitry Gutov
2019-03-12 1:10 ` Drew Adams
2019-03-12 22:25 ` Dmitry Gutov
2019-03-12 23:12 ` Drew Adams
2019-03-11 8:47 ` João Távora
2019-03-11 22:57 ` Dmitry Gutov
2019-02-12 17:21 ` João Távora
2019-02-12 23:47 ` Dmitry Gutov
2019-02-11 21:10 ` new-flex-completion-style (was: [Emacs-diffs] scratch/ 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness) Stefan Monnier
2019-02-11 22:16 ` new-flex-completion-style João Távora
2019-02-11 23:02 ` new-flex-completion-style Dmitry Gutov
2019-02-11 23:11 ` new-flex-completion-style João Távora
2019-02-12 0:10 ` new-flex-completion-style Dmitry Gutov
2019-02-12 0:16 ` new-flex-completion-style Óscar Fuentes
2019-02-12 22:04 ` new-flex-completion-style João Távora
2019-02-13 0:28 ` new-flex-completion-style Óscar Fuentes
2019-02-13 11:20 ` new-flex-completion-style João Távora
2019-02-13 14:23 ` new-flex-completion-style Óscar Fuentes
2019-02-13 14:38 ` new-flex-completion-style Drew Adams
2019-02-13 15:24 ` new-flex-completion-style Stefan Monnier
2019-02-13 15:33 ` new-flex-completion-style Drew Adams
2019-02-13 15:40 ` new-flex-completion-style Óscar Fuentes
2019-02-13 17:34 ` new-flex-completion-style Daniel Pittman
2019-02-12 14:08 ` new-flex-completion-style Stefan Monnier
2019-02-12 22:17 ` new-flex-completion-style João Távora
2019-02-13 17:29 ` new-flex-completion-style João Távora
2019-02-13 18:54 ` new-flex-completion-style Stefan Monnier
2019-02-13 19:13 ` new-flex-completion-style João Távora
2019-02-14 13:36 ` new-flex-completion-style Stefan Monnier
2019-02-14 13:55 ` new-flex-completion-style João Távora
2019-02-14 14:59 ` new-flex-completion-style João Távora
2019-02-14 15:28 ` new-flex-completion-style Óscar Fuentes
2019-02-14 15:44 ` new-flex-completion-style Drew Adams
2019-02-14 16:21 ` new-flex-completion-style João Távora
2019-02-14 15:35 ` new-flex-completion-style Daniel Pittman
2019-02-14 16:12 ` new-flex-completion-style João Távora
2019-02-14 16:16 ` new-flex-completion-style João Távora
2019-02-14 16:34 ` Drew Adams [this message]
2019-02-14 17:03 ` new-flex-completion-style João Távora
2019-02-14 17:49 ` new-flex-completion-style Drew Adams
2019-02-14 18:30 ` new-flex-completion-style João Távora
2019-02-14 19:20 ` new-flex-completion-style Drew Adams
2019-02-14 20:54 ` new-flex-completion-style João Távora
2019-02-14 22:03 ` new-flex-completion-style Drew Adams
2019-02-14 22:06 ` new-flex-completion-style João Távora
2019-02-14 22:22 ` new-flex-completion-style Stefan Monnier
2019-02-15 0:54 ` new-flex-completion-style Drew Adams
2019-02-15 4:50 ` new-flex-completion-style Stefan Monnier
2019-02-15 5:52 ` new-flex-completion-style Dmitry Gutov
2019-02-15 6:32 ` new-flex-completion-style Drew Adams
2019-02-18 20:46 ` new-flex-completion-style João Távora
2019-02-18 23:35 ` new-flex-completion-style Stefan Monnier
2019-02-19 9:16 ` new-flex-completion-style João Távora
2019-02-19 12:54 ` new-flex-completion-style Stefan Monnier
2019-02-19 13:01 ` new-flex-completion-style João Távora
2019-02-19 13:32 ` new-flex-completion-style Stefan Monnier
2019-02-11 22:57 ` new-flex-completion-style (was: [Emacs-diffs] scratch/ 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness) Drew Adams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1f4513ab-cd39-4543-9b1a-743e1307dd54@default \
--to=drew.adams@oracle.com \
--cc=emacs-devel@gnu.org \
--cc=joaotavora@gmail.com \
--cc=monnier@iro.umontreal.ca \
--cc=slippycheeze@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).