unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Daniel Pittman <slippycheeze@google.com>
To: "João Távora" <joaotavora@gmail.com>
Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
	emacs-devel <emacs-devel@gnu.org>
Subject: Re: new-flex-completion-style
Date: Thu, 14 Feb 2019 10:35:49 -0500	[thread overview]
Message-ID: <CAC45yQsV5r7ECEGaHpDu6ESqB=qsagkU=VBNKZq1r-9k2fq5eA@mail.gmail.com> (raw)
In-Reply-To: <CALDnm53vb-5W64mvB=nxs=a4Y_j6hu275piwUoTUxSc5LLieNQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3699 bytes --]

On Thu, Feb 14, 2019 at 10:16 AM João Távora <joaotavora@gmail.com> wrote:

> On Thu, Feb 14, 2019 at 1:55 PM João Távora <joaotavora@gmail.com> wrote:
> > On Thu, Feb 14, 2019 at 1:36 PM Stefan Monnier <monnier@iro.umontreal.ca>
> wrote:
> > >
> > > > I don't think it's odd.  I call the former "scattered match"
> > > > and the latter a "tighter match".
> > >
> > > Don't you find it odd that "foo" gives a better score to
> > > "fotttttttttttttttto" than to "foto" ?
> >
> > Perhaps.  But as it stands it doesn't make sense to
> > compare scores across different length strings.
> >
> > It does make sense to compare the score between
> > foo's matches of foot and foto.
> >
> > > [ Given them the same score sounds acceptable, tho.  ]
> >
> > Feel free to add some kind of normalization to the function
> > if you can come up with it.  Or come up with another function
> > entirely.  As I said, a function that measures the distance
> > in "boring editing operations" between the pattern and the target
> > would be a nicer measure.
>

The Levenshtein distance and close variants generally do a reasonable job
of approximating human expectations, yes.  Close variants (eg: only
delete/insert, no mutate) also work well.

The cost of building the Levenshtein edit distance matrix for picking or
ordering candidates is quite high, though;
https://en.wikipedia.org/wiki/Approximate_string_matching gives a good
summary, and you really don't want to use the most naive option if you can
avoid it.  Computation costs go through the roof very fast; the cost is
roughly proportional to the length of the candidate cubed, using the brute
force approach.

The other shortfall of using only edit distance is that most of the time
people expect to front-weight the search: you may have a closer edit
distance match later in the string, but people usually want a less exact
match, closer to the start, intuitively.


> I've been experimenting with a simpler function that just counts the
> number of "holes" and the length of those holes separately
> in the denominator. The numerator is the same and a perfect
> match is still a 1. It seems to fare better for your cases. For foo
>

I'd very, very strongly encourage y'all to take a look at the existing work
on the problem, not least because of the algorithmic complexity costs when
you apply this to a large corpus of candidates.  Consider, for example,
completion over the set of all defined symbols in Emacs, which is ~ 55,000
candidates, and IIRC I calculated the average length to 9 characters, so
brute force matching would be be on the order of 40 **million** comparisons
for the first pattern character, though thankfully dropping fast as you
eliminate non-matching candidates entirely.  (Raw comparisons, accounting
for all possible matches in all possible candidate, of that pattern, so
each candidate gets $length / $pattern_length comparisons performed.)

Anyway, the point is: this is an area where serious study has been
invested, and I strongly urge you to take advantage of that, rather than
trying to reinvent this surprisingly complex wheel.  This isn't simple to
do, and it definitely isn't simple to do in a high performance way.

PS: whatever else you do, make it trivial to configure what function is
used for ordering the candidate results.  That matters far more that the
details of the default match / order choice, because it allows easy
innovation over time.

PPS: the best modern tools seem to be derived from the problem of genome
sequence matching, but heuristics are definitely helpful too.  Again, fzf
is the best version of that I have found.

[-- Attachment #2: Type: text/html, Size: 4750 bytes --]

  parent reply	other threads:[~2019-02-14 15:35 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190202232827.27331.87300@vcs0.savannah.gnu.org>
     [not found] ` <20190202232828.4AE452159A@vcs0.savannah.gnu.org>
2019-02-06  3:11   ` [Emacs-diffs] scratch/new-flex-completion-style 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness Dmitry Gutov
2019-02-06 10:09     ` João Távora
2019-02-06 18:54       ` Dmitry Gutov
2019-02-06 19:47         ` João Távora
2019-02-12  0:25           ` Dmitry Gutov
2019-02-12 13:19             ` Stefan Monnier
2019-02-12 22:55               ` Dmitry Gutov
2019-02-13 16:00                 ` Stefan Monnier
2019-02-14  1:33                   ` Dmitry Gutov
2019-02-19 16:10                     ` Stefan Monnier
2019-02-24  0:03                       ` Dmitry Gutov
2019-02-27 17:12                         ` Stefan Monnier
2019-03-11  0:17                           ` Dmitry Gutov
2019-03-11  1:15                             ` Stefan Monnier
2019-03-11 22:54                               ` Dmitry Gutov
2019-03-12  1:10                                 ` Drew Adams
2019-03-12 22:25                                   ` Dmitry Gutov
2019-03-12 23:12                                     ` Drew Adams
2019-03-11  8:47                             ` João Távora
2019-03-11 22:57                               ` Dmitry Gutov
2019-02-12 17:21             ` João Távora
2019-02-12 23:47               ` Dmitry Gutov
2019-02-11 21:10   ` new-flex-completion-style (was: [Emacs-diffs] scratch/ 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness) Stefan Monnier
2019-02-11 22:16     ` new-flex-completion-style João Távora
2019-02-11 23:02       ` new-flex-completion-style Dmitry Gutov
2019-02-11 23:11         ` new-flex-completion-style João Távora
2019-02-12  0:10           ` new-flex-completion-style Dmitry Gutov
2019-02-12  0:16       ` new-flex-completion-style Óscar Fuentes
2019-02-12 22:04         ` new-flex-completion-style João Távora
2019-02-13  0:28           ` new-flex-completion-style Óscar Fuentes
2019-02-13 11:20             ` new-flex-completion-style João Távora
2019-02-13 14:23               ` new-flex-completion-style Óscar Fuentes
2019-02-13 14:38                 ` new-flex-completion-style Drew Adams
2019-02-13 15:24               ` new-flex-completion-style Stefan Monnier
2019-02-13 15:33                 ` new-flex-completion-style Drew Adams
2019-02-13 15:40                 ` new-flex-completion-style Óscar Fuentes
2019-02-13 17:34                   ` new-flex-completion-style Daniel Pittman
2019-02-12 14:08       ` new-flex-completion-style Stefan Monnier
2019-02-12 22:17         ` new-flex-completion-style João Távora
2019-02-13 17:29           ` new-flex-completion-style João Távora
2019-02-13 18:54             ` new-flex-completion-style Stefan Monnier
2019-02-13 19:13               ` new-flex-completion-style João Távora
2019-02-14 13:36                 ` new-flex-completion-style Stefan Monnier
2019-02-14 13:55                   ` new-flex-completion-style João Távora
2019-02-14 14:59                     ` new-flex-completion-style João Távora
2019-02-14 15:28                       ` new-flex-completion-style Óscar Fuentes
2019-02-14 15:44                         ` new-flex-completion-style Drew Adams
2019-02-14 16:21                         ` new-flex-completion-style João Távora
2019-02-14 15:35                       ` Daniel Pittman [this message]
2019-02-14 16:12                         ` new-flex-completion-style João Távora
2019-02-14 16:16                           ` new-flex-completion-style João Távora
2019-02-14 16:34                         ` new-flex-completion-style Drew Adams
2019-02-14 17:03                           ` new-flex-completion-style João Távora
2019-02-14 17:49                             ` new-flex-completion-style Drew Adams
2019-02-14 18:30                               ` new-flex-completion-style João Távora
2019-02-14 19:20                                 ` new-flex-completion-style Drew Adams
2019-02-14 20:54                                   ` new-flex-completion-style João Távora
2019-02-14 22:03                                     ` new-flex-completion-style Drew Adams
2019-02-14 22:06                                       ` new-flex-completion-style João Távora
2019-02-14 22:22                                       ` new-flex-completion-style Stefan Monnier
2019-02-15  0:54                                         ` new-flex-completion-style Drew Adams
2019-02-15  4:50                                           ` new-flex-completion-style Stefan Monnier
2019-02-15  5:52                                             ` new-flex-completion-style Dmitry Gutov
2019-02-15  6:32                                             ` new-flex-completion-style Drew Adams
2019-02-18 20:46         ` new-flex-completion-style João Távora
2019-02-18 23:35           ` new-flex-completion-style Stefan Monnier
2019-02-19  9:16             ` new-flex-completion-style João Távora
2019-02-19 12:54               ` new-flex-completion-style Stefan Monnier
2019-02-19 13:01                 ` new-flex-completion-style João Távora
2019-02-19 13:32                   ` new-flex-completion-style Stefan Monnier
2019-02-11 22:57     ` new-flex-completion-style (was: [Emacs-diffs] scratch/ 2c75775 2/2: Score, sort and annotate flex-style completions according to match tightness) Drew Adams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAC45yQsV5r7ECEGaHpDu6ESqB=qsagkU=VBNKZq1r-9k2fq5eA@mail.gmail.com' \
    --to=slippycheeze@google.com \
    --cc=emacs-devel@gnu.org \
    --cc=joaotavora@gmail.com \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).