unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Le Wang <l26wang@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: "Óscar Fuentes" <ofv@wanadoo.es>, emacs-devel@gnu.org
Subject: Re: Emacs needs truely useful flex matching
Date: Mon, 15 Apr 2013 08:14:44 +0800	[thread overview]
Message-ID: <CAM=K+irSz6aDT8TN+JDNz-goZr4cE6MeUExePJ_gUNGZrk9Z_Q@mail.gmail.com> (raw)
In-Reply-To: <jwvvc7pf08y.fsf-monnier+emacs@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2425 bytes --]

On Mon, Apr 15, 2013 at 2:18 AM, Stefan Monnier <monnier@iro.umontreal.ca>wrote:

> >>
> "\\(\\<\\)?a\\([^b]*\\)\\(\\<\\)?b\\([^c]*\\)\\(\\<\\)?c\\([^d]*\\)\\(\\<\\)?d"
> >> > the regexp matching should be fairly efficient and you should be able
> to
> >> > compute the score efficiently as well (at least if
> >> > you ignore the camelCase boundaries).
> >> I hadn't thought of this, and I'll try it soon.
> > I gave this a good try.  :-)
> > Since we are favouring beginning of word anchors (often but not always),
> I
> > actually had to insert "\\<" in the various slots in front of characters.
> > That is all permutations of 4x"\\<", then 3x, then 2x, then 1x, etc.
>
> I don't understand.  The example I gave already has all the "\\<" we
> need, doesn't it?


The regexp you gave works gave works the same as "a.*?b.*?c.*?d".  It
doesn't really prefer word boundaries.

Consider "fafbfcfd-bcd", your regexp will match "afbfcfd" from the first
word, but we want to prefer "a" from the first word, "bcd" from the second
word.

I could only match the preferred parts by inserting the max number of "\\<"
and then one less (but insert them in different places preferring places
closer to the beginning of the string.


> The algorithm works fast.  I believe it has feature parity with Sublime
> > Text's fuzzy search.
> > However it uses a fair bit of memory,
> > 1. it's allocating one hash table and one same-length vector for each
> string
> > 2. it's allocating one cons cell for each character.
>
> Sounds great.  It does seem to require a fair bit of pre-processing, tho.
> When you say it's fast, does that include the time it takes to build the
> hash tables from the list of potential candidates?
>

I missed describing the top level caching layer.  I keep two global caches
-- one for filenames, one for other strings.  The key is a completion
string and the value is the analysis for that string (hashtable + heatmap).

The algorithm works differently for files than other strings, scoring
different path segments differently (basepath is especially preferred).

When I say fast I mean the matching and scoring is fast, once the static
analysis is done.

Warming up the cache with 6300 filenames (average length 44 characters) it
takes 0.7 seconds and invokes the garbage collector 8 times.

Actually can I skip the GC somehow since I know I'm about to allocate a
bunch of memory that never gets freed.

-- 
Le

[-- Attachment #2: Type: text/html, Size: 3631 bytes --]

  reply	other threads:[~2013-04-15  0:14 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-21 15:02 Emacs needs truely useful flex matching Le Wang
2013-03-21 17:49 ` Óscar Fuentes
2013-03-21 23:34   ` Le Wang
2013-03-21 23:58   ` Stefan Monnier
2013-03-22  1:00     ` Le Wang
2013-03-22  8:24       ` Eli Zaretskii
2013-03-22 11:18         ` Dmitry Gutov
2013-04-14 16:48       ` Le Wang
2013-04-14 18:18         ` Stefan Monnier
2013-04-15  0:14           ` Le Wang [this message]
2013-04-15 13:50             ` Stefan Monnier
2013-03-22  2:36 ` Richard Stallman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAM=K+irSz6aDT8TN+JDNz-goZr4cE6MeUExePJ_gUNGZrk9Z_Q@mail.gmail.com' \
    --to=l26wang@gmail.com \
    --cc=emacs-devel@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    --cc=ofv@wanadoo.es \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).