From: Le Wang <l26wang@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: "Óscar Fuentes" <ofv@wanadoo.es>, emacs-devel@gnu.org
Subject: Re: Emacs needs truely useful flex matching
Date: Mon, 15 Apr 2013 00:48:21 +0800
Message-ID: <CAM=K+ipDy92SpLfR_mMEW+Rbhzz3KpJv4uypxKwHhhQOfJp4kg@mail.gmail.com>
In-Reply-To: <CAM=K+iprKKQx0sqvxwDxpd65rKcs1=WfsLSDfuBkppvZ7qnofA@mail.gmail.com>
On Fri, Mar 22, 2013 at 9:00 AM, Le Wang <l26wang@gmail.com> wrote:
> On Fri, Mar 22, 2013 at 7:58 AM, Stefan Monnier
> <monnier@iro.umontreal.ca> wrote:
> >>> The sorting algorithm is roughly this for a query: "abcd"
> >>>
> >>> 1. Get all matches for "a.*b.*c.*d"
> >>> 2. Calculate score of each match
> >>> - contiguous matched chars get a boost
> >>> - matches at word and camelCase boundaries (abbreviation) get a boost
> >>> - matches with smallest starting index get a boost
> >>> 3. Sort list according to score.
> >
> > I think that if you turn "abcd" into a regexp of the form
> >
> >   "\\(\\<\\)?a\\([^b]*\\)\\(\\<\\)?b\\([^c]*\\)\\(\\<\\)?c\\([^d]*\\)\\(\\<\\)?d"
> > the regexp matching should be fairly efficient and you should be able to
> > compute the score efficiently as well (at least if
> > you ignore the camelCase boundaries).
>
> I hadn't thought of this, and I'll try it soon.
I gave this a good try. :-)
Since we favour beginning-of-word anchors (often, but not always), I
actually had to insert "\\<" in the various slots in front of characters:
every combination with 4x "\\<", then 3x, then 2x, then 1x, and so on. I
bumped into the regexp length limit very quickly, and it wasn't fast enough
even when it did work.
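
To illustrate why the anchored regexp grows so quickly, here is a small
sketch in Python (not the code that was actually tried). It assumes one
alternative per way of placing "\<" in front of the query characters, tried
from most anchors to fewest, and joined with the Emacs alternation operator
"\|". The function name anchored_variants and the ".*" gap are made up for
the example, and the strings are raw regexp text, so the backslashes are not
doubled as they would be inside a Lisp string.

```python
from itertools import product


def anchored_variants(query, anchor=r"\<", gap=".*"):
    """Return one regexp per way of placing `anchor` before the query's chars.

    Assumes `query` holds only characters with no special regexp meaning.
    """
    masks = product((True, False), repeat=len(query))
    # Try the variants with the most anchors first: 4x, then 3x, and so on.
    masks = sorted(masks, key=lambda mask: -sum(mask))
    variants = []
    for mask in masks:
        parts = [(anchor if use else "") + ch for use, ch in zip(mask, query)]
        variants.append(gap.join(parts))
    return variants


variants = anchored_variants("abcd")
combined = r"\|".join(variants)      # one big alternation
print(len(variants), len(combined))  # number of alternatives, total length
```

The number of alternatives doubles with every extra query character, so an
8-character query already needs 256 of them under this scheme.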
However, it turns out that Emacs Lisp is fast enough with a tweaked
algorithm -- no regexps at all. To wit:
1. For each string, we allocate a hash table keyed by character, whose value
is a sorted list of indices into the string where that character occurs.
(char-indices-hash)
2. For each string, we allocate a vector of the same length and do static
analysis to assign a heat value to each position. (Call this the
heatmap-vector.)
3. For a query "abcd", we can quickly find all combinations of ascending
indices using char-indices-hash.
4. For each of these combinations, we compute a heat value based on the
heatmap-vector. We take the max of all heat values as the "score" of the
query against the string.
5. Order matches by score.
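
The five steps above can be sketched roughly as follows. This is an
illustration in Python rather than the Emacs Lisp under discussion, and it
rests on several assumptions: the function names are invented, the heat
weights (10, 8, 5) are arbitrary placeholders, matching is case-insensitive,
and the bonus for contiguous characters is added per combination because it
cannot be known from the string alone.

```python
from collections import defaultdict


def char_indices(s):
    """Step 1: map each character to the ascending list of its positions."""
    table = defaultdict(list)
    for i, ch in enumerate(s.lower()):
        table[ch].append(i)
    return table


def heatmap(s):
    """Step 2: give every position a static heat value (made-up weights)."""
    heat = []
    for i, ch in enumerate(s):
        prev = s[i - 1] if i > 0 else ""
        value = 1
        if i == 0 or not prev.isalnum():
            value += 10   # start of the string or of a word
        elif ch.isupper() and prev.islower():
            value += 8    # camelCase boundary
        heat.append(value)
    return heat


def ascending_combinations(table, query):
    """Step 3: yield every strictly ascending index tuple matching the query."""
    def extend(prefix, rest):
        if not rest:
            yield tuple(prefix)
            return
        last = prefix[-1] if prefix else -1
        for i in table.get(rest[0], ()):
            if i > last:
                yield from extend(prefix + [i], rest[1:])
    yield from extend([], query.lower())


def score(s, query):
    """Step 4: best total heat over all combinations, or None if no match."""
    table, heat = char_indices(s), heatmap(s)
    best = None
    for combo in ascending_combinations(table, query):
        total = sum(heat[i] for i in combo)
        # Adjacent matched positions earn an extra (assumed) bonus.
        total += 5 * sum(1 for a, b in zip(combo, combo[1:]) if b == a + 1)
        if best is None or total > best:
            best = total
    return best


def flex_sort(candidates, query):
    """Step 5: keep the matching candidates, highest score first."""
    scored = [(score(c, query), c) for c in candidates]
    scored = [(sc, c) for sc, c in scored if sc is not None]
    return [c for sc, c in sorted(scored, key=lambda pair: -pair[0])]
```

With these placeholder weights, flex_sort(["xaxbxc", "abc", "zzz"], "abc")
puts "abc" ahead of "xaxbxc" and drops "zzz", which has no match at all.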
The algorithm is fast. I believe it has feature parity with Sublime
Text's fuzzy search.
However, it uses a fair bit of memory:
1. It allocates one hash table and one same-length vector for each string.
2. It allocates one cons cell for each character.
Does anyone have any optimisation ideas to use less memory? I will have
code for review in the coming days.
--
Le