* Emacs needs truely useful flex matching @ 2013-03-21 15:02 Le Wang 2013-03-21 17:49 ` Óscar Fuentes 2013-03-22 2:36 ` Richard Stallman 0 siblings, 2 replies; 12+ messages in thread From: Le Wang @ 2013-03-21 15:02 UTC (permalink / raw) To: emacs-devel Hi all, Since there is a big thread about a standard way to recognise project roots, I want to bring up another area in which Emacs is falling behind other Editors (Sublime Text, Textmate, and Vim). Choosing from a very large list of files (or any strings for that matter) with a minimum of keystrokes. ido-mode has `ido-enable-flex-matching', but that does not do the smart sorting required. Vim has this through the Command-T plugin. The best way to "get it" is by watching it in action: https://s3.amazonaws.com/s3.wincent.com/command-t/screencasts/command-t-demo.mov Skip to 6 minutes to see it in action in a large project with repetitive path segments. Homepage here: https://wincent.com/products/command-t The matching engine is implemented in C and it interfaces with Vim through the Ruby bridge. I think in order to sort a large list of strings (> 10k), this will also have to be implemented in C to be fast enough if done for Emacs. The sorting algorithm is roughly this for a query: "abcd" 1. Get all matches for "a.*b.*c.*c" 2. Calculate score of each match - contiguous matched chars gets a boost - matches at word and camelCase boundaries (abbreviation) get a boost - matches with smallest starting index gets a boost 2. Sort list according to score. A lot of my colleagues use this kind of flex matching to navigate around our large code base in Sublime Text and I'm very jealous that with so few keystrokes the most useful matches just float to the top. This navigation could be implemented with Helm if Emacs had a builtin fast smart flex sorting engine. -- Le ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-03-21 15:02 Emacs needs truely useful flex matching Le Wang @ 2013-03-21 17:49 ` Óscar Fuentes 2013-03-21 23:34 ` Le Wang 2013-03-21 23:58 ` Stefan Monnier 2013-03-22 2:36 ` Richard Stallman 1 sibling, 2 replies; 12+ messages in thread From: Óscar Fuentes @ 2013-03-21 17:49 UTC (permalink / raw) To: emacs-devel Le Wang <l26wang@gmail.com> writes: > Since there is a big thread about a standard way to recognise project roots, I > want to bring up another area in which Emacs is falling behind other > Editors (Sublime Text, Textmate, and Vim). > Choosing from a very large list of files (or any strings for that matter) with > a minimum of keystrokes. > > ido-mode has `ido-enable-flex-matching', but that does not do the smart > sorting required. > > Vim has this through the Command-T plugin. The best way to "get it" is by > watching it in action: > https://s3.amazonaws.com/s3.wincent.com/command-t/screencasts/command-t-demo.mov > > Skip to 6 minutes to see it in action in a large project with > repetitive path segments. > > Homepage here: https://wincent.com/products/command-t > > The matching engine is implemented in C and it interfaces with Vim through the > Ruby bridge. I think in order to sort a large list of strings (> 10k), this > will also have to be implemented in C to be fast enough if done for Emacs. > > The sorting algorithm is roughly this for a query: "abcd" > > 1. Get all matches for "a.*b.*c.*c" > 2. Calculate score of each match > - contiguous matched chars gets a boost > - matches at word and camelCase boundaries (abbreviation) get a boost > - matches with smallest starting index gets a boost > 2. Sort list according to score. > > A lot of my colleagues use this kind of flex matching to navigate around our > large code base in Sublime Text and I'm very jealous that with so few > keystrokes the most useful > matches just float to the top. > > This navigation could be implemented with Helm if Emacs had a builtin > fast smart flex sorting engine. IIUC the vim plugin you mention depends on a pre-built list of files. In that regard, how is it better than GNU Global, which allows to search files with a regex (and much more)? Maybe the algorithm you describe can be implemented on GNU Global and then use an Emacs interface. I agree that it would be nice to have such a feature native on Emacs, though. IIRC someone mentioned a few days ago that a regex of the type you descibe has very high execution complexity (-> is slow) but obviously it is working for that vim extension. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-03-21 17:49 ` Óscar Fuentes @ 2013-03-21 23:34 ` Le Wang 2013-03-21 23:58 ` Stefan Monnier 1 sibling, 0 replies; 12+ messages in thread From: Le Wang @ 2013-03-21 23:34 UTC (permalink / raw) To: Óscar Fuentes; +Cc: emacs-devel On Fri, Mar 22, 2013 at 1:49 AM, Óscar Fuentes <ofv@wanadoo.es> wrote: > Le Wang <l26wang@gmail.com> writes: > >> Since there is a big thread about a standard way to recognise project roots, I >> want to bring up another area in which Emacs is falling behind other >> Editors (Sublime Text, Textmate, and Vim). >> Choosing from a very large list of files (or any strings for that matter) with >> a minimum of keystrokes. >> >> ido-mode has `ido-enable-flex-matching', but that does not do the smart >> sorting required. >> >> Vim has this through the Command-T plugin. The best way to "get it" is by >> watching it in action: >> https://s3.amazonaws.com/s3.wincent.com/command-t/screencasts/command-t-demo.mov >> >> Skip to 6 minutes to see it in action in a large project with >> repetitive path segments. >> >> Homepage here: https://wincent.com/products/command-t >> >> The matching engine is implemented in C and it interfaces with Vim through the >> Ruby bridge. I think in order to sort a large list of strings (> 10k), this >> will also have to be implemented in C to be fast enough if done for Emacs. >> >> The sorting algorithm is roughly this for a query: "abcd" >> >> 1. Get all matches for "a.*b.*c.*c" >> 2. Calculate score of each match >> - contiguous matched chars gets a boost >> - matches at word and camelCase boundaries (abbreviation) get a boost >> - matches with smallest starting index gets a boost >> 2. Sort list according to score. >> >> A lot of my colleagues use this kind of flex matching to navigate around our >> large code base in Sublime Text and I'm very jealous that with so few >> keystrokes the most useful >> matches just float to the top. >> >> This navigation could be implemented with Helm if Emacs had a builtin >> fast smart flex sorting engine. > > IIUC the vim plugin you mention depends on a pre-built list of files. In > that regard, how is it better than GNU Global, which allows to search > files with a regex (and much more)? Maybe the algorithm you describe > can be implemented on GNU Global and then use an Emacs interface. The type of smart matching + sorting would be useful for files, M-x, C-h v, all kinds of completion where large lists are involved. GNU globals maybe should support it, but Emacs shouldn't depend on external dependency to offer something this useful. > I agree that it would be nice to have such a feature native on Emacs, > though. IIRC someone mentioned a few days ago that a regex of the type > you descibe has very high execution complexity (-> is slow) That is true, and speed is of the utmost essence, so emacs-lisp is probably too slow. > but obviously it is working for that vim extension. Textmate has had this for a while, and I see Sublime Text do it very well every day, of course. :( These editors are moving the ball for what people expect a modern editor to provide. Honestly when I first tried `ido-enable-flex-matchine', I thought it was strange, So it's entirely possible I haven't made the strongest case for the usefulness of this feature. Please let me know if you aren't convinced.. -- Le ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-03-21 17:49 ` Óscar Fuentes 2013-03-21 23:34 ` Le Wang @ 2013-03-21 23:58 ` Stefan Monnier 2013-03-22 1:00 ` Le Wang 1 sibling, 1 reply; 12+ messages in thread From: Stefan Monnier @ 2013-03-21 23:58 UTC (permalink / raw) To: Óscar Fuentes; +Cc: emacs-devel >> The sorting algorithm is roughly this for a query: "abcd" >> >> 1. Get all matches for "a.*b.*c.*c" >> 2. Calculate score of each match >> - contiguous matched chars gets a boost >> - matches at word and camelCase boundaries (abbreviation) get a boost >> - matches with smallest starting index gets a boost >> 2. Sort list according to score. I think that if you turn "abcd" into a regexp of the form "\\(\\<\\)?a\\([^b]*\\)\\(\\<\\)?b\\([^c]*\\)\\(\\<\\)?c\\([^d]*\\)\\(\\<\\)?d" the regexp matching should be fairly efficient and you should be able to compute the score efficiently as well (at least if you ignore the camelCase boundaries). >> This navigation could be implemented with Helm if Emacs had a builtin >> fast smart flex sorting engine. Have you tried such an approach and it really was too slow? I'd welcome a new completion-style using the above flex matching. > IIUC the vim plugin you mention depends on a pre-built list of files. Indeed when searching for a file in a file hierarchy, you'd need a pre-built list of files, otherwise the time taken to find the files would dwarf the flex-matching time in any case. Stefan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-03-21 23:58 ` Stefan Monnier @ 2013-03-22 1:00 ` Le Wang 2013-03-22 8:24 ` Eli Zaretskii 2013-04-14 16:48 ` Le Wang 0 siblings, 2 replies; 12+ messages in thread From: Le Wang @ 2013-03-22 1:00 UTC (permalink / raw) To: Stefan Monnier; +Cc: Óscar Fuentes, emacs-devel On Fri, Mar 22, 2013 at 7:58 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: >>> The sorting algorithm is roughly this for a query: "abcd" >>> >>> 1. Get all matches for "a.*b.*c.*c" >>> 2. Calculate score of each match >>> - contiguous matched chars gets a boost >>> - matches at word and camelCase boundaries (abbreviation) get a boost >>> - matches with smallest starting index gets a boost >>> 2. Sort list according to score. > > I think that if you turn "abcd" into a regexp of the form > "\\(\\<\\)?a\\([^b]*\\)\\(\\<\\)?b\\([^c]*\\)\\(\\<\\)?c\\([^d]*\\)\\(\\<\\)?d" > the regexp matching should be fairly efficient and you should be able to > compute the score efficiently as well (at least if > you ignore the camelCase boundaries). I hadn't thought of this, and I'll try it soon. > Have you tried such an approach and it really was too slow? > I'd welcome a new completion-style using the above flex matching. You're right that in generally that my claim of "emacs-lisp would be too slow" needs to be verified. >> IIUC the vim plugin you mention depends on a pre-built list of files. > > Indeed when searching for a file in a file hierarchy, you'd need > a pre-built list of files, otherwise the time taken to find the files > would dwarf the flex-matching time in any case. "git ls-files" is very fast on a solid state drive, and the list can also be cached. But let's assume we already have the list of strings. -- Le ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-03-22 1:00 ` Le Wang @ 2013-03-22 8:24 ` Eli Zaretskii 2013-03-22 11:18 ` Dmitry Gutov 2013-04-14 16:48 ` Le Wang 1 sibling, 1 reply; 12+ messages in thread From: Eli Zaretskii @ 2013-03-22 8:24 UTC (permalink / raw) To: Le Wang; +Cc: ofv, monnier, emacs-devel > Date: Fri, 22 Mar 2013 09:00:14 +0800 > From: Le Wang <l26wang@gmail.com> > Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org > > > Indeed when searching for a file in a file hierarchy, you'd need > > a pre-built list of files, otherwise the time taken to find the files > > would dwarf the flex-matching time in any case. > > "git ls-files" is very fast on a solid state drive, and the list can > also be cached. Compare "git ls-files" with "ls -R" (time-wise and also look at the disk activity LED, if you have one) on the same directory and machine, and you will see that git does not actually look at the filesystem to find the files. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-03-22 8:24 ` Eli Zaretskii @ 2013-03-22 11:18 ` Dmitry Gutov 0 siblings, 0 replies; 12+ messages in thread From: Dmitry Gutov @ 2013-03-22 11:18 UTC (permalink / raw) To: Eli Zaretskii; +Cc: ofv, emacs-devel, monnier, Le Wang Eli Zaretskii <eliz@gnu.org> writes: >> Date: Fri, 22 Mar 2013 09:00:14 +0800 >> From: Le Wang <l26wang@gmail.com> >> Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org >> >> > Indeed when searching for a file in a file hierarchy, you'd need >> > a pre-built list of files, otherwise the time taken to find the files >> > would dwarf the flex-matching time in any case. >> >> "git ls-files" is very fast on a solid state drive, and the list can >> also be cached. > > Compare "git ls-files" with "ls -R" (time-wise and also look at the > disk activity LED, if you have one) on the same directory and machine, > and you will see that git does not actually look at the filesystem to > find the files. Sure, and that means that caching the list of files a in directory tree is a somewhat solved issue. Whereas the kind of magical flex matching Le is describing, isn't. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-03-22 1:00 ` Le Wang 2013-03-22 8:24 ` Eli Zaretskii @ 2013-04-14 16:48 ` Le Wang 2013-04-14 18:18 ` Stefan Monnier 1 sibling, 1 reply; 12+ messages in thread From: Le Wang @ 2013-04-14 16:48 UTC (permalink / raw) To: Stefan Monnier; +Cc: Óscar Fuentes, emacs-devel [-- Attachment #1: Type: text/plain, Size: 2369 bytes --] On Fri, Mar 22, 2013 at 9:00 AM, Le Wang <l26wang@gmail.com> wrote: > On Fri, Mar 22, 2013 at 7:58 AM, Stefan Monnier > <monnier@iro.umontreal.ca> wrote: > >>> The sorting algorithm is roughly this for a query: "abcd" > >>> > >>> 1. Get all matches for "a.*b.*c.*c" > >>> 2. Calculate score of each match > >>> - contiguous matched chars gets a boost > >>> - matches at word and camelCase boundaries (abbreviation) get a boost > >>> - matches with smallest starting index gets a boost > >>> 2. Sort list according to score. > > > > I think that if you turn "abcd" into a regexp of the form > > > "\\(\\<\\)?a\\([^b]*\\)\\(\\<\\)?b\\([^c]*\\)\\(\\<\\)?c\\([^d]*\\)\\(\\<\\)?d" > > the regexp matching should be fairly efficient and you should be able to > > compute the score efficiently as well (at least if > > you ignore the camelCase boundaries). > > I hadn't thought of this, and I'll try it soon. I gave this a good try. :-) Since we are favouring beginning of word anchors (often but not always), I actually had to insert "\\<" in the various slots in front of characters. That is all permutations of 4x"\\<", then 3x, then 2x, then 1x, etc. I bumped into the regexp length limit very quickly and it wasn't fast enough even when it did work. However, it turns out that emacs-lisp is fast enough with a tweaked algorithm -- no regexps at all. To wit, 1. For each string, we allocate a hash keyed by a character and value is a sorted list indices into the string for occurence of the key. (char-indices-hash) 2. For each string, we allocate a vector of same length do static analysis to assign a heat value to each position. (call this the heatmap-vector) 3. For a query "abcd" we can quickly find all combinations of ascending indices using char-indices-hash. 4. For each of these combinations, we can compute a heat value based on the heatmap-vector. We take the max of all heat values as the "score" of the query against the string. 5. Order matches by score. The algorithm works fast. I believe it has feature parity with Sublime Text's fuzzy search. However it uses a fair bit of memory, 1. it's allocating one hash table and one same-length vector for each string 2. it's allocating one cons cell for each character. Does anyone have any optimisation ideas to use less memory? I will have code for review in the coming days. -- Le [-- Attachment #2: Type: text/html, Size: 3481 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-04-14 16:48 ` Le Wang @ 2013-04-14 18:18 ` Stefan Monnier 2013-04-15 0:14 ` Le Wang 0 siblings, 1 reply; 12+ messages in thread From: Stefan Monnier @ 2013-04-14 18:18 UTC (permalink / raw) To: Le Wang; +Cc: Óscar Fuentes, emacs-devel >> "\\(\\<\\)?a\\([^b]*\\)\\(\\<\\)?b\\([^c]*\\)\\(\\<\\)?c\\([^d]*\\)\\(\\<\\)?d" >> > the regexp matching should be fairly efficient and you should be able to >> > compute the score efficiently as well (at least if >> > you ignore the camelCase boundaries). >> I hadn't thought of this, and I'll try it soon. > I gave this a good try. :-) > Since we are favouring beginning of word anchors (often but not always), I > actually had to insert "\\<" in the various slots in front of characters. > That is all permutations of 4x"\\<", then 3x, then 2x, then 1x, etc. I don't understand. The example I gave already has all the "\\<" we need, doesn't it? > The algorithm works fast. I believe it has feature parity with Sublime > Text's fuzzy search. > However it uses a fair bit of memory, > 1. it's allocating one hash table and one same-length vector for each string > 2. it's allocating one cons cell for each character. Sounds great. It does seem to require a fair bit of pre-processing, tho. When you say it's fast, does that include the time it takes to build the hash tables from the list of potential candidates? Stefan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-04-14 18:18 ` Stefan Monnier @ 2013-04-15 0:14 ` Le Wang 2013-04-15 13:50 ` Stefan Monnier 0 siblings, 1 reply; 12+ messages in thread From: Le Wang @ 2013-04-15 0:14 UTC (permalink / raw) To: Stefan Monnier; +Cc: Óscar Fuentes, emacs-devel [-- Attachment #1: Type: text/plain, Size: 2425 bytes --] On Mon, Apr 15, 2013 at 2:18 AM, Stefan Monnier <monnier@iro.umontreal.ca>wrote: > >> > "\\(\\<\\)?a\\([^b]*\\)\\(\\<\\)?b\\([^c]*\\)\\(\\<\\)?c\\([^d]*\\)\\(\\<\\)?d" > >> > the regexp matching should be fairly efficient and you should be able > to > >> > compute the score efficiently as well (at least if > >> > you ignore the camelCase boundaries). > >> I hadn't thought of this, and I'll try it soon. > > I gave this a good try. :-) > > Since we are favouring beginning of word anchors (often but not always), > I > > actually had to insert "\\<" in the various slots in front of characters. > > That is all permutations of 4x"\\<", then 3x, then 2x, then 1x, etc. > > I don't understand. The example I gave already has all the "\\<" we > need, doesn't it? The regexp you gave works gave works the same as "a.*?b.*?c.*?d". It doesn't really prefer word boundaries. Consider "fafbfcfd-bcd", your regexp will match "afbfcfd" from the first word, but we want to prefer "a" from the first word, "bcd" from the second word. I could only match the preferred parts by inserting the max number of "\\<" and then one less (but insert them in different places preferring places closer to the beginning of the string. > The algorithm works fast. I believe it has feature parity with Sublime > > Text's fuzzy search. > > However it uses a fair bit of memory, > > 1. it's allocating one hash table and one same-length vector for each > string > > 2. it's allocating one cons cell for each character. > > Sounds great. It does seem to require a fair bit of pre-processing, tho. > When you say it's fast, does that include the time it takes to build the > hash tables from the list of potential candidates? > I missed describing the top level caching layer. I keep two global caches -- one for filenames, one for other strings. The key is a completion string and the value is the analysis for that string (hashtable + heatmap). The algorithm works differently for files than other strings, scoring different path segments differently (basepath is especially preferred). When I say fast I mean the matching and scoring is fast, once the static analysis is done. Warming up the cache with 6300 filenames (average length 44 characters) it takes 0.7 seconds and invokes the garbage collector 8 times. Actually can I skip the GC somehow since I know I'm about to allocate a bunch of memory that never gets freed. -- Le [-- Attachment #2: Type: text/html, Size: 3631 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-04-15 0:14 ` Le Wang @ 2013-04-15 13:50 ` Stefan Monnier 0 siblings, 0 replies; 12+ messages in thread From: Stefan Monnier @ 2013-04-15 13:50 UTC (permalink / raw) To: Le Wang; +Cc: Óscar Fuentes, emacs-devel > Consider "fafbfcfd-bcd", your regexp will match "afbfcfd" from the first > word, but we want to prefer "a" from the first word, "bcd" from the second > word. Duh! Thanks for spelling it out. So indeed you'd need regexps like (re "ab...") = (concat "\\<a\\([^b]*\\)" (re "b...") "\\|a\\([^b]*\\)" (re "b...")) which blows up very quickly both in size and in matching time. You can probably make it work with regexps, but it'll take more work: - Match using the kind of regexp I suggested. Use it to filter out uninteresting candidates. - For those elements that match, try to match with stronger regexps to see if the match's value can be improved. I.e. for "abcd", you'd first try "\\<a\\([^b]*\\)b\\([^c]*\\)c\\([^d]*\\)d", then either try "a\\([^b]*\\)\\<b\\([^c]*\\)c\\([^d]*\\)d" or try "\\<a\\([^b]*\\)\\<b\\([^c]*\\)c\\([^d]*\\)d" depending on the first test, etc... For an N-char key, that's O(N) regexp-matches, tho it may also turn out to require O(N) regexp-compilations, so it's probably OK but might require some extra work to optimize it further. > Actually can I skip the GC somehow since I know I'm about to allocate a > bunch of memory that never gets freed. You can let-bound gc-cons-threshold, but I think it's not worth the risk (the let-binding may apply to more code than expected in situations such as debugging). Stefan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Emacs needs truely useful flex matching 2013-03-21 15:02 Emacs needs truely useful flex matching Le Wang 2013-03-21 17:49 ` Óscar Fuentes @ 2013-03-22 2:36 ` Richard Stallman 1 sibling, 0 replies; 12+ messages in thread From: Richard Stallman @ 2013-03-22 2:36 UTC (permalink / raw) To: Le Wang; +Cc: emacs-devel Vim has this through the Command-T plugin. If it is feasible to implement this in Lisp, how about giving it a try? -- Dr Richard Stallman President, Free Software Foundation 51 Franklin St Boston MA 02110 USA www.fsf.org www.gnu.org Skype: No way! That's nonfree (freedom-denying) software. Use Ekiga or an ordinary phone call ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-04-15 13:50 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-03-21 15:02 Emacs needs truely useful flex matching Le Wang 2013-03-21 17:49 ` Óscar Fuentes 2013-03-21 23:34 ` Le Wang 2013-03-21 23:58 ` Stefan Monnier 2013-03-22 1:00 ` Le Wang 2013-03-22 8:24 ` Eli Zaretskii 2013-03-22 11:18 ` Dmitry Gutov 2013-04-14 16:48 ` Le Wang 2013-04-14 18:18 ` Stefan Monnier 2013-04-15 0:14 ` Le Wang 2013-04-15 13:50 ` Stefan Monnier 2013-03-22 2:36 ` Richard Stallman
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/emacs.git https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.