This is a follow-up to a previous discussion regarding Single quotes in Info.

I've been looking into ways of having the search functions fold similar characters together. There are a few goals, which I'm listing below to facilitate comparison of the possible approaches. Feel free to mention other highly important goals, but please don't go into high-level abstractions (such as letting the user define groups); these can always be added later and are not relevant to this discussion.

1. Follow the `decomposition' char property. For instance, the character "a" in the search string would match any one of "aãáâ" (and so on). This is easy to do, and one of the patches below already shows how. Note that this won't handle symbols that are actually composed of multiple characters.

2. Follow an intuitive sense of similarity which is not defined in the Unicode standard. For instance, an ASCII single quote in the search string should match any type of single quote (there are about a dozen that I know of).

3. Ignore modifier (non-spacing) characters. Another way of writing "á" is to write "a" followed by a special non-spacing acute accent. This kind of thing (a symbol composed of multiple characters) is not handled by goal 1, so I'm listing it as a separate point.

4. Perform the conversion both ways. That is, goal 1 should work even if the search string contains "á" instead of "a", and goal 2 should match an ASCII quote if the search string contains a curly quote. This is mostly useful when the user copies a fancy string from somewhere and pastes it into the search field.

5. It should work for any kind of searching, not just isearch.

Goals 1, 2, and 3 are the most important (in my opinion). Goals 1 and 2 can be achieved by all of the approaches below, while the others vary.

-----------------------------------------------------------

Below, I'm attaching 3 patches. Each represents a different way of achieving part of the above.
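
As an aside on goals 1 and 3 above, here is a small sketch of the difference between a precomposed character and a base character followed by a non-spacing mark. It is written in Python purely so it can be run outside Emacs; it is not taken from any of the patches, and the helper name `canonical_decomposition` is made up for illustration.

```python
import unicodedata

def canonical_decomposition(ch):
    """Return the characters of ch's canonical decomposition, or [] if none.

    Compatibility mappings (tagged like "<compat>") are skipped here;
    that is a choice made for this sketch only.
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0].startswith("<"):
        return []
    return [chr(int(f, 16)) for f in fields]

precomposed = "\u00e1"   # "á" as a single character (goal 1)
combined = "a\u0301"     # "a" + COMBINING ACUTE ACCENT (goal 3)

# Goal 1: the decomposition property leads from "á" back to "a".
parts = canonical_decomposition(precomposed)

# Goal 3: the same glyph can arrive as two characters, the second of
# which is a non-spacing mark (general category "Mn").
is_mark = unicodedata.category(combined[1]) == "Mn"
```

The two spellings have different lengths (1 and 2), which is why a purely one-character-to-one-character mapping covers goal 1 but not goal 3.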

* group-folding-with-regexp-lisp.patch

This one takes each input character and either keeps it verbatim or transforms it into a regexp which matches the entire group that this character represents. It is implemented in isearch.

+ It trivially handles goals 1, 2 and 3. Because regexps are quite versatile, it is the only solution that handles goal 3 (it allows each input character to match more than a single character).
+ Goal 4 can be achieved with a bit more work (the input just needs to be normalized before turning it into a regexp).
- It is slower than the options below, but it should be fast enough for isearch.
- Goal 5 would take a lot more work. This character parsing would have to be added to each of the search functions (not to mention it might be too slow for searches done from Lisp code).

(Note that the attached patch doesn't actually do goal 1. That is NOT a limitation; it can do goal 1 quite trivially. I simply haven't done it yet.)

* group-folding-with-case-table-lisp.patch

This patch is entirely in elisp. I've put it all inside `isearch.el' for now, for the sake of simplicity, but it's not restricted to isearch. It creates a new case-table which performs group folding by borrowing the case-folding machinery, so it is very fast. Group folding can then be achieved by running the search inside a `with-group-folding' macro. There's also an example implementation which turns it on for isearch by default.

+ It immediately satisfies goals 1, 2, 4, and 5.
+ It is very fast.
- It has no simple way of achieving goal 3.

(Note that the attached patch doesn't actually do goal 2. That is NOT a limitation; it can do goal 2 quite trivially. I simply haven't done it yet.)

* group-folding-with-case-table-C.patch

This patch defines a new char-table and uses it instead of case_canon_table when the `group-fold-search' variable is non-nil.
This shares the advantages and disadvantages of the Lisp patch above but, in addition:

+ You don't need a `with-group-folding' macro; all you need is to wrap the search in (let ((group-fold-search t)) ...), which is more in line with how case-folding works.
- If the user decides to set `group-fold-search' to t, this can break existing code (a disadvantage that the Lisp version above does not have).
- It adds two extra fields to every buffer object (the boolean variable and the char table).

(Note that when I compile this last patch, the resulting executable crashes. I'm just putting it here to showcase the option.)

---------------------

My question is: Do any of these options seem good enough? Which would you all like to explore? I like the second one best, but goal 3 is quite important.
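
To make the trade-off between the regexp approach and the table approach concrete, here is a rough sketch in Python (again, only so it can be run outside Emacs). It is not the code from the patches and only loosely mirrors them: `QUOTES` is a small hand-picked sample rather than a complete list, the groups are built only from code points below U+0250, and all the function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

QUOTES = "'\u2018\u2019\u201b\u2032"   # illustrative subset for goal 2
MARKS = "[\u0300-\u036f]*"             # combining diacritical marks (goal 3)

def base_char(ch):
    """Follow canonical decompositions down to the first base character."""
    decomp = unicodedata.decomposition(ch)
    while decomp and not decomp.startswith("<"):
        ch = chr(int(decomp.split()[0], 16))
        decomp = unicodedata.decomposition(ch)
    return ch

def build_groups(limit=0x250):
    """Map each base character to the set of characters that fold to it."""
    groups = defaultdict(set)
    for cp in range(limit):
        groups[base_char(chr(cp))].add(chr(cp))
    groups["'"].update(QUOTES)
    return groups

GROUPS = build_groups()

def to_regexp(search):
    """Regexp approach: each input character becomes a class for its
    group, followed by any number of non-spacing marks."""
    parts = []
    for ch in search:
        members = GROUPS.get(ch, {ch})
        cls = "[" + "".join(re.escape(m) for m in sorted(members)) + "]"
        parts.append(cls + MARKS)
    return "".join(parts)

# Table approach: a strictly one-to-one character mapping, applied to
# both the search string and the text, much like case folding.
TABLE = {ord(m): base for base, members in GROUPS.items() for m in members}

def fold(text):
    return text.translate(TABLE)

# The regexp handles the two-character spelling of "á" ...
regexp_hit = re.fullmatch(to_regexp("a'b"), "a\u0301\u2019b") is not None
# ... the table works in both directions (goal 4) ...
table_hit = fold("\u00e3b\u2019") == fold("ab'")
# ... but cannot make "a" + combining acute equal to plain "a" (goal 3).
table_miss = fold("a\u0301") != fold("a")
```

In this sketch the regexp is one-way (searching for "á" only matches "á") unless the input is normalized first, while the table is symmetric but limited to single characters, which is the same trade-off described for the patches above.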