* Character group folding in searches
@ 2015-02-06 13:04 Artur Malabarba
From: Artur Malabarba @ 2015-02-06 13:04 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 4679 bytes --]

This is a follow-up to a previous discussion regarding Single quotes in Info.

I've been looking into ways of having the search functions fold
similar characters together. There are a few goals, which I'm listing
below to facilitate comparison of the possible approaches. Feel free to
mention other highly important goals, but please don't go into
high-level abstractions (such as letting the user define groups);
these can always be done and are not relevant to this discussion.

1. Follow the `decomposition' char property. For instance, the
character "a" in the search string would match any one of  "aãáâ" (and
so on). This is easy to do, and one of the patches below already shows
how. Note that this won't handle symbols that are actually composed of
multiple characters.

2. Follow an intuitive sense of similarity which is not defined in the
Unicode standard. For instance, an ASCII single quote in the search
string should match any type of single quote (there are about a dozen
that I know of).

3. Ignore modifier (non-spacing) characters. Another way of writing
"á" is to write "a" followed by a special non-spacing acute accent. This
kind of thing (a symbol composed of multiple characters) is not
handled by item 1, so I'm listing it as a separate point.

4. Perform the conversion both ways. That is, item 1 should work even
if the search string contains "á" instead of "a". Item 2 should match an
ASCII quote if the search string contains a curly quote. This is
mostly useful when the user copies a fancy string from somewhere and
pastes it into the search field.

5. It should work for any searching, not just isearch.


Goals 1, 2, and 3 are the most important (in my opinion).
Goals 1 and 2 are achieved by all of the patches below, while the others vary.
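
To make goals 1 and 3 concrete, here is a minimal sketch in C. It is not
taken from any of the patches. The names fold_table, fold_char,
is_combining and fold_string are hypothetical, and the three table
entries are hand-written stand-ins for what the `decomposition' property
would supply. Treating only the Combining Diacritical Marks block
(U+0300..U+036F) as non-spacing is an assumption made to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of decomposition data: each entry maps a
   precomposed character to the base character of its group.  */
struct fold_entry { uint32_t from; uint32_t to; };

static const struct fold_entry fold_table[] = {
  { 0x00E1, 'a' },  /* a with acute */
  { 0x00E2, 'a' },  /* a with circumflex */
  { 0x00E3, 'a' },  /* a with tilde */
};

/* Goal 1: map a character to the base character of its group.  */
uint32_t
fold_char (uint32_t c)
{
  for (size_t i = 0; i < sizeof fold_table / sizeof fold_table[0]; i++)
    if (fold_table[i].from == c)
      return fold_table[i].to;
  return c;
}

/* Goal 3: decide whether C is a non-spacing mark to be ignored.
   Assumption: only U+0300..U+036F is checked.  */
int
is_combining (uint32_t c)
{
  return c >= 0x0300 && c <= 0x036F;
}

/* Fold the N code points of SRC into DST, dropping combining marks.
   DST must have room for N code points.  Returns the folded length.  */
size_t
fold_string (const uint32_t *src, size_t n, uint32_t *dst)
{
  assert (src != NULL && dst != NULL);
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    if (!is_combining (src[i]))
      dst[out++] = fold_char (src[i]);
  return out;
}
```

With this sketch, the precomposed "á" (U+00E1) and the two-character
sequence "a" followed by U+0301 both fold to a single "a", so a search
for either form would find the other.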

-----------------------------------------------------------

Below, I'm attaching 3 patches; each represents a different way of
achieving part of the above.

* group-folding-with-regexp-lisp.patch

This one takes each input character and either keeps it verbatim or
transforms it into a regexp which matches the entire group that this
character represents. It is implemented in isearch.

+ It trivially handles goals 1, 2 and 3. Because regexps are quite
versatile, it is the only solution that handles item 3 (it allows each
character to match more than a single character).
+ Goal 4 can be achieved with a bit more work (the input just needs to
be normalized before turning it into a regexp).
- It is slower than the options below, but it should be fast enough for isearch.
- Goal 5 would take a lot more work. This character parsing would have
to be added to each of the search functions (not to mention it might be
too slow for Lisp-code searches).

(Note that the attached patch doesn't actually do item 1. That is NOT
a limitation, it can do item 1 quite trivially. I simply haven't done
it yet.)
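
The character-to-regexp idea can be sketched in C as follows. This is
only an illustration under stated assumptions, not a translation of the
patch: group_regexp and fold_to_regexp are hypothetical names, the
groups are heavily abbreviated, and alternation is used instead of a
bracket expression so that multi-byte UTF-8 characters stay intact in a
byte-oriented extended-regexp engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the regexp for the group that C represents, or NULL if C has
   no group.  Hypothetical, abbreviated groups.  */
const char *
group_regexp (char c)
{
  switch (c)
    {
    case '"':  return "(\"|“|”)";
    case '\'': return "('|`|‘|’)";
    case ' ':  return "[ \t\r\n]+";
    default:   return NULL;
    }
}

/* Convert the literal search string IN into an extended regexp in OUT
   (capacity CAP bytes).  Returns 0 on success, -1 if OUT is too small.  */
int
fold_to_regexp (const char *in, char *out, size_t cap)
{
  assert (in != NULL && out != NULL && cap > 0);
  size_t len = 0;
  out[0] = '\0';
  for (; *in; in++)
    {
      char quoted[3];
      const char *piece = group_regexp (*in);
      if (piece == NULL)
        {
          /* No group: quote regexp metacharacters, keep the rest.  */
          size_t q = 0;
          if (strchr (".[]()*+?{}|^$\\", *in))
            quoted[q++] = '\\';
          quoted[q++] = *in;
          quoted[q] = '\0';
          piece = quoted;
        }
      size_t n = strlen (piece);
      if (len + n + 1 > cap)
        return -1;
      memcpy (out + len, piece, n + 1);
      len += n;
    }
  return 0;
}
```

Because each input character may expand to an arbitrary regexp, a group
can match more than one character in the buffer, which is what makes
item 3 reachable with this approach.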

* group-folding-with-case-table-lisp.patch

This patch is entirely in elisp. I've put it all inside `isearch.el'
for now, for the sake of simplicity, but it's not restricted to
isearch.

It creates a new case-table which performs group folding by borrowing
the case-folding machinery, so it is very fast. Then, group folding
can be achieved by running the search inside a `with-group-folding'
macro. There's also an example implementation which turns it on for
isearch by default.

+ It immediately satisfies items 1, 2, 4, and 5.
+ It is very fast.
- It has no simple way of achieving item 3.

(Note that the attached patch doesn't actually do item 2. That is NOT
a limitation, it can do item 2 quite trivially. I simply haven't done
it yet.)
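
The reason a canonicalization table gives item 4 for free can be seen
in a small C sketch. Everything here is hypothetical (the names canon,
init_canon_table, canonicalize and fold_search, the Latin-1-only table,
and the hand-picked entries); it only illustrates the general technique
of translating both the pattern and the text through the same table
before comparing, which is how case folding is borrowed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 0x100   /* Latin-1 only, to keep the sketch small.  */

static uint32_t canon[TABLE_SIZE];

/* Fill the table: identity first, then a few hand-picked groups.
   A real table would be derived from the `decomposition' property.  */
void
init_canon_table (void)
{
  for (uint32_t c = 0; c < TABLE_SIZE; c++)
    canon[c] = c;
  canon[0xE1] = canon[0xE2] = canon[0xE3] = 'a';  /* accented a */
  canon[0xE9] = canon[0xEA] = 'e';                /* accented e */
}

/* Map C to the canonical member of its group.  */
uint32_t
canonicalize (uint32_t c)
{
  return c < TABLE_SIZE ? canon[c] : c;
}

/* Return the index of the first match of PAT in TEXT, or -1.
   Both sides go through the table, so folding works both ways.  */
ptrdiff_t
fold_search (const uint32_t *text, size_t tlen,
             const uint32_t *pat, size_t plen)
{
  assert (text != NULL && pat != NULL);
  for (size_t i = 0; i + plen <= tlen; i++)
    {
      size_t j = 0;
      while (j < plen
             && canonicalize (text[i + j]) == canonicalize (pat[j]))
        j++;
      if (j == plen)
        return (ptrdiff_t) i;
    }
  return -1;
}
```

The table maps one character to one character, so a base letter
followed by a separate combining mark is still two characters after
translation. That is the limitation behind item 3.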

* group-folding-with-case-table-C.patch

This patch defines a new char-table and uses it instead of
case_canon_table when the group-fold-search variable is non-nil.

This shares the advantages and disadvantages of the Lisp patch above
but, in addition:
+ You don't need a `with-group-folding' macro; all you need is to (let
((group-fold-search t)) ...) around the search, which is more in line
with how case-folding works.
- If the user decides to set `group-fold-search' to t, this can break
existing code (a disadvantage that the lisp version above does not
have).
- It adds two extra fields to every buffer object (the boolean
variable and the char table).

(Note that compiling this last patch gives a crashing executable for
me. I'm just putting it here to showcase the option.)
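
The selection logic that this patch repeats at each call site (group
folding takes precedence, then case folding, then no translation) can
be shown in isolation. The sketch below is a simplified stand-in, not
Emacs code: struct buffer_settings, the table type and
search_translation_table are hypothetical, and NULL plays the role of
Qnil.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a translation table; Emacs uses char-tables.  */
typedef struct { const char *name; } table;

/* Hypothetical, simplified stand-in for the per-buffer fields.  */
struct buffer_settings
{
  bool case_fold_search;
  bool group_fold_search;
  const table *case_canon_table;
  const table *group_canon_table;
};

/* Pick the translation table for a search in buffer B.
   Returns NULL when no folding is requested.  */
const table *
search_translation_table (const struct buffer_settings *b)
{
  assert (b != NULL);
  if (b->group_fold_search)
    return b->group_canon_table;
  if (b->case_fold_search)
    return b->case_canon_table;
  return NULL;
}
```

Factoring the choice into one helper like this would avoid repeating
the nested conditional in looking_at_1, string_match_1 and
search_command. Note that in search_command the patch additionally
passes Qnil in place of case_eqv_table whenever group folding is on.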

---------------------

My question is:

Do any of these options seem good enough? Which would you all like to explore?
I like the second one best, but goal 3 is quite important.

[-- Attachment #2: group-folding-with-case-table-C.patch --]
[-- Type: text/x-patch, Size: 8983 bytes --]

From d49d03abdbb1bdb892a322ed6d9e25648edc3b56 Mon Sep 17 00:00:00 2001
From: Artur Malabarba <bruce.connor.am@gmail.com>
Date: Thu, 5 Feb 2015 22:50:52 -0200
Subject: [PATCH] hi

---
 src/buffer.c  | 18 ++++++++++++++++++
 src/buffer.h  |  9 +++++++++
 src/casetab.c |  7 +++++++
 src/editfns.c |  6 ++++--
 src/search.c  | 27 +++++++++++++++++----------
 5 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/src/buffer.c b/src/buffer.c
index 67eda3e..7160850 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -182,6 +182,11 @@ bset_case_fold_search (struct buffer *b, Lisp_Object val)
   b->INTERNAL_FIELD (case_fold_search) = val;
 }
 static void
+bset_group_fold_search (struct buffer *b, Lisp_Object val)
+{
+  b->INTERNAL_FIELD (group_fold_search) = val;
+}
+static void
 bset_ctl_arrow (struct buffer *b, Lisp_Object val)
 {
   b->INTERNAL_FIELD (ctl_arrow) = val;
@@ -975,6 +980,7 @@ reset_buffer_local_variables (struct buffer *b, bool permanent_too)
   bset_upcase_table (b, XCHAR_TABLE (Vascii_downcase_table)->extras[0]);
   bset_case_canon_table (b, XCHAR_TABLE (Vascii_downcase_table)->extras[1]);
   bset_case_eqv_table (b, XCHAR_TABLE (Vascii_downcase_table)->extras[2]);
+  bset_group_canon_table (b, XCHAR_TABLE (Vascii_downcase_table)->extras[1]);
   bset_invisibility_spec (b, Qt);
 
   /* Reset all (or most) per-buffer variables to their defaults.  */
@@ -5053,6 +5059,7 @@ init_buffer_once (void)
   bset_upcase_table (&buffer_local_flags, make_number (0));
   bset_case_canon_table (&buffer_local_flags, make_number (0));
   bset_case_eqv_table (&buffer_local_flags, make_number (0));
+  bset_group_canon_table (&buffer_local_flags, make_number (0));
   bset_minor_modes (&buffer_local_flags, make_number (0));
   bset_width_table (&buffer_local_flags, make_number (0));
   bset_pt_marker (&buffer_local_flags, make_number (0));
@@ -5065,6 +5072,7 @@ init_buffer_once (void)
   XSETFASTINT (BVAR (&buffer_local_flags, abbrev_mode), idx); ++idx;
   XSETFASTINT (BVAR (&buffer_local_flags, overwrite_mode), idx); ++idx;
   XSETFASTINT (BVAR (&buffer_local_flags, case_fold_search), idx); ++idx;
+  XSETFASTINT (BVAR (&buffer_local_flags, group_fold_search), idx); ++idx;
   XSETFASTINT (BVAR (&buffer_local_flags, auto_fill_function), idx); ++idx;
   XSETFASTINT (BVAR (&buffer_local_flags, selective_display), idx); ++idx;
   XSETFASTINT (BVAR (&buffer_local_flags, selective_display_ellipses), idx); ++idx;
@@ -5143,6 +5151,7 @@ init_buffer_once (void)
   bset_abbrev_mode (&buffer_defaults, Qnil);
   bset_overwrite_mode (&buffer_defaults, Qnil);
   bset_case_fold_search (&buffer_defaults, Qt);
+  bset_group_fold_search (&buffer_defaults, Qnil);
   bset_auto_fill_function (&buffer_defaults, Qnil);
   bset_selective_display (&buffer_defaults, Qnil);
   bset_selective_display_ellipses (&buffer_defaults, Qt);
@@ -5486,6 +5495,11 @@ This is the same as (default-value 'tab-width).  */);
 			  doc: /* Default value of `case-fold-search' for buffers that don't override it.
 This is the same as (default-value 'case-fold-search).  */);
 
+  DEFVAR_BUFFER_DEFAULTS ("default-group-fold-search",
+			  group_fold_search,
+			  doc: /* Default value of `group-fold-search' for buffers that don't override it.
+This is the same as (default-value 'group-fold-search).  */);
+
   DEFVAR_BUFFER_DEFAULTS ("default-left-margin-width",
 			  left_margin_cols,
 			  doc: /* Default value of `left-margin-width' for buffers that don't override it.
@@ -5657,6 +5671,10 @@ Use the command `abbrev-mode' to change this variable.  */);
 		     Qnil,
 		     doc: /* Non-nil if searches and matches should ignore case.  */);
 
+  DEFVAR_PER_BUFFER ("group-fold-search", &BVAR (current_buffer, group_fold_search),
+		     Qnil,
+		     doc: /* Non-nil if searches and matches should fold character groups.  */);
+
   DEFVAR_PER_BUFFER ("fill-column", &BVAR (current_buffer, fill_column),
 		     Qintegerp,
 		     doc: /* Column beyond which automatic line-wrapping should happen.
diff --git a/src/buffer.h b/src/buffer.h
index 81852ca..ff56a81 100644
--- a/src/buffer.h
+++ b/src/buffer.h
@@ -558,6 +558,7 @@ struct buffer
   /* tab-width is buffer-local so that redisplay can find it
      in buffers that are not current.  */
   Lisp_Object INTERNAL_FIELD (case_fold_search);
+  Lisp_Object INTERNAL_FIELD (group_fold_search);
   Lisp_Object INTERNAL_FIELD (tab_width);
   Lisp_Object INTERNAL_FIELD (fill_column);
   Lisp_Object INTERNAL_FIELD (left_margin);
@@ -578,6 +579,9 @@ struct buffer
   /* Char-table of equivalences for case-folding search.  */
   Lisp_Object INTERNAL_FIELD (case_eqv_table);
 
+  /* Char-table for conversion for group-folding search.  */
+  Lisp_Object INTERNAL_FIELD (group_canon_table);
+
   /* Non-nil means do not display continuation lines.  */
   Lisp_Object INTERNAL_FIELD (truncate_lines);
 
@@ -899,6 +903,11 @@ bset_case_eqv_table (struct buffer *b, Lisp_Object val)
   b->INTERNAL_FIELD (case_eqv_table) = val;
 }
 INLINE void
+bset_group_canon_table (struct buffer *b, Lisp_Object val)
+{
+  b->INTERNAL_FIELD (group_canon_table) = val;
+}
+INLINE void
 bset_directory (struct buffer *b, Lisp_Object val)
 {
   b->INTERNAL_FIELD (directory) = val;
diff --git a/src/casetab.c b/src/casetab.c
index b086abc..81a6476 100644
--- a/src/casetab.c
+++ b/src/casetab.c
@@ -63,6 +63,13 @@ check_case_table (Lisp_Object obj)
   return (obj);
 }
 
+DEFUN ("current-group-table", Fcurrent_group_table, Scurrent_group_table, 0, 0, 0,
+       doc: /* Return the group table of the current buffer.  */)
+  (void)
+{
+  return BVAR (current_buffer, group_canon_table);
+}
+
 DEFUN ("current-case-table", Fcurrent_case_table, Scurrent_case_table, 0, 0, 0,
        doc: /* Return the case table of the current buffer.  */)
   (void)
diff --git a/src/editfns.c b/src/editfns.c
index 7026ccc..9fe6bc6 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -2805,8 +2805,10 @@ determines whether case is significant or ignored.  */)
   register EMACS_INT begp1, endp1, begp2, endp2, temp;
   register struct buffer *bp1, *bp2;
   register Lisp_Object trt
-    = (!NILP (BVAR (current_buffer, case_fold_search))
-       ? BVAR (current_buffer, case_canon_table) : Qnil);
+    = (!NILP (BVAR (current_buffer, group_fold_search))
+       ? BVAR (current_buffer, group_canon_table) :
+       (!NILP (BVAR (current_buffer, case_fold_search))
+        ? BVAR (current_buffer, case_canon_table) : Qnil));
   ptrdiff_t chars = 0;
   ptrdiff_t i1, i2, i1_byte, i2_byte;
 
diff --git a/src/search.c b/src/search.c
index e961798..037f409 100644
--- a/src/search.c
+++ b/src/search.c
@@ -281,8 +281,10 @@ looking_at_1 (Lisp_Object string, bool posix)
   bufp = compile_pattern (string,
 			  (NILP (Vinhibit_changing_match_data)
 			   ? &search_regs : NULL),
-			  (!NILP (BVAR (current_buffer, case_fold_search))
-			   ? BVAR (current_buffer, case_canon_table) : Qnil),
+			  (!NILP (BVAR (current_buffer, group_fold_search))
+			   ? BVAR (current_buffer, group_canon_table) :
+                           (!NILP (BVAR (current_buffer, case_fold_search))
+                            ? BVAR (current_buffer, case_canon_table) : Qnil)),
 			  posix,
 			  !NILP (BVAR (current_buffer, enable_multibyte_characters)));
 
@@ -396,8 +398,10 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start,
   bufp = compile_pattern (regexp,
 			  (NILP (Vinhibit_changing_match_data)
 			   ? &search_regs : NULL),
-			  (!NILP (BVAR (current_buffer, case_fold_search))
-			   ? BVAR (current_buffer, case_canon_table) : Qnil),
+			  (!NILP (BVAR (current_buffer, group_fold_search))
+			   ? BVAR (current_buffer, group_canon_table) :
+                           (!NILP (BVAR (current_buffer, case_fold_search))
+                            ? BVAR (current_buffer, case_canon_table) : Qnil)),
 			  posix,
 			  STRING_MULTIBYTE (string));
   immediate_quit = 1;
@@ -1052,12 +1056,15 @@ search_command (Lisp_Object string, Lisp_Object bound, Lisp_Object noerror,
 			 BVAR (current_buffer, case_eqv_table));
 
   np = search_buffer (string, PT, PT_BYTE, lim, lim_byte, n, RE,
-		      (!NILP (BVAR (current_buffer, case_fold_search))
-		       ? BVAR (current_buffer, case_canon_table)
-		       : Qnil),
-		      (!NILP (BVAR (current_buffer, case_fold_search))
-		       ? BVAR (current_buffer, case_eqv_table)
-		       : Qnil),
+                      (!NILP (BVAR (current_buffer, group_fold_search))
+                       ? BVAR (current_buffer, group_canon_table) :
+                       (!NILP (BVAR (current_buffer, case_fold_search))
+                        ? BVAR (current_buffer, case_canon_table) : Qnil)),
+                      (!NILP (BVAR (current_buffer, group_fold_search))
+                       ? Qnil :
+                       (!NILP (BVAR (current_buffer, case_fold_search))
+                        ? BVAR (current_buffer, case_eqv_table)
+                        : Qnil)),
 		      posix);
   if (np <= 0)
     {
-- 
2.2.2


[-- Attachment #3: group-folding-with-regexp-lisp.patch --]
[-- Type: text/x-patch, Size: 3231 bytes --]

From f67ae7ed53e6a90cf4f97ac1bba9498b5d58e6dc Mon Sep 17 00:00:00 2001
From: Artur Malabarba <bruce.connor.am@gmail.com>
Date: Tue, 27 Jan 2015 14:08:01 -0200
Subject: [PATCH] (isearch-search-fun-default): Implement group folding in
 isearch.

Use `isearch-fold-groups', `isearch-groups-alist', and
`isearch--replace-groups-in-string'.
---
 lisp/isearch.el | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/lisp/isearch.el b/lisp/isearch.el
index 99ca73f..accfb72 100644
--- a/lisp/isearch.el
+++ b/lisp/isearch.el
@@ -58,6 +58,7 @@
 ;;; Code:
 
 (eval-when-compile (require 'cl-lib))
+(eval-when-compile (require 'subr-x))
 \f
 ;; Some additional options and constants.
 
@@ -272,6 +273,23 @@ Default value, nil, means edit the string instead."
   :version "23.1"
   :group 'isearch)
 
+(defcustom isearch-fold-groups t
+  "Whether regular isearch should do group folding.
+This means some characters will match entire groups of characters,
+such as \" matching ”, for instance."
+  :type 'boolean
+  :group 'isearch
+  :version "25.1")
+
+(defvar isearch-groups-alist
+  ;; FIXME: Add all the latin accented letters like Ã.
+  '((?\" . "[\""“””„⹂〞‟‟❞❝❠“„〝〟🙷🙶🙸❮❯«»‹›󠀢]")
+    (?' . "[`'❟❛❜‘’’‚‛‛‚‘󠀢❮❯«»‹›]")
+    ;; `isearch-fold-groups' doesn't interact with
+    ;; `isearch-lax-whitespace' yet.  So we need to add this here.
+    (?\s . "[ 	\r\n]+"))
+  "Alist of groups to use when `isearch-fold-groups' is non-nil.")
+
 (defcustom isearch-lazy-highlight t
   "Controls the lazy-highlighting during incremental search.
 When non-nil, all text in the buffer matching the current search
@@ -2565,6 +2583,18 @@ search for the first occurrence of STRING or its translation.")
 Can be changed via `isearch-search-fun-function' for special needs."
   (funcall (or isearch-search-fun-function 'isearch-search-fun-default)))
 
+(defun isearch--replace-groups-in-string (string)
+  "Return a group-folded regexp version of STRING.
+Any character that has an entry in `isearch-groups-alist' is
+replaced with the cdr of that entry (which should be a regexp).
+Other characters are `regexp-quote'd."
+  (apply #'concat
+    (mapcar (lambda (c)
+              (if-let ((entry (assq c isearch-groups-alist)))
+                  (cdr entry)
+                (regexp-quote (string c))))
+      string)))
+
 (defun isearch-search-fun-default ()
   "Return default functions to use for the search."
   (cond
@@ -2591,6 +2621,13 @@ Can be changed via `isearch-search-fun-function' for special needs."
       're-search-backward-lax-whitespace))
    (isearch-regexp
     (if isearch-forward 're-search-forward 're-search-backward))
+   ;; `isearch-regexp' is essentially a superset of
+   ;; `isearch-fold-groups'.  So fold-groups comes after it.
+   (isearch-fold-groups
+    (lambda (string &optional bound noerror count)
+      (funcall (if isearch-forward #'re-search-forward #'re-search-backward)
+        (isearch--replace-groups-in-string string)
+        bound noerror count)))
    ((and isearch-lax-whitespace search-whitespace-regexp)
     (if isearch-forward
 	'search-forward-lax-whitespace
-- 
2.2.2


[-- Attachment #4: group-folding-with-case-table-lisp.patch --]
[-- Type: text/x-patch, Size: 2549 bytes --]

From 8f4be27dca714b168414171bde3eeee9fefc44e9 Mon Sep 17 00:00:00 2001
From: Artur Malabarba <bruce.connor.am@gmail.com>
Date: Tue, 27 Jan 2015 14:08:01 -0200
Subject: [PATCH] (isearch-search-fun-default): Implement group folding in
 isearch.

(isearch-fold-groups group-fold-table): New variables.

When `isearch-fold-groups' is non-nil `group-fold-table' is used as the
case table.
---
 lisp/isearch.el | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/lisp/isearch.el b/lisp/isearch.el
index 99ca73f..7d568dd 100644
--- a/lisp/isearch.el
+++ b/lisp/isearch.el
@@ -272,6 +272,38 @@ Default value, nil, means edit the string instead."
   :version "23.1"
   :group 'isearch)
 
+(defcustom isearch-fold-groups t
+  "Whether regular isearch should do group folding.
+This means some characters will match entire groups of characters,
+such as \" matching ”, for instance."
+  :type 'boolean
+  :group 'isearch
+  :version "25.1")
+
+(defvar group-fold-table
+  (eval-when-compile
+    (let ((table (make-char-table 'case-table))
+          (eq (make-char-table 'equiv)))
+      (require 'subr-x)
+      ;; Build the group table.
+      (dotimes (i (length eq))
+        (when-let ((d (get-char-code-property i 'decomposition))
+                   (k (car-safe d)))
+          (unless (eq i k)
+            (aset eq i (if (characterp k) k (cadr d))))))
+      ;; Put it in the right place.
+      (set-char-table-extra-slot table 1 eq)
+      table))
+  "Used for folding characters of the same group during search.")
+
+(defmacro with-group-folding (&rest body)
+  "Execute BODY with character-group folding turned on.
+This sets `group-fold-table' as the case-table during the
+execution of BODY."
+  `(let ((case-fold-search t))
+     (with-case-table group-fold-table
+       ,@body)))
+
 (defcustom isearch-lazy-highlight t
   "Controls the lazy-highlighting during incremental search.
 When non-nil, all text in the buffer matching the current search
@@ -2568,6 +2600,12 @@ Can be changed via `isearch-search-fun-function' for special needs."
 (defun isearch-search-fun-default ()
   "Return default functions to use for the search."
   (cond
+   (isearch-fold-groups
+    (lambda (&rest args)
+      (let* ((isearch-fold-groups nil)
+             (function (isearch-search-fun-default)))
+        (with-group-folding
+         (apply function args)))))
    (isearch-word
     (lambda (string &optional bound noerror count)
       ;; Use lax versions to not fail at the end of the word while
-- 
2.2.2

