unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#40216: 28.0.50; Misinformation in isearch char-fold
@ 2020-03-24 23:00 Juri Linkov
  2020-03-25  9:22 ` Robert Pluim
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Linkov @ 2020-03-24 23:00 UTC (permalink / raw)
  To: 40216


Tags: patch

When the size of the generated regexp in char-fold isearch mode reaches
a certain limit, it silently falls back to literal search without notifying
the user about this fact.  Thus uninformed users might miss some search hits.

Here is a patch.  With it, char-fold-to-regexp no longer returns
a quoted string when the regexp exceeds an arbitrary limit.
Instead, isearch explicitly toggles literal search mode,
tries to find the next occurrence in literal mode, and displays
a message about the switched search mode for 2 seconds:


[-- Attachment #2: isearch-literal-char-fold.patch --]
[-- Type: text/x-diff, Size: 2088 bytes --]

diff --git a/lisp/char-fold.el b/lisp/char-fold.el
index f8a303956e..34561a2efe 100644
--- a/lisp/char-fold.el
+++ b/lisp/char-fold.el
@@ -370,11 +377,7 @@ char-fold-to-regexp
       (setq i (1+ i)))
     (when (> spaces 0)
       (push (char-fold--make-space-string spaces) out))
-    (let ((regexp (apply #'concat (nreverse out))))
-      ;; Limited by `MAX_BUF_SIZE' in `regex-emacs.c'.
-      (if (> (length regexp) 5000)
-          (regexp-quote string)
-        regexp))))
+    (apply #'concat (nreverse out))))
 
 \f
 ;;; Commands provided for completeness.
diff --git a/lisp/isearch.el b/lisp/isearch.el
index ddf9190dc6..7625ec12b5 100644
--- a/lisp/isearch.el
+++ b/lisp/isearch.el
@@ -2011,15 +2011,16 @@ regexp
 (defvar isearch-message-properties minibuffer-prompt-properties
   "Text properties that are added to the isearch prompt.")
 
-(defun isearch--momentary-message (string)
-  "Print STRING at the end of the isearch prompt for 1 second."
+(defun isearch--momentary-message (string &optional seconds)
+  "Print STRING at the end of the isearch prompt for 1 second.
+The optional argument SECONDS overrides the number of seconds."
   (let ((message-log-max nil))
     (message "%s%s%s"
              (isearch-message-prefix nil isearch-nonincremental)
              isearch-message
              (apply #'propertize (format " [%s]" string)
                     isearch-message-properties)))
-  (sit-for 1))
+  (sit-for (or seconds 1)))
 
 (isearch-define-mode-toggle lax-whitespace " " nil
   "In ordinary search, toggles the value of the variable
@@ -3443,7 +3444,10 @@ isearch-search
 	    (string-match "\\`Regular expression too big" isearch-error))
        (cond
 	(isearch-regexp-function
-	 (setq isearch-error "Too many words"))
+         (setq isearch-error nil)
+         (setq isearch-regexp-function nil)
+         (isearch-search-and-update)
+         (isearch--momentary-message "Too many words; switched to literal mode" 2))
 	((and isearch-lax-whitespace search-whitespace-regexp)
 	 (setq isearch-error "Too many spaces for whitespace matching"))))))
 


* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-24 23:00 bug#40216: 28.0.50; Misinformation in isearch char-fold Juri Linkov
@ 2020-03-25  9:22 ` Robert Pluim
  2020-03-25 20:29   ` Juri Linkov
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Pluim @ 2020-03-25  9:22 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 40216

>>>>> On Wed, 25 Mar 2020 01:00:18 +0200, Juri Linkov <juri@linkov.net> said:

    Juri> Tags: patch
    Juri> When the size of the generated regexp in char-fold isearch mode reaches
    Juri> a certain limit, it silently falls back to literal search without notifying
    Juri> the user about this fact.  Thus uninformed users might miss some search hits.

Out of curiosity, what were you searching for that resulted in such a
large regexp?

Robert






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-25  9:22 ` Robert Pluim
@ 2020-03-25 20:29   ` Juri Linkov
  2020-03-26  9:28     ` Robert Pluim
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Linkov @ 2020-03-25 20:29 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 40216

> Out of curiosity, what were you searching for that resulted in such a
> large regexp?

Sometimes I pull a few lines (usually 1-3 lines, not more)
from the buffer into the search string to confirm that the same lines
exist in more places in the same buffer ignoring the differences defined
by folding rules.  But after pulling 2 lines into the search string,
the generated regexp becomes so long that the regexp search fails
with the error "Regular expression too big".  Currently it silently
switches to literal search without notification that it doesn't follow
the folding rules anymore.  With the patch it informs about switching
to literal search.
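
As a rough illustration of the growth described above, the lengths of
a search string and of its folded regexp can be compared directly.
This is only a sketch: `my/char-fold-regexp-sizes' is a hypothetical
helper, not part of Emacs, and the exact numbers depend on the Emacs
version and on the char-fold options in effect.  Before this patch,
char-fold-to-regexp returns the quoted string once the regexp exceeds
5000 characters, so very long inputs report the quoted length there.

```elisp
(require 'char-fold)

(defun my/char-fold-regexp-sizes (string)
  "Return (STRING-LENGTH . REGEXP-LENGTH) for STRING under char folding.
Hypothetical helper for illustration only; not part of Emacs."
  (cons (length string)
        (length (char-fold-to-regexp string))))

;; Compare a short word with a line of ordinary prose.
(my/char-fold-regexp-sizes "search")
(my/char-fold-regexp-sizes
 "When the size of the generated regexp reaches a certain limit")
```

Evaluating the two calls shows how much each ascii character expands
in the regexp, which is why two pulled lines can already be too much.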






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-25 20:29   ` Juri Linkov
@ 2020-03-26  9:28     ` Robert Pluim
  2020-03-26 14:25       ` Eli Zaretskii
  2020-03-26 23:00       ` Juri Linkov
  0 siblings, 2 replies; 11+ messages in thread
From: Robert Pluim @ 2020-03-26  9:28 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 40216

>>>>> On Wed, 25 Mar 2020 22:29:29 +0200, Juri Linkov <juri@linkov.net> said:

    >> Out of curiosity, what were you searching for that resulted in such a
    >> large regexp?

    Juri> Sometimes I pull a few lines (usually 1-3 lines, not more)
    Juri> from the buffer into the search string to confirm that the same lines
    Juri> exist in more places in the same buffer ignoring the differences defined
    Juri> by folding rules.  But after pulling 2 lines into the search string,
    Juri> the generated regexp becomes so long that the regexp search fails
    Juri> with the error "Regular expression too big".  Currently it silently
    Juri> switches to literal search without notification that it doesn't follow
    Juri> the folding rules anymore.  With the patch it informs about switching
    Juri> to literal search.

Ah, I hadn't considered that use case. Do we need an option to
char-fold-regexp that says 'only apply char-folding to non-ascii
characters'? That would reduce the size of the regexp considerably.

Robert






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-26  9:28     ` Robert Pluim
@ 2020-03-26 14:25       ` Eli Zaretskii
  2020-03-26 15:43         ` Robert Pluim
  2020-03-26 23:00       ` Juri Linkov
  1 sibling, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2020-03-26 14:25 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 40216, juri

> From: Robert Pluim <rpluim@gmail.com>
> Date: Thu, 26 Mar 2020 10:28:55 +0100
> Cc: 40216@debbugs.gnu.org
> 
> Do we need an option to char-fold-regexp that says 'only apply
> char-folding to non-ascii characters'?

But this feature is not intended only to find variants of non-ASCII
characters when one searches for a non-ASCII character, it is also intended to
find variants when searching for ASCII characters.  For example,
searching for a is supposed to find ä and à and á.  Or am I missing
something?






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-26 14:25       ` Eli Zaretskii
@ 2020-03-26 15:43         ` Robert Pluim
  2020-03-26 23:04           ` Juri Linkov
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Pluim @ 2020-03-26 15:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 40216, juri

>>>>> On Thu, 26 Mar 2020 16:25:14 +0200, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> Date: Thu, 26 Mar 2020 10:28:55 +0100
    >> Cc: 40216@debbugs.gnu.org
    >> 
    >> Do we need an option to char-fold-regexp that says 'only apply
    >> char-folding to non-ascii characters'?

    Eli> But this feature is not intended only to find variants of non-ASCII
    Eli> characters when one searches for a non-ASCII, it is also intended to
    Eli> find variants when searching for ASCII characters.  For example,
    Eli> searching for a is supposed to find ä and à and á.  Or am I missing
    Eli> something?

Yes, thatʼs exactly right. But in the case where you have mainly
characters where you donʼt want char-folding, it might make sense to
restrict the folding to non-ascii as an optimisation. E.g. suppose my
name were Røbert, with people frequently misspelling it as Robert, I
might want isearch to just search for "R\\(?:ǿ\\|[øǿo]\\)bert"

Røbert






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-26  9:28     ` Robert Pluim
  2020-03-26 14:25       ` Eli Zaretskii
@ 2020-03-26 23:00       ` Juri Linkov
  1 sibling, 0 replies; 11+ messages in thread
From: Juri Linkov @ 2020-03-26 23:00 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 40216

> Ah, I hadn't considered that use case. Do we need an option to
> char-fold-regexp that says 'only apply char-folding to non-ascii
> characters'? That would reduce the size of the regexp considerably.

Currently there are 2 covered use cases:

1. the default is to fold ascii to non-ascii characters;

2. non-nil char-fold-symmetric additionally folds
   non-ascii to ascii characters.

It seems you are proposing a third use case:

3. symmetric-only that can be implemented with a new non-nil option
   char-fold-symmetric-only that will fold only non-ascii characters
   to ascii.

I doubt this would be useful.

The current default behavior is useful when the user types
on a keyboard that has only ascii characters.

The option char-fold-symmetric is useful to match pasted text
both ways ignoring all differences between ascii/non-ascii characters.

But for symmetric-only I can't imagine any useful use case.
For example, suppose you paste non-ascii characters into the search string
and want to find the corresponding ascii characters.  Why wouldn't you
also want it the other way around: pasting ascii characters
to find their non-ascii counterparts?
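
The two covered use cases above can be tried out with a sketch like
the following.  It assumes an Emacs that provides char-fold-symmetric
(27.1 or later), and it sets the option through Customize rather than
with `let', on the assumption that the option's setter is what
rebuilds the folding table.

```elisp
(require 'char-fold)

;; Use case 1 (default): ascii in the search string folds to
;; non-ascii in the searched text.
(string-match-p (char-fold-to-regexp "a") "á")

;; With the default setting, the reverse direction is not folded.
(string-match-p (char-fold-to-regexp "á") "a")

;; Use case 2: enable symmetric folding through Customize so that
;; the folding table is regenerated, then retry the reverse direction.
(customize-set-variable 'char-fold-symmetric t)
(string-match-p (char-fold-to-regexp "á") "a")
```

A non-nil result from `string-match-p' means the folded regexp matched.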






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-26 15:43         ` Robert Pluim
@ 2020-03-26 23:04           ` Juri Linkov
  2020-03-27  7:24             ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Linkov @ 2020-03-26 23:04 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 40216

>     >> Do we need an option to char-fold-regexp that says 'only apply
>     >> char-folding to non-ascii characters'?
>
>     Eli> But this feature is not intended only to find variants of non-ASCII
>     Eli> characters when one searches for a non-ASCII, it is also intended to
>     Eli> find variants when searching for ASCII characters.  For example,
>     Eli> searching for a is supposed to find ä and à and á.  Or am I missing
>     Eli> something?
>
> Yes, thatʼs exactly right. But in the case where you have mainly
> characters where you donʼt want char-folding, it might make sense to
> restrict the folding to non-ascii as an optimisation. E.g. suppose my
> name were Røbert, with people frequently misspelling it as Robert, I
> might want isearch to just search for "R\\(?:ǿ\\|[øǿo]\\)bert"

I tried to find Røbert by typing Robert, but char-fold fails to find it.
A bug in char-fold?






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-26 23:04           ` Juri Linkov
@ 2020-03-27  7:24             ` Eli Zaretskii
  2020-03-27  8:30               ` Robert Pluim
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2020-03-27  7:24 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 40216, rpluim

> From: Juri Linkov <juri@linkov.net>
> Cc: Eli Zaretskii <eliz@gnu.org>,  40216@debbugs.gnu.org
> Date: Fri, 27 Mar 2020 01:04:12 +0200
> 
> I tried to find Røbert by typing Robert, but char-fold fails to find it.
> A bug in char-fold?

I don't think it's a bug, because ø doesn't have a decomposition in
the Unicode character database:

   (get-char-code-property ?ø 'decomposition) => (248)

(i.e. the character "decomposes" into itself).  By contrast:

   (get-char-code-property ?á 'decomposition) => (97 769)

(i.e. á decomposes into a followed by U+0301 COMBINING ACUTE ACCENT).

So if one wants to support the kind of folding you expected, one would
have to customize char-fold-include to add those additional rules.
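
A minimal sketch of such a customization, assuming an Emacs that
provides char-fold-include (27.1 or later) and that each entry has
the form (CHAR STRING...), i.e. a character followed by the strings
that fold into it.  The o-to-ø rule is only an example taken from
this discussion, not a recommended default.

```elisp
(require 'char-fold)

;; ø has no decomposition in the Unicode character database, so the
;; default folding table does not relate it to o.
(get-char-code-property ?ø 'decomposition)

;; Add a rule so that typing o also matches ø.  Setting the option
;; through Customize lets char-fold rebuild its table.
(customize-set-variable 'char-fold-include
                        (append char-fold-include '((?o "ø"))))

;; Typing Robert should now find Røbert.
(string-match-p (char-fold-to-regexp "Robert") "Røbert")
```

Appending to the existing value keeps the default rules in place.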






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-27  7:24             ` Eli Zaretskii
@ 2020-03-27  8:30               ` Robert Pluim
  2020-03-28 23:42                 ` Juri Linkov
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Pluim @ 2020-03-27  8:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 40216, Juri Linkov

>>>>> On Fri, 27 Mar 2020 10:24:27 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Juri Linkov <juri@linkov.net>
    >> Cc: Eli Zaretskii <eliz@gnu.org>,  40216@debbugs.gnu.org
    >> Date: Fri, 27 Mar 2020 01:04:12 +0200
    >> 
    >> I tried to find Røbert by typing Robert, but char-fold fails to find it.
    >> A bug in char-fold?

    Eli> I don't think it's a bug, because ø doesn't have a decomposition in
    Eli> the Unicode character database:

    Eli>    (get-char-code-property ?ø 'decomposition) => (248)

    Eli> (i.e. the character "decomposes" into itself).  By contrast:

    Eli>    (get-char-code-property ?á 'decomposition) => (97 769)

    Eli> (i.e. á decomposes into a followed by U+0301 COMBINING ACUTE ACCENT).

    Eli> So if one wants to support the kind of folding you expected, one would
    Eli> have to customize char-fold-include to add those additional rules.

Yes, wrong example. I guess this wouldnʼt be useful after all (and I
see nothing wrong with Juri's proposed fix to the actual issue).

Robert






* bug#40216: 28.0.50; Misinformation in isearch char-fold
  2020-03-27  8:30               ` Robert Pluim
@ 2020-03-28 23:42                 ` Juri Linkov
  0 siblings, 0 replies; 11+ messages in thread
From: Juri Linkov @ 2020-03-28 23:42 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 40216

tags 40216 fixed
close 40216 28.0.50
quit

>     Eli> So if one wants to support the kind of folding you expected, one would
>     Eli> have to customize char-fold-include to add those additional rules.
>
> Yes, wrong example. I guess this wouldnʼt be useful after all (and

Thanks for pointing out a possible optimization of char-fold;
I hadn't thought about this before.  But it seems this optimization
limits the usability of char-fold since matching non-ascii characters
on ascii text is not needed as often as matching ascii on non-ascii text,
or both ways.  Even the current default of folding ascii to non-ascii
is so useless for me that I have to enable char-fold-symmetric.

> I see nothing wrong with Juri's proposed fix to the actual issue).

So now pushed to master.





