unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: "Mattias Engdegård" <mattiase@acm.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: Phil Sainty <psainty@orcon.net.nz>, 37659@debbugs.gnu.org
Subject: bug#37659: rx additions: anychar, unmatchable, unordered-or
Date: Thu, 13 Feb 2020 20:16:28 +0100	[thread overview]
Message-ID: <4942C948-4FF4-4E47-B713-F8D6D60F8A84@acm.org> (raw)
In-Reply-To: <6aa5c232-e3df-751a-a7dc-30d430924385@cs.ucla.edu>

[-- Attachment #1: Type: text/plain, Size: 291 bytes --]

13 feb. 2020 kl. 19.50 skrev Paul Eggert <eggert@cs.ucla.edu>:

> Simplest would be to remove it in Emacs 27, and merge that change into master.

Right, introducing it already deprecated doesn't make sense at all. I'll do it in emacs-27 if Eli doesn't mind too much (patch attached).


[-- Attachment #2: 0001-Remove-the-optional-KEEP-ORDER-argument-to-regexp-op.patch --]
[-- Type: application/octet-stream, Size: 8920 bytes --]

From 79ecef631e667650418b187143c4059d7e6eebe6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Thu, 13 Feb 2020 20:06:48 +0100
Subject: [PATCH] Remove the optional KEEP-ORDER argument to regexp-opt

This argument was added for the 'or' clause in rx, but it turned out
to be a bad idea (bug#37659), and there seems to be little other use
for it.

* lisp/emacs-lisp/regexp-opt.el (regexp-opt): Remove KEEP-ORDER.
* doc/lispref/searching.texi (Regexp Functions):
* etc/NEWS: Remove it from the documentation.
* test/lisp/emacs-lisp/regexp-opt-tests.el (regexp-opt-test--match-all)
(regexp-opt-test--check-perm, regexp-opt-test--explain-perm)
(regexp-opt-keep-order, regexp-opt-longest-match): Simplify test.
---
 doc/lispref/searching.texi               |  9 ++---
 etc/NEWS                                 |  7 ----
 lisp/emacs-lisp/regexp-opt.el            | 43 +++++------------------
 test/lisp/emacs-lisp/regexp-opt-tests.el | 44 ++++--------------------
 4 files changed, 19 insertions(+), 84 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 5f4509a8b4..1f6db0643e 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -1745,7 +1745,7 @@ Regexp Functions
 @end defun
 
 @cindex optimize regexp
-@defun regexp-opt strings &optional paren keep-order
+@defun regexp-opt strings &optional paren
 This function returns an efficient regular expression that will match
 any of the strings in the list @var{strings}.  This is useful when you
 need to make matching or searching as fast as possible---for example,
@@ -1783,11 +1783,8 @@ Regexp Functions
 it will apply to the whole expression.
 @end table
 
-The optional argument @var{keep-order}, if non-@code{nil}, forces the
-match to be performed in the order given, as if the strings were made
-into a regexp by joining them with the @samp{\|} operator.  If nil or
-omitted, the returned regexp will always match the longest string
-possible.
+The returned regexp is ordered in such a way that it will always match
+the longest string possible.
 
 Up to reordering, the resulting regexp of @code{regexp-opt} is
 equivalent to but usually more efficient than that of a simplified
diff --git a/etc/NEWS b/etc/NEWS
index 312869fe57..380ac71260 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -3526,13 +3526,6 @@ the process.  That way 'make-process' can start remote processes.
 This is currently supported on GNUish hosts and on modern versions of
 MS-Windows.
 
-+++
-** The function 'regexp-opt' accepts an additional optional argument.
-By default, the regexp returned by 'regexp-opt' may match the strings
-in any order.  If the new third argument is non-nil, the match is
-guaranteed to be performed in the order given, as if the strings were
-made into a regexp by joining them with '\|'.
-
 +++
 ** The function 'regexp-opt', when given an empty list of strings, now
 returns a regexp that never matches anything, which is an identity for
diff --git a/lisp/emacs-lisp/regexp-opt.el b/lisp/emacs-lisp/regexp-opt.el
index 2cce4e6353..35a5fda184 100644
--- a/lisp/emacs-lisp/regexp-opt.el
+++ b/lisp/emacs-lisp/regexp-opt.el
@@ -84,7 +84,7 @@
 ;;; Code:
 
 ;;;###autoload
-(defun regexp-opt (strings &optional paren keep-order)
+(defun regexp-opt (strings &optional paren)
   "Return a regexp to match a string in the list STRINGS.
 Each member of STRINGS is treated as a fixed string, not as a regexp.
 Optional PAREN specifies how the returned regexp is surrounded by
@@ -114,11 +114,8 @@ regexp-opt
     necessary to ensure that a postfix operator appended to it will
     apply to the whole expression.
 
-The optional argument KEEP-ORDER, if non-nil, forces the match to
-be performed in the order given, as if the strings were made into
-a regexp by joining them with the `\\|' operator.  If nil or
-omitted, the returned regexp is will always match the longest
-string possible.
+The returned regexp is ordered in such a way that it will always
+match the longest string possible.
 
 Up to reordering, the resulting regexp is equivalent to but
 usually more efficient than that of a simplified version:
@@ -140,34 +137,12 @@ regexp-opt
 	   (completion-ignore-case nil)
 	   (completion-regexp-list nil)
 	   (open (cond ((stringp paren) paren) (paren "\\(")))
-	   (re
-            (cond
-             ;; No strings: return an unmatchable regexp.
-             ((null strings)
-              (concat (or open "\\(?:") regexp-unmatchable "\\)"))
-
-             ;; The algorithm will generate a pattern that matches
-             ;; longer strings in the list before shorter.  If the
-             ;; list order matters, then no string must come after a
-             ;; proper prefix of that string.  To check this, verify
-             ;; that a straight or-pattern matches each string
-             ;; entirely.
-             ((and keep-order
-                   (let* ((case-fold-search nil)
-                          (alts (mapconcat #'regexp-quote strings "\\|")))
-                     (and (let ((s strings))
-                            (while (and s
-                                        (string-match alts (car s))
-                                        (= (match-end 0) (length (car s))))
-                              (setq s (cdr s)))
-                            ;; If we exited early, we found evidence that
-                            ;; regexp-opt-group cannot be used.
-                            s)
-                          (concat (or open "\\(?:") alts "\\)")))))
-             (t
-              (regexp-opt-group
-               (delete-dups (sort (copy-sequence strings) 'string-lessp))
-               (or open t) (not open))))))
+	   (re (if strings
+                   (regexp-opt-group
+                    (delete-dups (sort (copy-sequence strings) 'string-lessp))
+                    (or open t) (not open))
+                 ;; No strings: return an unmatchable regexp.
+                 (concat (or open "\\(?:") regexp-unmatchable "\\)"))))
       (cond ((eq paren 'words)
 	     (concat "\\<" re "\\>"))
 	    ((eq paren 'symbols)
diff --git a/test/lisp/emacs-lisp/regexp-opt-tests.el b/test/lisp/emacs-lisp/regexp-opt-tests.el
index 9b4567c72c..0179ac4f1f 100644
--- a/test/lisp/emacs-lisp/regexp-opt-tests.el
+++ b/test/lisp/emacs-lisp/regexp-opt-tests.el
@@ -47,43 +47,13 @@ regexp-opt-test--permutations
   (mapcar (lambda (i) (regexp-opt-test--permutation i list))
           (number-sequence 0 (1- (regexp-opt-test--factorial (length list))))))
 
-(defun regexp-opt-test--match-all (words re)
-  (mapcar (lambda (w) (and (string-match re w)
-                           (match-string 0 w)))
-          words))
-
-(defun regexp-opt-test--check-perm (perm)
-  (let* ((ref-re (mapconcat #'regexp-quote perm "\\|"))
-         (opt-re (regexp-opt perm nil t))
-         (ref (regexp-opt-test--match-all perm ref-re))
-         (opt (regexp-opt-test--match-all perm opt-re)))
-    (equal opt ref)))
-
-(defun regexp-opt-test--explain-perm (perm)
-  (let* ((ref-re (mapconcat #'regexp-quote perm "\\|"))
-         (opt-re (regexp-opt perm nil t))
-         (ref (regexp-opt-test--match-all perm ref-re))
-         (opt (regexp-opt-test--match-all perm opt-re)))
-    (concat "\n"
-            (format "Naïve regexp:     %s\n" ref-re)
-            (format "Optimized regexp: %s\n" opt-re)
-            (format "Got:      %s\n" opt)
-            (format "Expected: %s\n" ref))))
-
-(put 'regexp-opt-test--check-perm 'ert-explainer 'regexp-opt-test--explain-perm)
-
-(ert-deftest regexp-opt-keep-order ()
-  "Check that KEEP-ORDER works."
-  (dolist (perm (regexp-opt-test--permutations '("abc" "bca" "cab")))
-    (should (regexp-opt-test--check-perm perm)))
-  (dolist (perm (regexp-opt-test--permutations '("abc" "ab" "bca" "bc")))
-    (should (regexp-opt-test--check-perm perm)))
-  (dolist (perm (regexp-opt-test--permutations '("abxy" "cdxy")))
-    (should (regexp-opt-test--check-perm perm)))
-  (dolist (perm (regexp-opt-test--permutations '("afgx" "bfgx" "afgy" "bfgy")))
-    (should (regexp-opt-test--check-perm perm)))
-  (dolist (perm (regexp-opt-test--permutations '("a" "ab" "ac" "abc")))
-    (should (regexp-opt-test--check-perm perm))))
+(ert-deftest regexp-opt-longest-match ()
+  "Check that the regexp always matches as much as possible."
+  (let ((s "abcd"))
+    (dolist (perm (regexp-opt-test--permutations '("a" "ab" "ac" "abc")))
+      (should (equal (and (string-match (regexp-opt perm) s)
+                          (match-string 0 s))
+                     "abc")))))
 
 (ert-deftest regexp-opt-charset ()
   (should (equal (regexp-opt-charset '(?a ?b ?a)) "[ab]"))
-- 
2.21.1 (Apple Git-122.3)


  reply	other threads:[~2020-02-13 19:16 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  9:36 bug#37659: rx additions: anychar, unmatchable, unordered-or Mattias Engdegård
2019-10-09  8:59 ` Mattias Engdegård
2019-10-11 23:07 ` bug#37659: Mattias Engdegård <mattiase <at> acm.org> Paul Eggert
2019-10-12 10:47   ` Mattias Engdegård
2019-10-13 16:52     ` Paul Eggert
2019-10-13 19:48       ` Mattias Engdegård
2019-10-22 15:14       ` bug#37659: rx additions: anychar, unmatchable, unordered-or Mattias Engdegård
2019-10-22 15:27         ` Robert Pluim
2019-10-22 17:33         ` Paul Eggert
2019-10-23  9:15           ` Mattias Engdegård
2019-10-23 23:14             ` Paul Eggert
2019-10-24  1:56               ` Drew Adams
2019-10-24  9:09                 ` Mattias Engdegård
2019-10-24 14:24                   ` Drew Adams
2019-10-24  9:17                 ` Phil Sainty
2019-10-24 14:32                   ` Drew Adams
2019-10-24  8:58               ` Mattias Engdegård
2019-10-27 11:53                 ` Mattias Engdegård
2020-02-11 12:57           ` Mattias Engdegård
2020-02-11 15:43             ` Eli Zaretskii
2020-02-11 19:17               ` Mattias Engdegård
2020-02-12  0:52                 ` Paul Eggert
2020-02-12 11:22                   ` Mattias Engdegård
2020-02-13 18:38                     ` Mattias Engdegård
2020-02-13 18:50                       ` Paul Eggert
2020-02-13 19:16                         ` Mattias Engdegård [this message]
2020-02-13 19:30                           ` Eli Zaretskii
2020-02-13 22:23                             ` Mattias Engdegård
2020-02-14  7:45                               ` Eli Zaretskii
2020-02-14 16:15                                 ` Paul Eggert
2020-02-14 20:49                                   ` Mattias Engdegård
2020-03-01 10:09                                   ` Mattias Engdegård

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4942C948-4FF4-4E47-B713-F8D6D60F8A84@acm.org \
    --to=mattiase@acm.org \
    --cc=37659@debbugs.gnu.org \
    --cc=eggert@cs.ucla.edu \
    --cc=psainty@orcon.net.nz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).