From: "Mattias Engdegård" <mattiase@acm.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: Phil Sainty <psainty@orcon.net.nz>, 37659@debbugs.gnu.org
Subject: bug#37659: rx additions: anychar, unmatchable, unordered-or
Date: Thu, 13 Feb 2020 20:16:28 +0100 [thread overview]
Message-ID: <4942C948-4FF4-4E47-B713-F8D6D60F8A84@acm.org> (raw)
In-Reply-To: <6aa5c232-e3df-751a-a7dc-30d430924385@cs.ucla.edu>
[-- Attachment #1: Type: text/plain, Size: 291 bytes --]
13 feb. 2020 kl. 19.50 skrev Paul Eggert <eggert@cs.ucla.edu>:
> Simplest would be to remove it in Emacs 27, and merge that change into master.
Right, introducing it already deprecated doesn't make sense at all. I'll do it in emacs-27 if Eli doesn't mind too much (patch attached).
[-- Attachment #2: 0001-Remove-the-optional-KEEP-ORDER-argument-to-regexp-op.patch --]
[-- Type: application/octet-stream, Size: 8920 bytes --]
From 79ecef631e667650418b187143c4059d7e6eebe6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Thu, 13 Feb 2020 20:06:48 +0100
Subject: [PATCH] Remove the optional KEEP-ORDER argument to regexp-opt
This argument was added for the 'or' clause in rx, but it turned out
to be a bad idea (bug#37659), and there seems to be little other use
for it.
* lisp/emacs-lisp/regexp-opt.el (regexp-opt): Remove KEEP-ORDER.
* doc/lispref/searching.texi (Regexp Functions):
* etc/NEWS: Remove it from the documentation.
* test/lisp/emacs-lisp/regexp-opt-tests.el (regexp-opt-test--match-all)
(regexp-opt-test--check-perm, regexp-opt-test--explain-perm)
(regexp-opt-keep-order, regexp-opt-longest-match): Simplify test.
---
doc/lispref/searching.texi | 9 ++---
etc/NEWS | 7 ----
lisp/emacs-lisp/regexp-opt.el | 43 +++++------------------
test/lisp/emacs-lisp/regexp-opt-tests.el | 44 ++++--------------------
4 files changed, 19 insertions(+), 84 deletions(-)
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 5f4509a8b4..1f6db0643e 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -1745,7 +1745,7 @@ Regexp Functions
@end defun
@cindex optimize regexp
-@defun regexp-opt strings &optional paren keep-order
+@defun regexp-opt strings &optional paren
This function returns an efficient regular expression that will match
any of the strings in the list @var{strings}. This is useful when you
need to make matching or searching as fast as possible---for example,
@@ -1783,11 +1783,8 @@ Regexp Functions
it will apply to the whole expression.
@end table
-The optional argument @var{keep-order}, if non-@code{nil}, forces the
-match to be performed in the order given, as if the strings were made
-into a regexp by joining them with the @samp{\|} operator. If nil or
-omitted, the returned regexp will always match the longest string
-possible.
+The returned regexp is ordered in such a way that it will always match
+the longest string possible.
Up to reordering, the resulting regexp of @code{regexp-opt} is
equivalent to but usually more efficient than that of a simplified
diff --git a/etc/NEWS b/etc/NEWS
index 312869fe57..380ac71260 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -3526,13 +3526,6 @@ the process. That way 'make-process' can start remote processes.
This is currently supported on GNUish hosts and on modern versions of
MS-Windows.
-+++
-** The function 'regexp-opt' accepts an additional optional argument.
-By default, the regexp returned by 'regexp-opt' may match the strings
-in any order. If the new third argument is non-nil, the match is
-guaranteed to be performed in the order given, as if the strings were
-made into a regexp by joining them with '\|'.
-
+++
** The function 'regexp-opt', when given an empty list of strings, now
returns a regexp that never matches anything, which is an identity for
diff --git a/lisp/emacs-lisp/regexp-opt.el b/lisp/emacs-lisp/regexp-opt.el
index 2cce4e6353..35a5fda184 100644
--- a/lisp/emacs-lisp/regexp-opt.el
+++ b/lisp/emacs-lisp/regexp-opt.el
@@ -84,7 +84,7 @@
;;; Code:
;;;###autoload
-(defun regexp-opt (strings &optional paren keep-order)
+(defun regexp-opt (strings &optional paren)
"Return a regexp to match a string in the list STRINGS.
Each member of STRINGS is treated as a fixed string, not as a regexp.
Optional PAREN specifies how the returned regexp is surrounded by
@@ -114,11 +114,8 @@ regexp-opt
necessary to ensure that a postfix operator appended to it will
apply to the whole expression.
-The optional argument KEEP-ORDER, if non-nil, forces the match to
-be performed in the order given, as if the strings were made into
-a regexp by joining them with the `\\|' operator. If nil or
-omitted, the returned regexp is will always match the longest
-string possible.
+The returned regexp is ordered in such a way that it will always
+match the longest string possible.
Up to reordering, the resulting regexp is equivalent to but
usually more efficient than that of a simplified version:
@@ -140,34 +137,12 @@ regexp-opt
(completion-ignore-case nil)
(completion-regexp-list nil)
(open (cond ((stringp paren) paren) (paren "\\(")))
- (re
- (cond
- ;; No strings: return an unmatchable regexp.
- ((null strings)
- (concat (or open "\\(?:") regexp-unmatchable "\\)"))
-
- ;; The algorithm will generate a pattern that matches
- ;; longer strings in the list before shorter. If the
- ;; list order matters, then no string must come after a
- ;; proper prefix of that string. To check this, verify
- ;; that a straight or-pattern matches each string
- ;; entirely.
- ((and keep-order
- (let* ((case-fold-search nil)
- (alts (mapconcat #'regexp-quote strings "\\|")))
- (and (let ((s strings))
- (while (and s
- (string-match alts (car s))
- (= (match-end 0) (length (car s))))
- (setq s (cdr s)))
- ;; If we exited early, we found evidence that
- ;; regexp-opt-group cannot be used.
- s)
- (concat (or open "\\(?:") alts "\\)")))))
- (t
- (regexp-opt-group
- (delete-dups (sort (copy-sequence strings) 'string-lessp))
- (or open t) (not open))))))
+ (re (if strings
+ (regexp-opt-group
+ (delete-dups (sort (copy-sequence strings) 'string-lessp))
+ (or open t) (not open))
+ ;; No strings: return an unmatchable regexp.
+ (concat (or open "\\(?:") regexp-unmatchable "\\)"))))
(cond ((eq paren 'words)
(concat "\\<" re "\\>"))
((eq paren 'symbols)
diff --git a/test/lisp/emacs-lisp/regexp-opt-tests.el b/test/lisp/emacs-lisp/regexp-opt-tests.el
index 9b4567c72c..0179ac4f1f 100644
--- a/test/lisp/emacs-lisp/regexp-opt-tests.el
+++ b/test/lisp/emacs-lisp/regexp-opt-tests.el
@@ -47,43 +47,13 @@ regexp-opt-test--permutations
(mapcar (lambda (i) (regexp-opt-test--permutation i list))
(number-sequence 0 (1- (regexp-opt-test--factorial (length list))))))
-(defun regexp-opt-test--match-all (words re)
- (mapcar (lambda (w) (and (string-match re w)
- (match-string 0 w)))
- words))
-
-(defun regexp-opt-test--check-perm (perm)
- (let* ((ref-re (mapconcat #'regexp-quote perm "\\|"))
- (opt-re (regexp-opt perm nil t))
- (ref (regexp-opt-test--match-all perm ref-re))
- (opt (regexp-opt-test--match-all perm opt-re)))
- (equal opt ref)))
-
-(defun regexp-opt-test--explain-perm (perm)
- (let* ((ref-re (mapconcat #'regexp-quote perm "\\|"))
- (opt-re (regexp-opt perm nil t))
- (ref (regexp-opt-test--match-all perm ref-re))
- (opt (regexp-opt-test--match-all perm opt-re)))
- (concat "\n"
- (format "Naïve regexp: %s\n" ref-re)
- (format "Optimized regexp: %s\n" opt-re)
- (format "Got: %s\n" opt)
- (format "Expected: %s\n" ref))))
-
-(put 'regexp-opt-test--check-perm 'ert-explainer 'regexp-opt-test--explain-perm)
-
-(ert-deftest regexp-opt-keep-order ()
- "Check that KEEP-ORDER works."
- (dolist (perm (regexp-opt-test--permutations '("abc" "bca" "cab")))
- (should (regexp-opt-test--check-perm perm)))
- (dolist (perm (regexp-opt-test--permutations '("abc" "ab" "bca" "bc")))
- (should (regexp-opt-test--check-perm perm)))
- (dolist (perm (regexp-opt-test--permutations '("abxy" "cdxy")))
- (should (regexp-opt-test--check-perm perm)))
- (dolist (perm (regexp-opt-test--permutations '("afgx" "bfgx" "afgy" "bfgy")))
- (should (regexp-opt-test--check-perm perm)))
- (dolist (perm (regexp-opt-test--permutations '("a" "ab" "ac" "abc")))
- (should (regexp-opt-test--check-perm perm))))
+(ert-deftest regexp-opt-longest-match ()
+ "Check that the regexp always matches as much as possible."
+ (let ((s "abcd"))
+ (dolist (perm (regexp-opt-test--permutations '("a" "ab" "ac" "abc")))
+ (should (equal (and (string-match (regexp-opt perm) s)
+ (match-string 0 s))
+ "abc")))))
(ert-deftest regexp-opt-charset ()
(should (equal (regexp-opt-charset '(?a ?b ?a)) "[ab]"))
--
2.21.1 (Apple Git-122.3)
next prev parent reply other threads:[~2020-02-13 19:16 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-08 9:36 bug#37659: rx additions: anychar, unmatchable, unordered-or Mattias Engdegård
2019-10-09 8:59 ` Mattias Engdegård
2019-10-11 23:07 ` bug#37659: Mattias Engdegård <mattiase <at> acm.org> Paul Eggert
2019-10-12 10:47 ` Mattias Engdegård
2019-10-13 16:52 ` Paul Eggert
2019-10-13 19:48 ` Mattias Engdegård
2019-10-22 15:14 ` bug#37659: rx additions: anychar, unmatchable, unordered-or Mattias Engdegård
2019-10-22 15:27 ` Robert Pluim
2019-10-22 17:33 ` Paul Eggert
2019-10-23 9:15 ` Mattias Engdegård
2019-10-23 23:14 ` Paul Eggert
2019-10-24 1:56 ` Drew Adams
2019-10-24 9:09 ` Mattias Engdegård
2019-10-24 14:24 ` Drew Adams
2019-10-24 9:17 ` Phil Sainty
2019-10-24 14:32 ` Drew Adams
2019-10-24 8:58 ` Mattias Engdegård
2019-10-27 11:53 ` Mattias Engdegård
2020-02-11 12:57 ` Mattias Engdegård
2020-02-11 15:43 ` Eli Zaretskii
2020-02-11 19:17 ` Mattias Engdegård
2020-02-12 0:52 ` Paul Eggert
2020-02-12 11:22 ` Mattias Engdegård
2020-02-13 18:38 ` Mattias Engdegård
2020-02-13 18:50 ` Paul Eggert
2020-02-13 19:16 ` Mattias Engdegård [this message]
2020-02-13 19:30 ` Eli Zaretskii
2020-02-13 22:23 ` Mattias Engdegård
2020-02-14 7:45 ` Eli Zaretskii
2020-02-14 16:15 ` Paul Eggert
2020-02-14 20:49 ` Mattias Engdegård
2020-03-01 10:09 ` Mattias Engdegård
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4942C948-4FF4-4E47-B713-F8D6D60F8A84@acm.org \
--to=mattiase@acm.org \
--cc=37659@debbugs.gnu.org \
--cc=eggert@cs.ucla.edu \
--cc=psainty@orcon.net.nz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).