unofficial mirror of emacs-devel@gnu.org 
From: daanturo <daanturo@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: emacs-devel@gnu.org
Subject: Re: A function to take the regexp-matched substring directly
Date: Mon, 31 Oct 2022 00:16:03 +0700
Message-ID: <47fff48c-90d4-7c6b-7b92-8a99d9453f3f@gmail.com>
In-Reply-To: <jwvr0yp2tz5.fsf-monnier+emacs@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 1036 bytes --]

> I suspect it'd be more efficient to iterate directly on the `match-data` rather
> than on an integer (which suffers from an O(N²) complexity).

Got it. Here is a version that walks the `match-data` list directly and calls `substring` on each pair of positions:

```emacs-lisp
(when (string-match regexp string)
  (let ((matched-data (match-data))
        matches beg end)
    (while matched-data
      (setq beg (pop matched-data))
      (setq end (pop matched-data))
      (push (and beg end
                 (substring string beg end))
            matches))
    (nreverse matches)))
```
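To try the loop above in isolation, it can be wrapped in a function. The name `my-regexp-match-all` is hypothetical (it is not part of the patch); the sketch assumes the match data comes from a `string-match` on a string, so its elements are integers or nil. The usage lines show that a group which does not take part in the match yields `nil`:

```emacs-lisp
;; Hypothetical wrapper (not part of the patch) around the loop above.
(defun my-regexp-match-all (regexp string)
  "Return the substrings of STRING matched by REGEXP and by each group.
An element is nil when its group did not take part in the match.
Return nil if REGEXP does not match.  This clobbers the match data."
  (when (string-match regexp string)
    (let ((data (match-data))
          matches)
      ;; The match data is a flat list (BEG0 END0 BEG1 END1 ...), so it is
      ;; consumed two elements at a time rather than indexed by group number.
      (while data
        (let ((beg (pop data))
              (end (pop data)))
          (push (and beg end (substring string beg end)) matches)))
      (nreverse matches))))

(my-regexp-match-all "\\(x\\)?a\\(b\\)" "ab")    ; => ("ab" nil "b")
(my-regexp-match-all "z" "abc")                  ; => nil
```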

> `save-match-data` is costly and extremely rarely needed.

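I committed a change that makes preserving the match data optional via an INHIBIT-MODIFY argument (though this means `(declare (pure t) (side-effect-free t))` no longer holds, since the functions now modify the match data by default).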


That said, I still think that, intuitively, callers of this kind of function
expect it to have no side effects from a high-level perspective, so
`save-match-data` should be the default.
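A small sketch of the side effect in question, using only built-in functions; the helper name `my-match-data-after-inner-search` is hypothetical. An inner `string-match` replaces the caller's match data unless it is wrapped in `save-match-data`:

```emacs-lisp
;; Hypothetical demo: what the caller sees after an inner search.
(defun my-match-data-after-inner-search (preserve)
  "Return the match data after running an inner `string-match'.
If PRESERVE is non-nil, wrap the inner search in `save-match-data'."
  (string-match "b\\(c\\)" "abc")          ; caller's match data: (1 3 2 3)
  (if preserve
      (save-match-data
        (string-match "x\\(y\\)" "xyz"))   ; caller's data restored on exit
    (string-match "x\\(y\\)" "xyz"))       ; leaves (0 2 1 2) behind
  (match-data))

(my-match-data-after-inner-search t)    ; => (1 3 2 3)
(my-match-data-after-inner-search nil)  ; => (0 2 1 2)
```

The second call is the visible side effect that makes a `side-effect-free` declaration inaccurate when the match data is not preserved.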

-- 
Daanturo.

[-- Attachment #2: regexp-match.patch --]
[-- Type: text/x-patch, Size: 7381 bytes --]

From 7540dc132f15aa27b6df3c6a0239a8f70ef19032 Mon Sep 17 00:00:00 2001
From: Daanturo <daanturo@gmail.com>
Date: Sun, 30 Oct 2022 21:54:56 +0700
Subject: [PATCH 1/2] Define regexp-match, regexp-match*

* lisp/emacs-lisp/subr-x.el (regexp-match, regexp-match*): New functions.
* doc/lispref/searching.texi: Document them.
* etc/NEWS: Announce them.
* lisp/emacs-lisp/shortdoc.el: Add examples for them.
---
 doc/lispref/searching.texi  | 32 ++++++++++++++++++++++++++++++++
 etc/NEWS                    |  7 +++++++
 lisp/emacs-lisp/shortdoc.el |  4 ++++
 lisp/emacs-lisp/subr-x.el   | 30 ++++++++++++++++++++++++++++++
 4 files changed, 73 insertions(+)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 743718b560..e4cecd858a 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -2099,6 +2099,38 @@ This predicate function does what @code{string-match} does, but it
 avoids modifying the match data.
 @end defun
 
+@defun regexp-match regexp string &optional n
+This function returns the substring of @var{string} matched by subexpression @var{n}
+of @var{regexp}, or @code{nil} if there is no match.  @var{n} defaults to 0 (the whole match).  It does not modify the match data.
+
+@example
+@group
+(regexp-match "quick" "The quick brown fox jumped quickly.")
+        @result{} "quick"
+@end group
+@group
+(regexp-match "quick[[:space:]]+\\([a-z]+\\)" "The quick brown fox jumped quickly." 1)
+        @result{} "brown"
+@end group
+@end example
+
+@end defun
+
+
+@defun regexp-match* regexp string
+This function returns a list of the substrings of @var{string} matched by @var{regexp} and
+by each of its subexpressions, or @code{nil} if there is no match.  It does not modify the match data.
+
+@example
+@group
+(regexp-match* "quick[[:space:]]+\\([a-z]+\\)" "The quick brown fox jumped quickly.")
+        @result{} ("quick brown" "brown")
+@end group
+@end example
+
+@end defun
+
+
 @defun looking-at regexp &optional inhibit-modify
 This function determines whether the text in the current buffer directly
 following point matches the regular expression @var{regexp}.  ``Directly
diff --git a/etc/NEWS b/etc/NEWS
index a185967483..6faee7251e 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -3198,6 +3198,13 @@ The following generalized variables have been made obsolete:
 \f
 * Lisp Changes in Emacs 29.1
 
++++
+** New functions 'regexp-match' and 'regexp-match*'.
+'regexp-match' extracts the substring of a string that matches a given
+subexpression of a regexp, while 'regexp-match*' returns a list of the
+substrings matched by the whole regexp and by each subexpression.
+Neither function changes the match data.
+
 +++
 ** Interpreted closures are "safe for space".
 As was already the case for byte-compiled closures, instead of capturing
diff --git a/lisp/emacs-lisp/shortdoc.el b/lisp/emacs-lisp/shortdoc.el
index dbac03432c..81e6168217 100644
--- a/lisp/emacs-lisp/shortdoc.el
+++ b/lisp/emacs-lisp/shortdoc.el
@@ -781,6 +781,10 @@ A FUNC form can have any number of `:no-eval' (or `:no-value'),
    :eg-result 3)
   (save-match-data
     :no-eval (save-match-data ...))
+  (regexp-match
+   :eval (regexp-match "^\\([fo]+\\)b" "foobar" 1))
+  (regexp-match*
+   :eval (regexp-match* "^\\([fo]+\\)b" "foobar"))
   "Replacing Match"
   (replace-match
    :no-eval (replace-match "new")
diff --git a/lisp/emacs-lisp/subr-x.el b/lisp/emacs-lisp/subr-x.el
index 6e4d88b4df..0badf1cbaf 100644
--- a/lisp/emacs-lisp/subr-x.el
+++ b/lisp/emacs-lisp/subr-x.el
@@ -347,6 +347,36 @@ This takes into account combining characters and grapheme clusters."
         (setq start (1+ start))))
     (nreverse result)))
 
+;;;###autoload
+(defun regexp-match (regexp string &optional n)
+  "Return the N -th matched substring for REGEXP in STRING.
+N defaults to 0 (the whole match).
+
+This function does not change the match data."
+  (declare (pure t) (side-effect-free t))
+  (let ((n (or n 0)))
+    (save-match-data
+      (when (string-match regexp string)
+        (match-string n string)))))
+
+;;;###autoload
+(defun regexp-match* (regexp string)
+  "Return a list of matched substrings for REGEXP in STRING.
+
+This function does not change the match data."
+  (declare (pure t) (side-effect-free t))
+  (save-match-data
+    (when (string-match regexp string)
+      (let ((matched-data (match-data))
+            matches beg end)
+        (while matched-data
+          (setq beg (pop matched-data))
+          (setq end (pop matched-data))
+          (push (and beg end
+                     (substring string beg end))
+                matches))
+        (nreverse matches)))))
+
 ;;;###autoload
 (defun add-display-text-property (start end prop value
                                         &optional object)
-- 
2.38.1


From 47c1414cf24b29ab85f9b43e2a7deaa134271cd2 Mon Sep 17 00:00:00 2001
From: Daanturo <daanturo@gmail.com>
Date: Sun, 30 Oct 2022 23:56:37 +0700
Subject: [PATCH 2/2] regexp-match, regexp-match*: make inhibit-modify optional

* lisp/emacs-lisp/subr-x.el (regexp--match, regexp--match*): New helpers.
(regexp-match, regexp-match*): Add optional argument INHIBIT-MODIFY.
---
 lisp/emacs-lisp/subr-x.el | 48 +++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/lisp/emacs-lisp/subr-x.el b/lisp/emacs-lisp/subr-x.el
index 0badf1cbaf..4de2213ade 100644
--- a/lisp/emacs-lisp/subr-x.el
+++ b/lisp/emacs-lisp/subr-x.el
@@ -347,35 +347,39 @@ This takes into account combining characters and grapheme clusters."
         (setq start (1+ start))))
     (nreverse result)))
 
+(defun regexp--match (regexp string &optional n)
+  (let ((n (or n 0)))
+    (when (string-match regexp string)
+      (match-string n string))))
 ;;;###autoload
-(defun regexp-match (regexp string &optional n)
+(defun regexp-match (regexp string &optional n inhibit-modify)
   "Return the N -th matched substring for REGEXP in STRING.
 N defaults to 0 (the whole match).
-
-This function does not change the match data."
+If INHIBIT-MODIFY is non-nil, do not change the match data."
   (declare (pure t) (side-effect-free t))
-  (let ((n (or n 0)))
-    (save-match-data
-      (when (string-match regexp string)
-        (match-string n string)))))
-
+  (if inhibit-modify
+      (save-match-data (regexp--match regexp string n))
+    (regexp--match regexp string n)))
+
+(defun regexp--match* (regexp string)
+  (when (string-match regexp string)
+    (let ((matched-data (match-data))
+          matches beg end)
+      (while matched-data
+        (setq beg (pop matched-data))
+        (setq end (pop matched-data))
+        (push (and beg end
+                   (substring string beg end))
+              matches))
+      (nreverse matches))))
 ;;;###autoload
-(defun regexp-match* (regexp string)
+(defun regexp-match* (regexp string &optional inhibit-modify)
   "Return a list of matched substrings for REGEXP in STRING.
-
-This function does not change the match data."
+If INHIBIT-MODIFY is non-nil, do not change the match data."
   (declare (pure t) (side-effect-free t))
-  (save-match-data
-    (when (string-match regexp string)
-      (let ((matched-data (match-data))
-            matches beg end)
-        (while matched-data
-          (setq beg (pop matched-data))
-          (setq end (pop matched-data))
-          (push (and beg end
-                     (substring string beg end))
-                matches))
-        (nreverse matches)))))
+  (if inhibit-modify
+      (save-match-data (regexp--match* regexp string))
+    (regexp--match* regexp string)))
 
 ;;;###autoload
 (defun add-display-text-property (start end prop value
-- 
2.38.1

