unofficial mirror of emacs-devel@gnu.org 
* A function to take the regexp-matched substring directly
@ 2022-10-30 15:17 daanturo
  2022-10-30 15:45 ` Philip Kaludercic
  2022-10-30 15:52 ` Stefan Monnier
  0 siblings, 2 replies; 11+ messages in thread
From: daanturo @ 2022-10-30 15:17 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

What do you think about such built-in functionality?

I find myself using this pattern a lot when parsing strings. Most of the
time I just care about whether a particular (sub-)expression matched:
Does it match? Yes? Good, give me the result; otherwise give me nil.

The implementation is attached. I name it 'regexp-match' (please
change the name if needed).
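
As a point of comparison, the idiom this is meant to shorten looks roughly like the sketch below. The helper name `my-word-after-quick` is hypothetical and the sample string is borrowed from the manual example in the patch; nothing here is part of the proposed implementation.

```elisp
;; Illustrative only: the manual idiom, written without the proposed helper.
(defun my-word-after-quick (s)
  "Return the word following \"quick\" in S, or nil if there is no match."
  (when (string-match "quick[[:space:]]+\\([a-z]+\\)" s)
    ;; Group 1 is the word captured after the whitespace.
    (match-string 1 s)))

(my-word-after-quick "The quick brown fox jumped quickly.")
```

Every call site has to repeat the `when`/`string-match`/`match-string` trio and pass the string twice, which is the boilerplate the proposal aims to fold into one call.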

-- 
Daanturo.

[-- Attachment #2: 0001-Define-regexp-match-regexp-match.patch --]
[-- Type: text/x-patch, Size: 4453 bytes --]

From dc788fca8def8c17f901ec5d72ff38d6716dc6ce Mon Sep 17 00:00:00 2001
From: Daanturo <daanturo@gmail.com>
Date: Sun, 30 Oct 2022 21:54:56 +0700
Subject: [PATCH] Define regexp-match, regexp-match*

* lisp/emacs-lisp/subr-x.el: implementation
* doc/lispref/searching.texi: documents
* etc/NEWS: documents
* lisp/emacs-lisp/shortdoc.el: documents
---
 doc/lispref/searching.texi  | 32 ++++++++++++++++++++++++++++++++
 etc/NEWS                    |  7 +++++++
 lisp/emacs-lisp/shortdoc.el |  4 ++++
 lisp/emacs-lisp/subr-x.el   | 27 +++++++++++++++++++++++++++
 4 files changed, 70 insertions(+)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 743718b560..e4cecd858a 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -2099,6 +2099,38 @@ This predicate function does what @code{string-match} does, but it
 avoids modifying the match data.
 @end defun
 
+@defun regexp-match regexp string &optional n
+This function returns the n-th matched substring for regexp in string.
+N defaults to 0 (the whole match).  It does not modify the match data.
+
+@example
+@group
+(regexp-match "quick" "The quick brown fox jumped quickly.")
+        @result{} "quick"
+@end group
+@group
+(regexp-match "quick[[:space:]]+\\([a-z]+\\)" "The quick brown fox jumped quickly." 1)
+        @result{} "brown"
+@end group
+@end example
+
+@end defun
+
+
+@defun regexp-match* regexp string
+This function returns a list of matched substrings for regexp
+in string.  It does not modify the match data.
+
+@example
+@group
+(regexp-match* "quick[[:space:]]+\\([a-z]+\\)" "The quick brown fox jumped quickly.")
+        @result{} ("quick brown" "brown")
+@end group
+@end example
+
+@end defun
+
+
 @defun looking-at regexp &optional inhibit-modify
 This function determines whether the text in the current buffer directly
 following point matches the regular expression @var{regexp}.  ``Directly
diff --git a/etc/NEWS b/etc/NEWS
index a185967483..6faee7251e 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -3198,6 +3198,13 @@ The following generalized variables have been made obsolete:
 \f
 * Lisp Changes in Emacs 29.1
 
++++
+** New function 'regexp-match', 'regexp-match*'.
+'regexp-match' can be used to extract the substring that matches a
+wanted subexpression from a string, while 'regexp-match*' returns
+the corresponding substring for each subexpression. Both don't change
+the match data
+
 +++
 ** Interpreted closures are "safe for space".
 As was already the case for byte-compiled closures, instead of capturing
diff --git a/lisp/emacs-lisp/shortdoc.el b/lisp/emacs-lisp/shortdoc.el
index dbac03432c..81e6168217 100644
--- a/lisp/emacs-lisp/shortdoc.el
+++ b/lisp/emacs-lisp/shortdoc.el
@@ -781,6 +781,10 @@ A FUNC form can have any number of `:no-eval' (or `:no-value'),
    :eg-result 3)
   (save-match-data
     :no-eval (save-match-data ...))
+  (regexp-match
+   :eval (regexp-match "^\\([fo]+\\)b" "foobar" 1))
+  (regexp-match*
+   :eval (regexp-match* "^\\([fo]+\\)b" "foobar"))
   "Replacing Match"
   (replace-match
    :no-eval (replace-match "new")
diff --git a/lisp/emacs-lisp/subr-x.el b/lisp/emacs-lisp/subr-x.el
index 6e4d88b4df..ba57fe1cb7 100644
--- a/lisp/emacs-lisp/subr-x.el
+++ b/lisp/emacs-lisp/subr-x.el
@@ -347,6 +347,33 @@ This takes into account combining characters and grapheme clusters."
         (setq start (1+ start))))
     (nreverse result)))
 
+;;;###autoload
+(defun regexp-match (regexp string &optional n)
+  "Return the N -th matched substring for REGEXP in STRING.
+N defaults to 0 (the whole match).
+
+This function does not change the match data."
+  (declare (pure t) (side-effect-free t))
+  (let ((n (or n 0)))
+    (save-match-data
+      (when (string-match regexp string)
+        (match-string n string)))))
+
+;;;###autoload
+(defun regexp-match* (regexp string)
+  "Return a list of matched substrings for REGEXP in STRING.
+
+This function does not change the match data."
+  (declare (pure t) (side-effect-free t))
+  (save-match-data
+    (when (string-match regexp string)
+      (let ((match-index (1- (/ (length (match-data)) 2)))
+            matches)
+        (while (<= 0 match-index)
+          (push (match-string match-index string) matches)
+          (setq match-index (1- match-index)))
+        matches))))
+
 ;;;###autoload
 (defun add-display-text-property (start end prop value
                                         &optional object)
-- 
2.38.1



* Re: A function to take the regexp-matched substring directly
  2022-10-30 15:17 A function to take the regexp-matched substring directly daanturo
@ 2022-10-30 15:45 ` Philip Kaludercic
  2022-10-30 16:46   ` daanturo
  2022-10-30 15:52 ` Stefan Monnier
  1 sibling, 1 reply; 11+ messages in thread
From: Philip Kaludercic @ 2022-10-30 15:45 UTC (permalink / raw)
  To: daanturo; +Cc: emacs-devel

daanturo <daanturo@gmail.com> writes:

> What do you think about such built-in functionality?
>
> I find myself using this pattern a lot when parsing strings. Most of the
> time I just care about whether a particular (sub-)expression matched:
> Does it match? Yes? Good, give me the result; otherwise give me nil.

Is there a reason you find yourself working with strings as opposed to
buffers?  I've seen people try to force functional paradigms on Emacs
when they do stuff like creating a list of lines in a buffer then
iterating over these instead of using the (faster) buffer searching
mechanisms.  My worry is that functions like these, while useful per se,
might make more people inclined to write unidiomatic and wasteful code.

> The implementation is attached. I name it 'regexp-match' (please
> change the name if needed).




* Re: A function to take the regexp-matched substring directly
  2022-10-30 15:17 A function to take the regexp-matched substring directly daanturo
  2022-10-30 15:45 ` Philip Kaludercic
@ 2022-10-30 15:52 ` Stefan Monnier
  2022-10-30 17:16   ` daanturo
  2022-10-30 17:29   ` Philip Kaludercic
  1 sibling, 2 replies; 11+ messages in thread
From: Stefan Monnier @ 2022-10-30 15:52 UTC (permalink / raw)
  To: daanturo; +Cc: emacs-devel

> +;;;###autoload
> +(defun regexp-match (regexp string &optional n)
> +  "Return the N -th matched substring for REGEXP in STRING.
> +N defaults to 0 (the whole match).
> +
> +This function does not change the match data."
> +  (declare (pure t) (side-effect-free t))
> +  (let ((n (or n 0)))
> +    (save-match-data
> +      (when (string-match regexp string)
> +        (match-string n string)))))

`save-match-data` is costly and extremely rarely needed.
So I'd much rather not save it here.

> +  (save-match-data
> +    (when (string-match regexp string)
> +      (let ((match-index (1- (/ (length (match-data)) 2)))
> +            matches)
> +        (while (<= 0 match-index)
> +          (push (match-string match-index string) matches)
> +          (setq match-index (1- match-index)))
> +        matches))))

I suspect it'd be more efficient to iterate directly on the `match-data` rather
than on an integer (which suffers from an O(N²) complexity).


        Stefan





* Re: A function to take the regexp-matched substring directly
  2022-10-30 15:45 ` Philip Kaludercic
@ 2022-10-30 16:46   ` daanturo
  2022-10-30 17:26     ` Philip Kaludercic
  0 siblings, 1 reply; 11+ messages in thread
From: daanturo @ 2022-10-30 16:46 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: emacs-devel


On 30/10/2022 22:45, Philip Kaludercic wrote:
> Is there a reason you find yourself working with strings as opposed to
> buffers?  I've seen people try to force functional paradigms on Emacs
> when they do stuff like creating a list of lines in a buffer then
> iterating over these instead of using the (faster) buffer searching
> mechanisms.  My worry is that functions like these, while useful per se,
> might make more people inclined to write unidiomatic and wasteful code.


In my case, the strings are usually file names and shell command output:


```elisp

;; Get the commit hash returned by git blame

(shell-command-to-string "git blame -L 1,1 -- README")

"19dcb237b5b (Eli Zaretskii 2022-01-01 02:45:51 -0500 1) Copyright (C) 2001-2022 Free Software Foundation, Inc.
"

;; parse:
(regexp-match "^[^ ]+"
"19dcb237b5b (Eli Zaretskii 2022-01-01 02:45:51 -0500 1) Copyright (C) 2001-2022 Free Software Foundation, Inc.
")
=> "19dcb237b5b"


;; From `vc-revision-other-window''s file name, find the original name and the revision
(regexp-match*
    "\\(.*?\\)\\(?:\\.~\\(.*?\\)~\\)?\\'"
    "/foo/bar.el.~main~")
    
=> ("/foo/bar.el.~main~" "/foo/bar.el" "main")

```


There are also many possible cases where the strings may be buffer names or
other fairly short strings that are nowhere near the size of a buffer.

-- 
Daanturo.





* Re: A function to take the regexp-matched substring directly
  2022-10-30 15:52 ` Stefan Monnier
@ 2022-10-30 17:16   ` daanturo
  2022-10-30 22:01     ` Stefan Monnier
  2022-10-30 17:29   ` Philip Kaludercic
  1 sibling, 1 reply; 11+ messages in thread
From: daanturo @ 2022-10-30 17:16 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1036 bytes --]

> I suspect it'd be more efficient to iterate directly on the `match-data` rather
> than on an integer (which suffers from an O(N²) complexity).

Got it. By using `substring` directly, it looks like this:

```emacs-lisp
(when (string-match regexp string)
    (let ((matched-data (match-data))
          matches beg end)
      (while matched-data
        (setq beg (pop matched-data))
        (setq end (pop matched-data))
        (push (and beg end
                   (substring string beg end))
              matches))
      (nreverse matches)))
```

> `save-match-data` is costly and extremely rarely needed.

I committed a change that now makes inhibit-modify optional (though `(declare
(pure t) (side-effect-free t))` is lost in the process).


Although I think that, intuitively, when running these kinds of functions we
naturally expect them not to cause any side effects from a high-level
perspective, so `save-match-data` should be the default.

-- 
Daanturo.

[-- Attachment #2: regexp-match.patch --]
[-- Type: text/x-patch, Size: 7381 bytes --]

From 7540dc132f15aa27b6df3c6a0239a8f70ef19032 Mon Sep 17 00:00:00 2001
From: Daanturo <daanturo@gmail.com>
Date: Sun, 30 Oct 2022 21:54:56 +0700
Subject: [PATCH 1/2] Define regexp-match, regexp-match*

* lisp/emacs-lisp/subr-x.el: implementation
* doc/lispref/searching.texi: documents
* etc/NEWS: documents
* lisp/emacs-lisp/shortdoc.el: documents
---
 doc/lispref/searching.texi  | 32 ++++++++++++++++++++++++++++++++
 etc/NEWS                    |  7 +++++++
 lisp/emacs-lisp/shortdoc.el |  4 ++++
 lisp/emacs-lisp/subr-x.el   | 30 ++++++++++++++++++++++++++++++
 4 files changed, 73 insertions(+)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 743718b560..e4cecd858a 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -2099,6 +2099,38 @@ This predicate function does what @code{string-match} does, but it
 avoids modifying the match data.
 @end defun
 
+@defun regexp-match regexp string &optional n
+This function returns the n-th matched substring for regexp in string.
+N defaults to 0 (the whole match).  It does not modify the match data.
+
+@example
+@group
+(regexp-match "quick" "The quick brown fox jumped quickly.")
+        @result{} "quick"
+@end group
+@group
+(regexp-match "quick[[:space:]]+\\([a-z]+\\)" "The quick brown fox jumped quickly." 1)
+        @result{} "brown"
+@end group
+@end example
+
+@end defun
+
+
+@defun regexp-match* regexp string
+This function returns a list of matched substrings for regexp
+in string.  It does not modify the match data.
+
+@example
+@group
+(regexp-match* "quick[[:space:]]+\\([a-z]+\\)" "The quick brown fox jumped quickly.")
+        @result{} ("quick brown" "brown")
+@end group
+@end example
+
+@end defun
+
+
 @defun looking-at regexp &optional inhibit-modify
 This function determines whether the text in the current buffer directly
 following point matches the regular expression @var{regexp}.  ``Directly
diff --git a/etc/NEWS b/etc/NEWS
index a185967483..6faee7251e 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -3198,6 +3198,13 @@ The following generalized variables have been made obsolete:
 \f
 * Lisp Changes in Emacs 29.1
 
++++
+** New function 'regexp-match', 'regexp-match*'.
+'regexp-match' can be used to extract the substring that matches a
+wanted subexpression from a string, while 'regexp-match*' returns
+the corresponding substring for each subexpression. Both don't change
+the match data
+
 +++
 ** Interpreted closures are "safe for space".
 As was already the case for byte-compiled closures, instead of capturing
diff --git a/lisp/emacs-lisp/shortdoc.el b/lisp/emacs-lisp/shortdoc.el
index dbac03432c..81e6168217 100644
--- a/lisp/emacs-lisp/shortdoc.el
+++ b/lisp/emacs-lisp/shortdoc.el
@@ -781,6 +781,10 @@ A FUNC form can have any number of `:no-eval' (or `:no-value'),
    :eg-result 3)
   (save-match-data
     :no-eval (save-match-data ...))
+  (regexp-match
+   :eval (regexp-match "^\\([fo]+\\)b" "foobar" 1))
+  (regexp-match*
+   :eval (regexp-match* "^\\([fo]+\\)b" "foobar"))
   "Replacing Match"
   (replace-match
    :no-eval (replace-match "new")
diff --git a/lisp/emacs-lisp/subr-x.el b/lisp/emacs-lisp/subr-x.el
index 6e4d88b4df..0badf1cbaf 100644
--- a/lisp/emacs-lisp/subr-x.el
+++ b/lisp/emacs-lisp/subr-x.el
@@ -347,6 +347,36 @@ This takes into account combining characters and grapheme clusters."
         (setq start (1+ start))))
     (nreverse result)))
 
+;;;###autoload
+(defun regexp-match (regexp string &optional n)
+  "Return the N -th matched substring for REGEXP in STRING.
+N defaults to 0 (the whole match).
+
+This function does not change the match data."
+  (declare (pure t) (side-effect-free t))
+  (let ((n (or n 0)))
+    (save-match-data
+      (when (string-match regexp string)
+        (match-string n string)))))
+
+;;;###autoload
+(defun regexp-match* (regexp string)
+  "Return a list of matched substrings for REGEXP in STRING.
+
+This function does not change the match data."
+  (declare (pure t) (side-effect-free t))
+  (save-match-data
+    (when (string-match regexp string)
+      (let ((matched-data (match-data))
+            matches beg end)
+        (while matched-data
+          (setq beg (pop matched-data))
+          (setq end (pop matched-data))
+          (push (and beg end
+                     (substring string beg end))
+                matches))
+        (nreverse matches)))))
+
 ;;;###autoload
 (defun add-display-text-property (start end prop value
                                         &optional object)
-- 
2.38.1


From 47c1414cf24b29ab85f9b43e2a7deaa134271cd2 Mon Sep 17 00:00:00 2001
From: Daanturo <daanturo@gmail.com>
Date: Sun, 30 Oct 2022 23:56:37 +0700
Subject: [PATCH 2/2] regexp-match, regexp-match*: make inhibit-modify optional

* lisp/emacs-lisp/subr-x.el: split helper functions for the above and
add optional INHIBIT-MODIFY.
---
 lisp/emacs-lisp/subr-x.el | 48 +++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/lisp/emacs-lisp/subr-x.el b/lisp/emacs-lisp/subr-x.el
index 0badf1cbaf..4de2213ade 100644
--- a/lisp/emacs-lisp/subr-x.el
+++ b/lisp/emacs-lisp/subr-x.el
@@ -347,35 +347,39 @@ This takes into account combining characters and grapheme clusters."
         (setq start (1+ start))))
     (nreverse result)))
 
+(defun regexp--match (regexp string &optional n)
+  (let ((n (or n 0)))
+    (when (string-match regexp string)
+      (match-string n string))))
 ;;;###autoload
-(defun regexp-match (regexp string &optional n)
+(defun regexp-match (regexp string &optional n inhibit-modify)
   "Return the N -th matched substring for REGEXP in STRING.
 N defaults to 0 (the whole match).
-
-This function does not change the match data."
+With non-nil INHIBIT-MODIFY, does not change the match data."
   (declare (pure t) (side-effect-free t))
-  (let ((n (or n 0)))
-    (save-match-data
-      (when (string-match regexp string)
-        (match-string n string)))))
-
+  (if inhibit-modify
+      (save-match-data (regexp--match regexp string n))
+    (regexp--match regexp string n)))
+
+(defun regexp--match* (regexp string)
+  (when (string-match regexp string)
+    (let ((matched-data (match-data))
+          matches beg end)
+      (while matched-data
+        (setq beg (pop matched-data))
+        (setq end (pop matched-data))
+        (push (and beg end
+                   (substring string beg end))
+              matches))
+      (nreverse matches))))
 ;;;###autoload
-(defun regexp-match* (regexp string)
+(defun regexp-match* (regexp string &optional inhibit-modify)
   "Return a list of matched substrings for REGEXP in STRING.
-
-This function does not change the match data."
+With non-nil INHIBIT-MODIFY, does not change the match data. "
   (declare (pure t) (side-effect-free t))
-  (save-match-data
-    (when (string-match regexp string)
-      (let ((matched-data (match-data))
-            matches beg end)
-        (while matched-data
-          (setq beg (pop matched-data))
-          (setq end (pop matched-data))
-          (push (and beg end
-                     (substring string beg end))
-                matches))
-        (nreverse matches)))))
+  (if inhibit-modify
+      (save-match-data (regexp--match* regexp string))
+    (regexp--match* regexp string)))
 
 ;;;###autoload
 (defun add-display-text-property (start end prop value
-- 
2.38.1



* Re: A function to take the regexp-matched substring directly
  2022-10-30 16:46   ` daanturo
@ 2022-10-30 17:26     ` Philip Kaludercic
  0 siblings, 0 replies; 11+ messages in thread
From: Philip Kaludercic @ 2022-10-30 17:26 UTC (permalink / raw)
  To: daanturo; +Cc: emacs-devel

daanturo <daanturo@gmail.com> writes:

> On 30/10/2022 22:45, Philip Kaludercic wrote:
>> Is there a reason you find yourself working with strings as opposed to
>> buffers?  I've seen people try to force functional paradigms on Emacs
>> when they do stuff like creating a list of lines in a buffer then
>> iterating over these instead of using the (faster) buffer searching
>> mechanisms.  My worry is that functions like these, while useful per se,
>> might make more people inclined to write unidiomatic and wasteful code.
>
>
> In my case, strings are usually file names and shell command outputs
>
> ```elisp
>
> ;; Get the commit hash returned by git blame
>
> (shell-command-to-string "git blame -L 1,1 -- README")
>
> "19dcb237b5b (Eli Zaretskii 2022-01-01 02:45:51 -0500 1) Copyright (C) 2001-2022 Free Software Foundation, Inc.
> "
>
> ;; parse:
> (regexp-match "^[^ ]+"
> "19dcb237b5b (Eli Zaretskii 2022-01-01 02:45:51 -0500 1) Copyright (C) 2001-2022 Free Software Foundation, Inc.
> ")
> => "19dcb237b5b"

I would argue that this is more robust (though more verbose):

--8<---------------cut here---------------start------------->8---
(with-temp-buffer
  (call-process "git" nil t nil "blame" "-L1,1" "--" "README")
  (goto-char (point-min))
  (if (looking-at "\\`[[:alnum:]]+")
      (match-string 0)
    'some-other-value))
--8<---------------cut here---------------end--------------->8---

>
> ;; From `vc-revision-other-window''s file name, find the original name and the revision
> (regexp-match*
>     "\\(.*?\\)\\(?:\\.~\\(.*?\\)~\\)?\\'"
>     "/foo/bar.el.~main~")
>     
> => ("/foo/bar.el.~main~" "/foo/bar.el" "main")

And the file-name-* functions ought to be used here instead.

> ```
>
>
> There are also many possible cases where the strings may be buffer names or
> other fairly short strings that are nowhere near the size of a buffer.

What do you mean by "size of a buffer"?  Come to think of it, a macro
like `with-string-as-buffer' would be a good addition, to make it easier
to use the text editing functionality instead of string handling.
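
To make that idea concrete, here is a minimal sketch of what such a macro could look like. The name `my-with-string-as-buffer` is hypothetical (no macro by this name is assumed to exist in Emacs), and the placement of point at the start of the buffer is my own assumption about what would be convenient.

```elisp
;; Hypothetical sketch; not an existing Emacs macro.
(defmacro my-with-string-as-buffer (string &rest body)
  "Evaluate BODY in a temporary buffer holding STRING.
Point starts at the beginning of the buffer.  Return the value of BODY."
  (declare (indent 1) (debug (form body)))
  `(with-temp-buffer
     (insert ,string)
     (goto-char (point-min))
     ,@body))

;; Usage: take the leading alphanumeric token of a blame-style line.
(my-with-string-as-buffer "19dcb237b5b (Eli Zaretskii 2022-01-01) Copyright"
  (when (looking-at "[[:alnum:]]+")
    (match-string 0)))
```

Because the body runs in a buffer, `looking-at`, `re-search-forward` and the other buffer primitives can be used directly, at the cost of creating a temporary buffer for each call.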




* Re: A function to take the regexp-matched substring directly
  2022-10-30 15:52 ` Stefan Monnier
  2022-10-30 17:16   ` daanturo
@ 2022-10-30 17:29   ` Philip Kaludercic
  2022-10-30 22:07     ` Stefan Monnier
  1 sibling, 1 reply; 11+ messages in thread
From: Philip Kaludercic @ 2022-10-30 17:29 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: daanturo, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> +;;;###autoload
>> +(defun regexp-match (regexp string &optional n)
>> +  "Return the N -th matched substring for REGEXP in STRING.
>> +N defaults to 0 (the whole match).
>> +
>> +This function does not change the match data."
>> +  (declare (pure t) (side-effect-free t))
>> +  (let ((n (or n 0)))
>> +    (save-match-data
>> +      (when (string-match regexp string)
>> +        (match-string n string)))))
>
> `save-match-data` is costly and extremely rarely needed.

What makes it so expensive?  The implementation appears to be trivial.

> So I'd much rather not save it here.

If the function is supposed to be side-effect-free, then it ought not to
sometimes replace the match data and not touch it when optimised away.

>> +  (save-match-data
>> +    (when (string-match regexp string)
>> +      (let ((match-index (1- (/ (length (match-data)) 2)))
>> +            matches)
>> +        (while (<= 0 match-index)
>> +          (push (match-string match-index string) matches)
>> +          (setq match-index (1- match-index)))
>> +        matches))))
>
> I suspect it'd be more efficient to iterate directly on the `match-data` rather
> than on an integer (which suffers from an O(N²) complexity).
>
>
>         Stefan




* Re: A function to take the regexp-matched substring directly
  2022-10-30 17:16   ` daanturo
@ 2022-10-30 22:01     ` Stefan Monnier
  2022-10-31  3:47       ` daanturo
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier @ 2022-10-30 22:01 UTC (permalink / raw)
  To: daanturo; +Cc: emacs-devel

>> `save-match-data` is costly and extremely rarely needed.
>
> I committed a change that now makes inhibit-modify optional (though `(declare
> (pure t) (side-effect-free t))` is lost in the process).

Optional makes no sense: those who need the match data to be saved can
use `save-match-data` around the call just as easily as passing an
optional argument.

> Although I think that, intuitively, when running these kinds of functions we
> naturally expect them not to cause any side effects from a high-level
> perspective, so `save-match-data` should be the default.

That's not how it works: your intuition should say "oh, it uses
a regexp, so it most assuredly messes with the match data".  Only very
few primitive operations like `car/cdr` preserve the match data.
Everything else should be presumed to mess with the match data.
`save-match-data` should almost never be used at the top-level of
a function.
It should only be used in cases such as:

      ...
      (string-match ..)
      ...
      (save-match-data
        ...do something that may mess with the match data...)
      ...
      (match-string ..)

I tried to explain that in the docstring as follows:

    NOTE: The convention in Elisp is that any function, except for a few
    exceptions like car/assoc/+/goto-char, can clobber the match data,
    so `save-match-data' should normally be used to save *your* match data
    rather than your caller's match data."
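
A concrete, made-up instance of that pattern might look like the sketch below. The function `my-parse-key-value` and its "key=value" input format are invented purely for illustration.

```elisp
;; Hypothetical example: protect *our own* match data around an inner search.
(defun my-parse-key-value (line)
  "Parse LINE of the form key=value and return (KEY . VALUE), or nil.
Surrounding double quotes are stripped from VALUE."
  (when (string-match "\\`\\([a-z]+\\)=\\(.*\\)\\'" line)
    (let ((value (match-string 2 line)))
      ;; The inner `string-match' would overwrite the outer match data,
      ;; so it is wrapped locally, exactly where the clobbering happens.
      (save-match-data
        (when (string-match "\\`\"\\(.*\\)\"\\'" value)
          (setq value (match-string 1 value))))
      ;; Group 1 of the outer match is still valid here.
      (cons (match-string 1 line) value))))

(my-parse-key-value "name=\"emacs\"")
```

Note that the function as a whole still clobbers its caller's match data; only the region between the outer `string-match` and the final `match-string` is protected.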


-- Stefan





* Re: A function to take the regexp-matched substring directly
  2022-10-30 17:29   ` Philip Kaludercic
@ 2022-10-30 22:07     ` Stefan Monnier
  2022-10-31  8:56       ` Mattias Engdegård
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier @ 2022-10-30 22:07 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: daanturo, emacs-devel

>> `save-match-data` is costly and extremely rarely needed.
> What makes it so expensive?  The implementation appears to be trivial.

It allocates cons cells (by calling `match-data`) to store the values.
If you put it in such a place it will be wasted 99% of the time.
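
To show where the allocation comes from, here is a simplified sketch of what a macro of this kind has to do. `my-save-match-data` is a hypothetical name, and the real definition in Emacs may differ in details (for instance in how it treats markers).

```elisp
;; Simplified, hypothetical re-implementation for illustration only.
(defmacro my-save-match-data (&rest body)
  "Run BODY, then restore the match data that was current on entry."
  (declare (indent 0) (debug t))
  (let ((saved (make-symbol "saved")))
    `(let ((,saved (match-data)))      ; builds a fresh list on every entry
       (unwind-protect
           (progn ,@body)
         ;; Restore even if BODY exits non-locally.
         (set-match-data ,saved)))))

;; Usage: the outer match survives the inner search.
(string-match "b" "abc")
(my-save-match-data
  (string-match "x\\(y\\)" "xy"))
(match-beginning 0)
```

The list built by `match-data` is paid for on every call, whether or not the caller ever looks at the old match data afterwards.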

>> So I'd much rather not save it here.
> If the function is supposed to be side-effect-free, then it ought not to
> sometimes replace the match data and not touch it when optimised away.

The match-data is defined to be something very ephemeral, so it's OK for
pure functions to clobber it.

Admittedly, here it doesn't just clobber it but sets it to a reliable
value, so there's a high risk that someone will rely on that match-data,
so better not mark it as pure, indeed, contrary to what I said earlier.


        Stefan





* Re: A function to take the regexp-matched substring directly
  2022-10-30 22:01     ` Stefan Monnier
@ 2022-10-31  3:47       ` daanturo
  0 siblings, 0 replies; 11+ messages in thread
From: daanturo @ 2022-10-31  3:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 374 bytes --]

On 31/10/2022 05:01, Stefan Monnier wrote:
>     NOTE: The convention in Elisp is that any function, except for a few
>     exceptions like car/assoc/+/goto-char, can clobber the match data,
>     so `save-match-data' should normally be used to save *your* match data
>     rather than your caller's match data."
Thank you for clarifying. My updated version is attached.

-- 
Daanturo.

[-- Attachment #2: 0001-Define-regexp-match-regexp-match.patch --]
[-- Type: text/x-patch, Size: 4282 bytes --]

From 437248de89928732ab9af85d923c7ae815214d96 Mon Sep 17 00:00:00 2001
From: Daanturo <daanturo@gmail.com>
Date: Sun, 30 Oct 2022 21:54:56 +0700
Subject: [PATCH] Define regexp-match, regexp-match*

* lisp/emacs-lisp/subr-x.el: implementation
* doc/lispref/searching.texi: documents
* etc/NEWS: documents
* lisp/emacs-lisp/shortdoc.el: documents
---
 doc/lispref/searching.texi  | 32 ++++++++++++++++++++++++++++++++
 etc/NEWS                    |  7 +++++++
 lisp/emacs-lisp/shortdoc.el |  4 ++++
 lisp/emacs-lisp/subr-x.el   | 26 ++++++++++++++++++++++++++
 4 files changed, 69 insertions(+)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 743718b560..a5c0b426d0 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -2099,6 +2099,38 @@ This predicate function does what @code{string-match} does, but it
 avoids modifying the match data.
 @end defun
 
+@defun regexp-match regexp string &optional n
+This function returns the n-th matched substring for regexp in string.
+N defaults to 0 (the whole match).
+
+@example
+@group
+(regexp-match "quick" "The quick brown fox jumped quickly.")
+        @result{} "quick"
+@end group
+@group
+(regexp-match "quick[[:space:]]+\\([a-z]+\\)" "The quick brown fox jumped quickly." 1)
+        @result{} "brown"
+@end group
+@end example
+
+@end defun
+
+
+@defun regexp-match* regexp string
+This function returns a list of matched substrings for regexp
+in string.
+
+@example
+@group
+(regexp-match* "quick[[:space:]]+\\([a-z]+\\)" "The quick brown fox jumped quickly.")
+        @result{} ("quick brown" "brown")
+@end group
+@end example
+
+@end defun
+
+
 @defun looking-at regexp &optional inhibit-modify
 This function determines whether the text in the current buffer directly
 following point matches the regular expression @var{regexp}.  ``Directly
diff --git a/etc/NEWS b/etc/NEWS
index a185967483..a15e85521b 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -3198,6 +3198,13 @@ The following generalized variables have been made obsolete:
 \f
 * Lisp Changes in Emacs 29.1
 
++++
+** New function 'regexp-match', 'regexp-match*'.
+'regexp-match' can be used to extract the substring that matches a
+wanted subexpression from a string, while 'regexp-match*' returns the
+corresponding substring for each subexpression. Both modify the match
+data.
+
 +++
 ** Interpreted closures are "safe for space".
 As was already the case for byte-compiled closures, instead of capturing
diff --git a/lisp/emacs-lisp/shortdoc.el b/lisp/emacs-lisp/shortdoc.el
index dbac03432c..81e6168217 100644
--- a/lisp/emacs-lisp/shortdoc.el
+++ b/lisp/emacs-lisp/shortdoc.el
@@ -781,6 +781,10 @@ A FUNC form can have any number of `:no-eval' (or `:no-value'),
    :eg-result 3)
   (save-match-data
     :no-eval (save-match-data ...))
+  (regexp-match
+   :eval (regexp-match "^\\([fo]+\\)b" "foobar" 1))
+  (regexp-match*
+   :eval (regexp-match* "^\\([fo]+\\)b" "foobar"))
   "Replacing Match"
   (replace-match
    :no-eval (replace-match "new")
diff --git a/lisp/emacs-lisp/subr-x.el b/lisp/emacs-lisp/subr-x.el
index 6e4d88b4df..2d1b40a2f0 100644
--- a/lisp/emacs-lisp/subr-x.el
+++ b/lisp/emacs-lisp/subr-x.el
@@ -347,6 +347,32 @@ This takes into account combining characters and grapheme clusters."
         (setq start (1+ start))))
     (nreverse result)))
 
+;;;###autoload
+(defun regexp-match (regexp string &optional n)
+  "Return the N -th matched substring for REGEXP in STRING.
+N defaults to 0 (the whole match).
+
+This function modifies the match data."
+  (let ((n (or n 0)))
+    (when (string-match regexp string)
+      (match-string n string))))
+
+;;;###autoload
+(defun regexp-match* (regexp string)
+  "Return a list of matched substrings for REGEXP in STRING.
+
+This function modifies the match data."
+  (when (string-match regexp string)
+    (let ((matched-data (match-data))
+          matches beg end)
+      (while matched-data
+        (setq beg (pop matched-data))
+        (setq end (pop matched-data))
+        (push (and beg end
+                   (substring string beg end))
+              matches))
+      (nreverse matches))))
+
 ;;;###autoload
 (defun add-display-text-property (start end prop value
                                         &optional object)
-- 
2.38.1



* Re: A function to take the regexp-matched substring directly
  2022-10-30 22:07     ` Stefan Monnier
@ 2022-10-31  8:56       ` Mattias Engdegård
  0 siblings, 0 replies; 11+ messages in thread
From: Mattias Engdegård @ 2022-10-31  8:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Philip Kaludercic, daanturo, emacs-devel

On 30 Oct 2022, at 23:07, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> The match-data is defined to be something very ephemeral, so it's OK for
> pure functions to clobber it.
> 
> Admittedly, here it doesn't just clobber it but sets it to a reliable
> value, so there's a high risk that someone will rely on that match-data,
> so better not mark it as pure, indeed, contrary to what I said earlier.

It wouldn't be pure anyway, since string-match depends on case-fold-search.
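
A small illustration of that dependence (the sample strings are arbitrary):

```elisp
;; Same arguments, different results, depending on a dynamic variable.
(let ((case-fold-search t))
  (string-match "quick" "The QUICK brown fox"))   ; matches, at index 4
(let ((case-fold-search nil))
  (string-match "quick" "The QUICK brown fox"))   ; no match, returns nil
```

A function declared pure may be evaluated at compile time when its arguments are constant, so a result that depends on the run-time value of a variable like this would be unsafe to fold.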





