From: "Mattias Engdegård" <mattiase@acm.org>
To: Shigeru Fukaya <shigeru.fukaya@gmail.com>
Cc: 44861@debbugs.gnu.org
Subject: bug#44861: 27.1; [PATCH] signal in `replace-regexp-in-string'
Date: Wed, 25 Nov 2020 15:58:22 +0100
Message-ID: <97535AF5-D542-4267-A5A9-1483C32A61AC@acm.org>
In-Reply-To: <6F768DED-2E1B-4D06-A776-FFA162AC32AD@acm.org>
[-- Attachment #1: Type: text/plain, Size: 640 bytes --]
forcemerge 15107 44861
stop
Suggested patch attached. A small test suite for replace-regexp-in-string has already been pushed to master -- very rudimentary, but better than nothing -- and the patch amends it with some new relevant cases that didn't work before.
It is basically your patch, but slightly optimised: it turned out that the function-call and allocation overhead of the original patch made it a tad too expensive (a pity, because it was very neat). Performance is now about the same as before when the pattern contains no submatches, and slightly worse (under 10% slower) with one submatch. The gain in correctness seems worth that cost.
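To see why re-matching the regexp on the extracted substring is unreliable, compare a match made in context with the same regexp applied to the substring alone. This is a minimal illustration using the two patterns from the new test cases; it shows only `string-match` results and does not reproduce the old `replace-regexp-in-string` code itself.

```elisp
;; In context: the "a" at position 2 of "a aaaa" is followed by another
;; word character, so \B (not at a word boundary) succeeds there.
(string-match "a\\B" "a aaaa" 1)          ; => 2

;; On the extracted substring "a" alone, \B is tested at the end of the
;; string, where Emacs never matches \B, so the re-match fails.
(string-match "a\\B" "a")                 ; => nil

;; \` matches at the start of any string, so re-matching "\\`\\|x" on the
;; substring "x" finds an empty match at 0 rather than the "x" itself.
(progn (string-match "\\`\\|x" "x")
       (match-end 0))                     ; => 0
```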
[-- Attachment #2: 0001-Fix-replace-regexp-in-string-substring-match-data-tr.patch --]
[-- Type: application/octet-stream, Size: 2929 bytes --]
From 9bc8dc80be5cee517fa53e6b8f37881d4220f162 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Wed, 25 Nov 2020 15:32:08 +0100
Subject: [PATCH] Fix replace-regexp-in-string substring match data translation
For certain patterns, re-matching the same regexp on the matched
substring does not produce correctly translated match data
(bug#15107 and bug#44861).
Reported by Kevin Ryde and Shigeru Fukaya.
* lisp/subr.el (replace-regexp-in-string): Translate the match data
by explicit manipulation instead of trusting a call to string-match on
the matched string to do the job.
* test/lisp/subr-tests.el (subr-replace-regexp-in-string):
Add test cases.
---
lisp/subr.el | 17 ++++++++++++-----
test/lisp/subr-tests.el | 6 +++++-
2 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/lisp/subr.el b/lisp/subr.el
index 1fb0f9ab7e..0ee2199933 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -4537,7 +4537,7 @@ replace-regexp-in-string
;; might be reasonable to do so for long enough STRING.]
(let ((l (length string))
(start (or start 0))
- matches str mb me)
+ matches str mb me md)
(save-match-data
(while (and (< start l) (string-match regexp string start))
(setq mb (match-beginning 0)
@@ -4546,10 +4546,17 @@ replace-regexp-in-string
(when (= me mb) (setq me (min l (1+ mb))))
;; Generate a replacement for the matched substring.
;; Operate on only the substring to minimize string consing.
- ;; Set up match data for the substring for replacement;
- ;; presumably this is likely to be faster than munging the
- ;; match data directly in Lisp.
- (string-match regexp (setq str (substring string mb me)))
+
+ ;; Translate the match data so that it applies to the matched substring.
+ (setq md (match-data nil md t)) ; Reuse list from previous match.
+ (let ((m md))
+ (while m
+ (when (car m)
+ (setcar m (- (car m) mb)))
+ (setq m (cdr m)))
+ (set-match-data md))
+
+ (setq str (substring string mb me))
(setq matches
(cons (replace-match (if (stringp rep)
rep
diff --git a/test/lisp/subr-tests.el b/test/lisp/subr-tests.el
index c77be511dc..67f7fc9749 100644
--- a/test/lisp/subr-tests.el
+++ b/test/lisp/subr-tests.el
@@ -545,7 +545,11 @@ subr-replace-regexp-in-string
(match-beginning 1) (match-end 1)))
"babbcaacabc")
"b<abbc,0,4,1,3>a<ac,0,2,1,1><abc,0,3,1,2>"))
- )
+ ;; anchors (bug#15107, bug#44861)
+ (should (equal (replace-regexp-in-string "a\\B" "b" "a aaaa")
+ "a bbba"))
+ (should (equal (replace-regexp-in-string "\\`\\|x" "z" "--xx--")
+ "z--zz--")))
(provide 'subr-tests)
;;; subr-tests.el ends here
--
2.21.1 (Apple Git-122.3)
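The translation step in the patch boils down to subtracting the substring's start offset from every non-nil position in the match data. Below is a minimal standalone sketch of that idea. The function name `my-shift-match-data` is hypothetical (not part of Emacs), and it assumes the last match was made with `string-match`, so the match data holds only integers and nils, with no trailing buffer element.

```elisp
;; Hypothetical helper (not part of Emacs): shift the current match data
;; left by OFFSET so that it describes positions in a substring that
;; starts at OFFSET.  Assumes the last match was made with `string-match',
;; so every element is an integer or nil (unmatched subexpression).
(defun my-shift-match-data (offset)
  "Subtract OFFSET from each non-nil element of the current match data."
  (set-match-data
   (mapcar (lambda (pos) (and pos (- pos offset)))
           (match-data t))))

;; Usage: replace the second "a" of "a aaaa" without re-matching.
(let* ((string "a aaaa")
       (mb (string-match "a\\B" string 1))   ; match starts at 2
       (me (match-end 0))                    ; and ends at 3
       (str (substring string mb me)))       ; "a"
  (my-shift-match-data mb)                   ; match data now (0 1)
  (replace-match "b" t t str))               ; => "b"
```

Unlike the patch, which passes the previous list as the REUSE argument of `match-data` and then modifies it in place with `setcar`, this sketch allocates a fresh list with `mapcar` on every call. That is simpler, but it is exactly the kind of allocation overhead the message says made the first version of the fix too slow.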
Thread overview: 13+ messages
2020-11-25 4:02 bug#44861: 27.1; [PATCH] signal in `replace-regexp-in-string' Shigeru Fukaya
2020-11-25 10:58 ` Mattias Engdegård
2020-11-25 14:58 ` Mattias Engdegård [this message]
2020-11-25 21:39 ` Stefan Kangas
2020-11-26 12:57 ` Mattias Engdegård
2020-11-26 13:12 ` Lars Ingebrigtsen
2020-11-26 13:39 ` Mattias Engdegård
2020-11-26 14:03 ` Lars Ingebrigtsen
2020-11-26 14:54 ` Mattias Engdegård
2020-11-29 13:28 ` Basil L. Contovounesios
2020-11-26 13:43 ` Stefan Kangas
2020-11-26 14:03 ` Lars Ingebrigtsen
2020-11-26 14:41 ` Eli Zaretskii
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git