unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: "Mattias Engdegård" <mattiase@acm.org>
To: Stefan Kangas <stefankangas@gmail.com>
Cc: 44861@debbugs.gnu.org, Shigeru Fukaya <shigeru.fukaya@gmail.com>
Subject: bug#44861: 27.1; [PATCH] signal in `replace-regexp-in-string'
Date: Thu, 26 Nov 2020 13:57:59 +0100	[thread overview]
Message-ID: <83EC926B-DE9E-48BC-8FD2-C7CB3617AD50@acm.org> (raw)
In-Reply-To: <CADwFkm=59cqfUCPZNUCvoWv-M-9Jdosq-jQvY7X6MCWKiYGy_Q@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 995 bytes --]

25 nov. 2020 kl. 22.39 skrev Stefan Kangas <stefankangas@gmail.com>:

> I personally worry about the performance here.  Since we use regexps
> heavily all over, it is not clear (to me) that 10 % overall performance
> drop with subexpressions is worth it to work correctly in these rare
> edge-cases.  I suppose we do have to fix the bug here, but is it
> feasible to solve this in a way that has less performance impact?

We can't really let it remain buggy, especially as the consequence can be an error or silently wrong results. Also remember that one man's edge case is another's reasonable use.

However, unlike Boris we can eat our cake and have it! The attached patch performs the match-data translation in a C function, which obviously is much faster and indeed speeds up replace-regexp-in-string in all cases (as long as there is any match at all). The new primitive is a bit ad-hoc, but does one well-defined thing and isn't intended for use by the general public anyway.


[-- Attachment #2: 0001-Fix-replace-regexp-in-string-substring-match-data-tr.patch --]
[-- Type: application/octet-stream, Size: 3894 bytes --]

From 88d5a8d847045e23c2ab39786dc6e5a9a5412a32 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Wed, 25 Nov 2020 15:32:08 +0100
Subject: [PATCH] Fix replace-regexp-in-string substring match data translation

For certain patterns, re-matching the same regexp on the matched
substring does not produce correctly translated match data
(bug#15107 and bug#44861).

Using a new builtin function also improves performance since the
number of calls to string-match is halved.

Reported by Kevin Ryde and Shigeru Fukaya.

* lisp/subr.el (replace-regexp-in-string): Translate the match data
using match-data--translate instead of trusting a call to string-match
on the matched string to do the job.
* test/lisp/subr-tests.el (subr-replace-regexp-in-string):
Add test cases.
* src/search.c (Fmatch_data__translate): New internal function.
(syms_of_search): Register it as a subroutine.
---
 lisp/subr.el            |  7 +++----
 src/search.c            | 18 ++++++++++++++++++
 test/lisp/subr-tests.el |  6 +++++-
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index 1fb0f9ab7e..e009dcc2b9 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -4546,10 +4546,9 @@ replace-regexp-in-string
 	(when (= me mb) (setq me (min l (1+ mb))))
 	;; Generate a replacement for the matched substring.
 	;; Operate on only the substring to minimize string consing.
-	;; Set up match data for the substring for replacement;
-	;; presumably this is likely to be faster than munging the
-	;; match data directly in Lisp.
-	(string-match regexp (setq str (substring string mb me)))
+        ;; Translate the match data so that it applies to the matched substring.
+        (match-data--translate (- mb))
+        (setq str (substring string mb me))
 	(setq matches
 	      (cons (replace-match (if (stringp rep)
 				       rep
diff --git a/src/search.c b/src/search.c
index e7f9094946..4eb634a3c0 100644
--- a/src/search.c
+++ b/src/search.c
@@ -3031,6 +3031,23 @@ DEFUN ("set-match-data", Fset_match_data, Sset_match_data, 1, 2, 0,
   return Qnil;
 }
 
+DEFUN ("match-data--translate", Fmatch_data__translate, Smatch_data__translate,
+       1, 1, 0,
+       doc: /* Add N to all string positions in the match data.  Internal.  */)
+  (Lisp_Object n)
+{
+  CHECK_FIXNUM (n);
+  EMACS_INT delta = XFIXNUM (n);
+  if (EQ (last_thing_searched, Qt))   /* String match data only.  */
+    for (ptrdiff_t i = 0; i < search_regs.num_regs; i++)
+      if (search_regs.start[i] >= 0)
+        {
+          search_regs.start[i] = max (0, search_regs.start[i] + delta);
+          search_regs.end[i] = max (0, search_regs.end[i] + delta);
+        }
+  return Qnil;
+}
+
 /* Called from Flooking_at, Fstring_match, search_buffer, Fstore_match_data
    if asynchronous code (filter or sentinel) is running. */
 static void
@@ -3388,6 +3405,7 @@ syms_of_search (void)
   defsubr (&Smatch_end);
   defsubr (&Smatch_data);
   defsubr (&Sset_match_data);
+  defsubr (&Smatch_data__translate);
   defsubr (&Sregexp_quote);
   defsubr (&Snewline_cache_check);
 
diff --git a/test/lisp/subr-tests.el b/test/lisp/subr-tests.el
index c77be511dc..67f7fc9749 100644
--- a/test/lisp/subr-tests.el
+++ b/test/lisp/subr-tests.el
@@ -545,7 +545,11 @@ subr-replace-regexp-in-string
                             (match-beginning 1) (match-end 1)))
                   "babbcaacabc")
                  "b<abbc,0,4,1,3>a<ac,0,2,1,1><abc,0,3,1,2>"))
-  )
+  ;; anchors (bug#15107, bug#44861)
+  (should (equal (replace-regexp-in-string "a\\B" "b" "a aaaa")
+                 "a bbba"))
+  (should (equal (replace-regexp-in-string "\\`\\|x" "z" "--xx--")
+                 "z--zz--")))
 
 (provide 'subr-tests)
 ;;; subr-tests.el ends here
-- 
2.21.1 (Apple Git-122.3)


  reply	other threads:[~2020-11-26 12:57 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-25  4:02 bug#44861: 27.1; [PATCH] signal in `replace-regexp-in-string' Shigeru Fukaya
2020-11-25 10:58 ` Mattias Engdegård
2020-11-25 14:58   ` Mattias Engdegård
2020-11-25 21:39     ` Stefan Kangas
2020-11-26 12:57       ` Mattias Engdegård [this message]
2020-11-26 13:12         ` Lars Ingebrigtsen
2020-11-26 13:39           ` Mattias Engdegård
2020-11-26 14:03             ` Lars Ingebrigtsen
2020-11-26 14:54               ` Mattias Engdegård
2020-11-29 13:28               ` Basil L. Contovounesios
2020-11-26 13:43           ` Stefan Kangas
2020-11-26 14:03             ` Lars Ingebrigtsen
2020-11-26 14:41             ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83EC926B-DE9E-48BC-8FD2-C7CB3617AD50@acm.org \
    --to=mattiase@acm.org \
    --cc=44861@debbugs.gnu.org \
    --cc=shigeru.fukaya@gmail.com \
    --cc=stefankangas@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).