unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value
@ 2019-06-25 12:01 Mattias Engdegård
  2019-06-25 15:26 ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Mattias Engdegård @ 2019-06-25 12:01 UTC (permalink / raw)
  To: 36372

From the doc string and the manual, the call

 (replace-regexp-in-string "a" "X" "abcab" t t nil 2)

would be expected to return

 "abcXb"

but the actual return value is

 "cXb"

This was probably not intended. The manual text is

 This function copies STRING and searches it for matches for REGEXP,
 and replaces them with REP.  It returns the modified copy.  If
 START is non-‘nil’, the search for matches starts at that index in
 STRING, so matches starting before that index are not changed.

The question is whether it is too late to fix the bug, or if it needs to be documented.

`string-match' + `replace-match' work as expected:

(let ((s "abcab"))
  (string-match "a" s 2)
  (replace-match "X" t t s))
=> "abcXb"

Bug#15107 is somewhat related.






^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value
  2019-06-25 12:01 bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value Mattias Engdegård
@ 2019-06-25 15:26 ` Eli Zaretskii
  2019-06-26  9:31   ` bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH] Mattias Engdegård
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-06-25 15:26 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 36372

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Tue, 25 Jun 2019 14:01:49 +0200
> 
>  (replace-regexp-in-string "a" "X" "abcab" t t nil 2)
> 
> would be expected to return
> 
>  "abcXb"
> 
> but the actual return value is
> 
>  "cXb"
> 
> This was probably not intended. The manual text is
> 
>  This function copies STRING and searches it for matches for REGEXP,
>  and replaces them with REP.  It returns the modified copy.  If
>  START is non-‘nil’, the search for matches starts at that index in
>  STRING, so matches starting before that index are not changed.
> 
> The question is whether it is too late to fix the bug, or if it needs to be documented.

Both, I think ;-)

Thanks.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-25 15:26 ` Eli Zaretskii
@ 2019-06-26  9:31   ` Mattias Engdegård
  2019-06-26  9:38     ` Robert Pluim
  2019-06-26 15:17     ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Mattias Engdegård @ 2019-06-26  9:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 36372

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

25 juni 2019 kl. 17.26 skrev Eli Zaretskii <eliz@gnu.org>:
> 
>> The question is whether it is too late to fix the bug, or if it needs to be documented.
> 
> Both, I think ;-)

Documenting a bug rather than fixing it never feels very satisfactory, but it may be the lesser evil.
What about this patch?


[-- Attachment #2: 0001-Document-bug-in-replace-regexp-in-string.patch --]
[-- Type: application/octet-stream, Size: 2184 bytes --]

From d5419cbf491cccb91843b8e3d6270df1c40a9004 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Wed, 26 Jun 2019 11:23:32 +0200
Subject: [PATCH] Document bug in `replace-regexp-in-string'

`replace-regexp-in-string' omits the first START characters of the
input string in its return value.  This is a clear bug, but fixing it
probably causes more trouble; document the behaviour instead (bug#36372).

* doc/lispref/searching.texi (Search and Replace)
* lisp/subr.el (replace-regexp-in-string):
Document current behaviour.
---
 doc/lispref/searching.texi | 4 ++--
 lisp/subr.el               | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 33455114da..86a907021c 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -1790,8 +1790,8 @@ Search and Replace
 This function copies @var{string} and searches it for matches for
 @var{regexp}, and replaces them with @var{rep}.  It returns the
 modified copy.  If @var{start} is non-@code{nil}, the search for
-matches starts at that index in @var{string}, so matches starting
-before that index are not changed.
+matches starts at that index in @var{string}, and the returned value
+does not include first @var{start} characters of @var{string}.
 
 This function uses @code{replace-match} to do the replacement, and it
 passes the optional arguments @var{fixedcase}, @var{literal} and
diff --git a/lisp/subr.el b/lisp/subr.el
index baff1e909a..ba388b3721 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -4208,7 +4208,8 @@ replace-regexp-in-string
 
 Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the
 arguments with the same names of function `replace-match'.  If START
-is non-nil, start replacements at that index in STRING.
+is non-nil, start replacements at that index in STRING.  The return
+value then omits the first START characters of STRING.
 
 REP is either a string used as the NEWTEXT arg of `replace-match' or a
 function.  If it is a function, it is called with the actual text of each
-- 
2.20.1 (Apple Git-117)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26  9:31   ` bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH] Mattias Engdegård
@ 2019-06-26  9:38     ` Robert Pluim
  2019-06-26 10:01       ` Mattias Engdegård
  2019-06-26 10:22       ` Lars Ingebrigtsen
  2019-06-26 15:17     ` Eli Zaretskii
  1 sibling, 2 replies; 17+ messages in thread
From: Robert Pluim @ 2019-06-26  9:38 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 36372

>>>>> On Wed, 26 Jun 2019 11:31:01 +0200, Mattias Engdegård <mattiase@acm.org> said:

    Mattias> 25 juni 2019 kl. 17.26 skrev Eli Zaretskii <eliz@gnu.org>:
    >> 
    >>> The question is whether it is too late to fix the bug, or if it needs to be documented.
    >> 
    >> Both, I think ;-)

    Mattias> Documenting a bug rather than fixing it never feels very satisfactory, but it may be the lesser evil.
    Mattias> What about this patch?

I think this is one of those 'who could possibly be relying on this
behaviour' bugs that bites you hard if you change things, so
documentation is probably best. Minor nits below

    Mattias> From d5419cbf491cccb91843b8e3d6270df1c40a9004 Mon Sep 17 00:00:00 2001
    Mattias> From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
    Mattias> Date: Wed, 26 Jun 2019 11:23:32 +0200
    Mattias> Subject: [PATCH] Document bug in `replace-regexp-in-string'

    Mattias> `replace-regexp-in-string' omits the first START characters of the
    Mattias> input string in its return value.  This is a clear bug, but fixing it
    Mattias> probably causes more trouble; document the behaviour instead (bug#36372).

    Mattias> * doc/lispref/searching.texi (Search and Replace)
    Mattias> * lisp/subr.el (replace-regexp-in-string):
    Mattias> Document current behaviour.
    Mattias> ---
    Mattias>  doc/lispref/searching.texi | 4 ++--
    Mattias>  lisp/subr.el               | 3 ++-
    Mattias>  2 files changed, 4 insertions(+), 3 deletions(-)

    Mattias> diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
    Mattias> index 33455114da..86a907021c 100644
    Mattias> --- a/doc/lispref/searching.texi
    Mattias> +++ b/doc/lispref/searching.texi
    Mattias> @@ -1790,8 +1790,8 @@ Search and Replace
    Mattias>  This function copies @var{string} and searches it for matches for
    Mattias>  @var{regexp}, and replaces them with @var{rep}.  It returns the
    Mattias>  modified copy.  If @var{start} is non-@code{nil}, the search for
    Mattias> -matches starts at that index in @var{string}, so matches starting
    Mattias> -before that index are not changed.
    Mattias> +matches starts at that index in @var{string}, and the returned value
    Mattias> +does not include first @var{start} characters of @var{string}.

insert 'the' before 'first'

    Mattias>  This function uses @code{replace-match} to do the replacement, and it
    Mattias>  passes the optional arguments @var{fixedcase}, @var{literal} and
    Mattias> diff --git a/lisp/subr.el b/lisp/subr.el
    Mattias> index baff1e909a..ba388b3721 100644
    Mattias> --- a/lisp/subr.el
    Mattias> +++ b/lisp/subr.el
    Mattias> @@ -4208,7 +4208,8 @@ replace-regexp-in-string
 
    Mattias>  Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the
    Mattias>  arguments with the same names of function `replace-match'.  If START
    Mattias> -is non-nil, start replacements at that index in STRING.
    Mattias> +is non-nil, start replacements at that index in STRING.  The return
    Mattias> +value then omits the first START characters of STRING.
 
'at that index in STRING, and omit the first START characters from the
return value.'

    Mattias>  REP is either a string used as the NEWTEXT arg of `replace-match' or a
    Mattias>  function.  If it is a function, it is called with the actual text of each
    Mattias> -- 
    Mattias> 2.20.1 (Apple Git-117)







^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26  9:38     ` Robert Pluim
@ 2019-06-26 10:01       ` Mattias Engdegård
  2019-06-26 11:11         ` Robert Pluim
  2019-06-26 10:22       ` Lars Ingebrigtsen
  1 sibling, 1 reply; 17+ messages in thread
From: Mattias Engdegård @ 2019-06-26 10:01 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 36372

[-- Attachment #1: Type: text/plain, Size: 954 bytes --]

26 juni 2019 kl. 11.38 skrev Robert Pluim <rpluim@gmail.com>:
> 
> insert 'the' before 'first'

Oh dear, thank you for spotting that.

>    Mattias>  Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the
>    Mattias>  arguments with the same names of function `replace-match'.  If START
>    Mattias> -is non-nil, start replacements at that index in STRING.
>    Mattias> +is non-nil, start replacements at that index in STRING.  The return
>    Mattias> +value then omits the first START characters of STRING.
> 
> 'at that index in STRING, and omit the first START characters from the
> return value.'

Thank you, but that would approach circular reasoning (from what return value, exactly? The one that we are in the process of defining?). I went with

[...] If START is non-nil, start replacements at that index in STRING, and omit the first START characters of STRING from the return value.

in the revised patch below.

[-- Attachment #2: 0001-Document-bug-in-replace-regexp-in-string.patch --]
[-- Type: application/octet-stream, Size: 2190 bytes --]

From 4fea4c96868985690c7db363bf29fac8beff19ff Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Wed, 26 Jun 2019 11:23:32 +0200
Subject: [PATCH] Document bug in `replace-regexp-in-string'

`replace-regexp-in-string' omits the first START characters of the
input string in its return value.  This is a clear bug, but fixing it
probably causes more trouble; document the behaviour instead (bug#36372).

* doc/lispref/searching.texi (Search and Replace)
* lisp/subr.el (replace-regexp-in-string):
Document current behaviour.
---
 doc/lispref/searching.texi | 4 ++--
 lisp/subr.el               | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 33455114da..b82f5dade0 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -1790,8 +1790,8 @@ Search and Replace
 This function copies @var{string} and searches it for matches for
 @var{regexp}, and replaces them with @var{rep}.  It returns the
 modified copy.  If @var{start} is non-@code{nil}, the search for
-matches starts at that index in @var{string}, so matches starting
-before that index are not changed.
+matches starts at that index in @var{string}, and the returned value
+does not include the first @var{start} characters of @var{string}.
 
 This function uses @code{replace-match} to do the replacement, and it
 passes the optional arguments @var{fixedcase}, @var{literal} and
diff --git a/lisp/subr.el b/lisp/subr.el
index baff1e909a..b981af6afe 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -4208,7 +4208,8 @@ replace-regexp-in-string
 
 Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the
 arguments with the same names of function `replace-match'.  If START
-is non-nil, start replacements at that index in STRING.
+is non-nil, start replacements at that index in STRING, and omit
+the first START characters of STRING from the return value.
 
 REP is either a string used as the NEWTEXT arg of `replace-match' or a
 function.  If it is a function, it is called with the actual text of each
-- 
2.20.1 (Apple Git-117)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26  9:38     ` Robert Pluim
  2019-06-26 10:01       ` Mattias Engdegård
@ 2019-06-26 10:22       ` Lars Ingebrigtsen
  2019-06-26 12:32         ` Robert Pluim
  1 sibling, 1 reply; 17+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-26 10:22 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 36372

Robert Pluim <rpluim@gmail.com> writes:

> I think this is one of those 'who could possibly be relying on this
> behaviour' bugs that bites you hard if you change things, so
> documentation is probably best.

Yup.  But this is very unusual behaviour for such a function, so even if
documented, it's going to cause confusion...

Has anybody grepped through the Emacs tree to see whether anybody uses
the parameter, and if so, whether it's expecting the wrong behaviour?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 10:01       ` Mattias Engdegård
@ 2019-06-26 11:11         ` Robert Pluim
  0 siblings, 0 replies; 17+ messages in thread
From: Robert Pluim @ 2019-06-26 11:11 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 36372

>>>>> On Wed, 26 Jun 2019 12:01:45 +0200, Mattias Engdegård <mattiase@acm.org> said:
    >> 'at that index in STRING, and omit the first START characters from the
    >> return value.'

    Mattias> Thank you, but that would approach circular reasoning (from what return value, exactly? The one that we are in the process of defining?). I went with

    Mattias> [...] If START is non-nil, start replacements at that index in STRING, and omit the first START characters of STRING from the return value.

Sure, that works.

Robert





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 10:22       ` Lars Ingebrigtsen
@ 2019-06-26 12:32         ` Robert Pluim
  2019-06-26 13:51           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 17+ messages in thread
From: Robert Pluim @ 2019-06-26 12:32 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Mattias Engdegård, 36372

>>>>> On Wed, 26 Jun 2019 12:22:06 +0200, Lars Ingebrigtsen <larsi@gnus.org> said:

    Larsi> Robert Pluim <rpluim@gmail.com> writes:
    >> I think this is one of those 'who could possibly be relying on this
    >> behaviour' bugs that bites you hard if you change things, so
    >> documentation is probably best.

    Larsi> Yup.  But this is very unusual behaviour for such a function, so even if
    Larsi> documented, it's going to cause confusion...

    Larsi> Has anybody grepped through the Emacs tree to see whether anybody uses
    Larsi> the parameter, and if so, whether it's expecting the wrong behaviour?

'grep'? Sounds complicated. I ran a combination of find-dired and
diredp-do-apply-function over the lisp sources, and none of the 656
calls to replace-regexp-in-string have 7 arguments.

Robert





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 12:32         ` Robert Pluim
@ 2019-06-26 13:51           ` Lars Ingebrigtsen
  2019-06-26 14:01             ` Drew Adams
  2019-06-26 14:51             ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-26 13:51 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 36372

Robert Pluim <rpluim@gmail.com> writes:

> 'grep'? Sounds complicated. I ran a combination of find-dired and
> diredp-do-apply-function over the lisp sources, and none of the 656
> calls to replace-regexp-in-string have 7 arguments.

I started thinking about adding a check in the byte compiler for this,
but now you've checked anyway.  :-)

So it isn't used in-tree at all.  Since it's an (in my opinion) pretty
useless parameter as it's currently implemented, I think we should
deprecate the parameter and remove the documentation.

Then, perhaps, in ten years time we can reintroduce the parameter with
better semantics.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 13:51           ` Lars Ingebrigtsen
@ 2019-06-26 14:01             ` Drew Adams
  2019-06-26 14:51             ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Drew Adams @ 2019-06-26 14:01 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Mattias Engdegård; +Cc: 36372

> So it isn't used in-tree at all.  Since it's an (in my opinion) pretty
> useless parameter as it's currently implemented, I think we should
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> deprecate the parameter and remove the documentation.
> 
> Then, perhaps, in ten years time we can reintroduce the parameter with
> better semantics.  :-)

1. Whether it's used in-tree is not sufficient.

2. Whether it is useless "as it's currently
   implemented" is not sufficient.  That's what
   implementation bugs are about.

3. But I agree that it is likely very rarely used,
   if at all.  (Searched my code and found no
   matches, as one example.)

3. IF it has a use, when fixed as you see fit,
   then fix it, and call out the fixed behavior
   as an incompatible change.  In that case, do
   not deprecate it.  (I don't know that it has
   a use, if fixed, hence "IF".)





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 13:51           ` Lars Ingebrigtsen
  2019-06-26 14:01             ` Drew Adams
@ 2019-06-26 14:51             ` Eli Zaretskii
  2019-06-26 15:20               ` Mattias Engdegård
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-06-26 14:51 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: mattiase, 36372

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Wed, 26 Jun 2019 15:51:41 +0200
> Cc: 36372@debbugs.gnu.org
> 
> So it isn't used in-tree at all.  Since it's an (in my opinion) pretty
> useless parameter as it's currently implemented, I think we should
> deprecate the parameter and remove the documentation.

I disagree.  This has been in Emacs since 2000, it's too late to
remove it.  We must assume that whoever wrote the code (which operated
like that from the get-go) actually meant for it to operate like that.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26  9:31   ` bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH] Mattias Engdegård
  2019-06-26  9:38     ` Robert Pluim
@ 2019-06-26 15:17     ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2019-06-26 15:17 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 36372

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Wed, 26 Jun 2019 11:31:01 +0200
> Cc: 36372@debbugs.gnu.org
> 
> diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
> index 33455114da..86a907021c 100644
> --- a/doc/lispref/searching.texi
> +++ b/doc/lispref/searching.texi
> @@ -1790,8 +1790,8 @@ Search and Replace
>  This function copies @var{string} and searches it for matches for
>  @var{regexp}, and replaces them with @var{rep}.  It returns the
>  modified copy.  If @var{start} is non-@code{nil}, the search for
> -matches starts at that index in @var{string}, so matches starting
> -before that index are not changed.
> +matches starts at that index in @var{string}, and the returned value
> +does not include first @var{start} characters of @var{string}.

For the manual, I'd expand a bit on this, and explained how to get the
result you thought you will when using non-nil START (by concatenating
with the initial substring).

Thanks.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 14:51             ` Eli Zaretskii
@ 2019-06-26 15:20               ` Mattias Engdegård
  2019-06-26 15:59                 ` Clément Pit-Claudel
  0 siblings, 1 reply; 17+ messages in thread
From: Mattias Engdegård @ 2019-06-26 15:20 UTC (permalink / raw)
  To: Eli Zaretskii, Clément Pit-Claudel; +Cc: Lars Ingebrigtsen, 36372

26 juni 2019 kl. 16.51 skrev Eli Zaretskii <eliz@gnu.org>:
>> 
>> So it isn't used in-tree at all.  Since it's an (in my opinion) pretty
>> useless parameter as it's currently implemented, I think we should
>> deprecate the parameter and remove the documentation.
> 
> I disagree.  This has been in Emacs since 2000, it's too late to
> remove it.  We must assume that whoever wrote the code (which operated
> like that from the get-go) actually meant for it to operate like that.

It may be too late to change, but that doesn't mean that we have to agree with your last statement: it looks like a mistake from the very beginning to me. Given the seventh argument's infrequency of use (and the lack of tests), this is perfectly plausible.

By the way, I found no use of the START argument in GNU ELPA either so I started searching all elisp code on my disk, and found exactly one, in company-coq.

Clément, apologies for dragging you into the discussion, but when you woke up this morning, you probably didn't know that you were possibly the only man in history to use the last argument to replace-regexp-in-string.
Now I would be curious to know:

(1) Does this code, in company-coq--loc-fully-qualified-name, actually work the way you intended? (Looks like it.)
(2) Did you learn how the START parameter affects the return value by reading the doc string, the manual, the source code, or by testing?
(3) Are you aware of other code using the START parameter to replace-regexp-in-string?

Clearly this is anecdotal evidence but we have little else.






^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 15:20               ` Mattias Engdegård
@ 2019-06-26 15:59                 ` Clément Pit-Claudel
  2019-06-26 17:03                   ` Mattias Engdegård
  0 siblings, 1 reply; 17+ messages in thread
From: Clément Pit-Claudel @ 2019-06-26 15:59 UTC (permalink / raw)
  To: Mattias Engdegård, Eli Zaretskii; +Cc: Lars Ingebrigtsen, 36372

On 2019-06-26 11:20, Mattias Engdegård wrote:
> Clément, apologies for dragging you into the discussion, but when you woke up this morning, you probably didn't know that you were possibly the only man in history to use the last argument to replace-regexp-in-string.

Well, my morning just got a lot more interesting :)

> Now I would be curious to know:
> 
> (1) Does this code, in company-coq--loc-fully-qualified-name, actually work the way you intended? (Looks like it.)

Yes, I think it does.  It's a bit inscrutable, so here's an annotated copy:

(defun company-coq--loc-fully-qualified-name (fqn)
  ;; assume fqn is Coq.Init.Nat.max
  (let* ((spec (company-coq-longest-matching-path-spec fqn))
         ;; … spec is ("Coq.Init" . "/build/coq/theories/Init/")
         (logical (if spec (concat (car spec) ".") ""))
         ;; … logical is "Coq.Init."
         (mod-name (replace-regexp-in-string "\\..*\\'" "" fqn nil nil nil (length logical))))
         ;; … mod-name is "Nat"
    (company-coq-library-path logical mod-name spec)))
    ;; … return /build/coq/theories/Init/Nat.v

The reason for this odd dance is that a Coq identifier is essentially a directory path (Coq.Init.) + a file name (here Nat.) + an identifier name (max).  But in some cases there can be modules in the path too, like maybe Coq.Init.Nat.XYZ.max, and in those cases the file is still only 'Nat.' 

This code would probably be clearer if I just used a substring to trim out the beginning of the string ^^

> (2) Did you learn how the START parameter affects the return value by reading the doc string, the manual, the source code, or by testing?

Almost certainly the docstring, confirmed by testing.  I haven't looked at this section of the manual.  

> (3) Are you aware of other code using the START parameter to replace-regexp-in-string?

Nothing off the top of my head, but I probably wouldn't have thought of the instance in company-coq either.  A quick grep through my local sources doesn't turn up anything else, but a quick visual check isn't a very reliable method (did you use a plain grep to find the instance in company-coq, or do you have a more sophisticated trick?).

Hope this helps,
Clément.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 15:59                 ` Clément Pit-Claudel
@ 2019-06-26 17:03                   ` Mattias Engdegård
  2019-06-26 17:09                     ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Mattias Engdegård @ 2019-06-26 17:03 UTC (permalink / raw)
  To: Clément Pit-Claudel; +Cc: Lars Ingebrigtsen, 36372

[-- Attachment #1: Type: text/plain, Size: 1915 bytes --]

26 juni 2019 kl. 17.59 skrev Clément Pit-Claudel <cpitclaudel@gmail.com>:
> 
>> (1) Does this code, in company-coq--loc-fully-qualified-name, actually work the way you intended? (Looks like it.)
> 
> Yes, I think it does.  It's a bit inscrutable, so here's an annotated copy:

Thanks, so at least it wasn't written blindly from the docs --- good to know.

> This code would probably be clearer if I just used a substring to trim out the beginning of the string ^^

Well I thought so, but who am I to judge? If it works...

>> (2) Did you learn how the START parameter affects the return value by reading the doc string, the manual, the source code, or by testing?
> 
> Almost certainly the docstring, confirmed by testing.  I haven't looked at this section of the manual. 
> 
>> (3) Are you aware of other code using the START parameter to replace-regexp-in-string?
> 
> Nothing off the top of my head, but I probably wouldn't have thought of the instance in company-coq either.  A quick grep through my local sources doesn't turn up anything else, but a quick visual check isn't a very reliable method (did you use a plain grep to find the instance in company-coq, or do you have a more sophisticated trick?).

Definitely not sophisticated, just re-purposed my old regexp trawler (attached in case you want to try).

Clément, thank you very much for humouring me. We are in the unenviable position to decide whether to fix an old, unsatisfactory, apparently very rarely used interface, or to document the behaviour. The stakes are low, but it looks like the semantics will be kept after all.

26 juni 2019 kl. 17.17 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> For the manual, I'd expand a bit on this, and explained how to get the
> result you thought you will when using non-nil START (by concatenating
> with the initial substring).

All right, I added a sentence to that effect.


[-- Attachment #2: 0001-Document-bug-in-replace-regexp-in-string.patch --]
[-- Type: application/octet-stream, Size: 2318 bytes --]

From 95b5c2f463edfe617f16a238081639ac58f6e7fb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Wed, 26 Jun 2019 11:23:32 +0200
Subject: [PATCH] Document bug in `replace-regexp-in-string'

`replace-regexp-in-string' omits the first START characters of the
input string in its return value.  This is a clear bug, but fixing it
probably causes more trouble; document the behaviour instead (bug#36372).

* doc/lispref/searching.texi (Search and Replace)
* lisp/subr.el (replace-regexp-in-string):
Document current behaviour.
---
 doc/lispref/searching.texi | 6 ++++--
 lisp/subr.el               | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 33455114da..ef1cffc446 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -1790,8 +1790,10 @@ Search and Replace
 This function copies @var{string} and searches it for matches for
 @var{regexp}, and replaces them with @var{rep}.  It returns the
 modified copy.  If @var{start} is non-@code{nil}, the search for
-matches starts at that index in @var{string}, so matches starting
-before that index are not changed.
+matches starts at that index in @var{string}, and the returned value
+does not include the first @var{start} characters of @var{string}.
+To get the whole transformed string, concatenate the first
+@var{start} characters of @var{string} with the return value.
 
 This function uses @code{replace-match} to do the replacement, and it
 passes the optional arguments @var{fixedcase}, @var{literal} and
diff --git a/lisp/subr.el b/lisp/subr.el
index baff1e909a..b981af6afe 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -4208,7 +4208,8 @@ replace-regexp-in-string
 
 Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the
 arguments with the same names of function `replace-match'.  If START
-is non-nil, start replacements at that index in STRING.
+is non-nil, start replacements at that index in STRING, and omit
+the first START characters of STRING from the return value.
 
 REP is either a string used as the NEWTEXT arg of `replace-match' or a
 function.  If it is a function, it is called with the actual text of each
-- 
2.20.1 (Apple Git-117)


[-- Attachment #3: trawl.el --]
[-- Type: application/octet-stream, Size: 9938 bytes --]

;;; trawl.el -- scan elisp code for various things   -*- lexical-binding: t -*-

(require 'compile)
(require 'cl-lib)

(defconst trawl--error-buffer-name "*trawl*")

(defun trawl--error-buffer ()
  (let ((buf (get-buffer trawl--error-buffer-name)))
    (or buf
        (let ((buf (get-buffer-create trawl--error-buffer-name)))
          (with-current-buffer buf
            (trawl-mode))
          buf))))

(defvar trawl--error-count)

(defun trawl--add-to-error-buffer (string)
  (with-current-buffer (trawl--error-buffer)
    (goto-char (point-max))
    (let ((inhibit-read-only t))
      (insert string))))

(defun trawl--line-col-from-pos-path (pos path)
  "Compute (LINE . COLUMN) from POS (toplevel position)
and PATH (reversed list of list indices to follow to target)."
  (save-excursion
    (goto-char pos)
    (let ((p (reverse path)))
      (while p
        (when (looking-at (rx (1+ (or blank "\n" "\f"
                                      (seq ";" (0+ nonl))))))
          (goto-char (match-end 0)))
        (let ((skip (car p)))
          (cond
           ((looking-at (rx (or "'" "#'" "`" "," ",@")))
            (goto-char (match-end 0))
            (setq skip (1- skip)))
           ((looking-at (rx "("))
            (forward-char 1)))
          (forward-sexp skip)
          (setq p (cdr p))))
      (when (looking-at (rx (1+ (or blank "\n" "\f"
                                    (seq ";" (0+ nonl))))))
        (goto-char (match-end 0)))
      (cons (line-number-at-pos (point) t)
            (1+ (current-column))))))

(defun trawl--output-error (string)
  (if noninteractive
      (message "%s" string)
    (trawl--add-to-error-buffer (concat string "\n"))))

(defun trawl--report (file pos path message)
  (let ((line-col (trawl--line-col-from-pos-path pos path)))
    (trawl--output-error
     (format "%s:%d:%d: %s" file (car line-col) (cdr line-col) message)))
  (setq trawl--error-count (1+ trawl--error-count)))

(defun trawl--quote-string (str)
  (concat "\""
          (replace-regexp-in-string
           (rx (any cntrl "\177-\377" ?\\ ?\"))
           (lambda (s)
             (let ((c (logand (string-to-char s) #xff)))
               (or (cdr (assq c
                              '((?\" . "\\\"")
                                (?\\ . "\\\\")
                                (?\b . "\\b")
                                (?\t . "\\t")
                                (?\n . "\\n")
                                (?\v . "\\v")
                                (?\f . "\\f")
                                (?\r . "\\r")
                                (?\e . "\\e"))))
                   (format "\\%03o" c))))
           str t t)
          "\""))

(defun trawl--caret-string (string pos)
  (let ((quoted-pos
         (- (length (trawl--quote-string (substring string 0 pos)))
            2)))                        ; Lop off quotes
    (concat (make-string quoted-pos ?.) "^")))


(defconst trawl--report-could-be nil)
(defconst trawl--report-pcase nil)
(defconst trawl--report-cl-case nil)

(defun trawl--check-form-recursively (form file pos path)
  (pcase form
    (`(replace-regexp-in-string . ,_)
     (when (> (length form) 7)
       (trawl--report file pos path "replace-regexp-in-string with start")))

;;;    (`(apply ,(or 'nconc '(quote nconc) '(function nconc)) (mapcar . ,_))
;;;     (trawl--report file pos path
;;;                    "use mapcan instead of (apply nconc (mapcar...))"))
;;;    (`(lambda (,var1) (,_ ,var2))
;;;     (when (eq var1 var2)
;;;       (trawl--report file pos path
;;;                      "lambda expression can be η-reduced")))
;;;    (`(lambda (,var1) ,var2)
;;;     (when (eq var1 var2)
;;;       (trawl--report file pos path
;;;		      "lambda expression is #'identity")))
;;;    (`(,(or 'defun 'defsubst) ,name ,_ . ,body)
;;;     (let ((f body))
;;;       (while (and f (consp (car f)) (eq (caar f) 'declare))
;;;         (setq f (cdr f)))
;;;       (when (and f (consp (car f)))
;;;         (setq f (cdr f))
;;;         (while (cdr f)
;;;           (when (stringp (car f))
;;;             (trawl--report file pos path
;;;			    (format "defun %s: misplaced doc string" name)))
;;;           (setq f (cdr f))))))
                 )

                (let ((index 0))
                  (while (consp form)
                    (when (consp (car form))
                      (trawl--check-form-recursively
                       (car form) file pos (cons index path)))
                    (setq form (cdr form))
                    (setq index (1+ index)))))

(defun trawl--show-errors ()
  (unless noninteractive
    (let ((pop-up-windows t))
      (display-buffer (trawl--error-buffer))
      (sit-for 0))))

(defun trawl--read-buffer (file)
  "Read top-level forms from the current buffer.
Return a list of (FORM . STARTING-POSITION)."
  (goto-char (point-min))
  (let ((pos nil)
        (keep-going t)
        (read-circle nil)
        (forms nil))
    (while keep-going
      (setq pos (point))
      (let ((form nil))
        (condition-case err
            (setq form (read (current-buffer)))
          (end-of-file
           (setq keep-going nil))
          (invalid-read-syntax
           (cond
            ((equal (cadr err) "#")
             (goto-char pos)
             (forward-sexp 1))
            (t
             (trawl--report file (point) nil (prin1-to-string err))
             (setq keep-going nil))))
          (error
           (trawl--report file (point) nil (prin1-to-string err))
           (setq keep-going nil)))
        (when (consp form)
          (push (cons form pos) forms))))
    (nreverse forms)))

(defun trawl--scan-current-buffer (file)
  (let ((errors-before trawl--error-count))
    (let ((forms (trawl--read-buffer file))
          (case-fold-search nil))
      (dolist (form forms)
        (trawl--check-form-recursively (car form) file (cdr form) nil)))
    (when (> trawl--error-count errors-before)
      (trawl--show-errors))))

(defun trawl--scan-file (file base-dir)
  (with-temp-buffer
    (emacs-lisp-mode)
    (insert-file-contents file)
    (trawl--scan-current-buffer (file-relative-name file base-dir))))

(defvar trawl-last-target nil
  "The last file, directory or buffer on which trawl was run.")

(defun trawl--init (target base-dir)
  (if noninteractive
      (setq trawl--error-count 0)
    (with-current-buffer (trawl--error-buffer)
      (let ((inhibit-read-only t))
        (compilation-forget-errors)
        (erase-buffer)
        (insert (format "Trawl results for %s\n" target))
        (trawl--show-errors))
      (setq trawl-last-target target)
      (setq default-directory base-dir)
      (setq trawl--error-count 0))))

(defun trawl--finish ()
  (let* ((errors trawl--error-count)
         (msg (format "%d error%s" errors (if (= errors 1) "" "s"))))
    (unless noninteractive
      (trawl--add-to-error-buffer (format "\nFinished -- %s found.\n" msg)))
    (message "trawl: %s found." msg)))

(defun trawl-again ()
  "Re-run trawl on the same file, directory or buffer as last time."
  (interactive)
  (cond ((bufferp trawl-last-target)
         (with-current-buffer trawl-last-target
           (trawl-current-buffer)))
        ((file-directory-p trawl-last-target)
         (trawl-directory trawl-last-target))
        ((file-readable-p trawl-last-target)
         (trawl-file trawl-last-target))
        (t (error "No target"))))

(defvar trawl-mode-map
  (let ((map (make-sparse-keymap)))
    (set-keymap-parent map compilation-minor-mode-map)
    (define-key map "n" 'next-error-no-select)
    (define-key map "p" 'previous-error-no-select)
    (define-key map "g" 'trawl-again)
    map)
  "Keymap for trawl buffers.")

(define-compilation-mode trawl-mode "Trawl"
  "Mode for trawl output."
  (setq-local trawl-last-target nil))

(defun trawl--scan-files (files target base-dir)
  (trawl--init target base-dir)
  (dolist (file files)
    ;;(trawl--add-to-error-buffer (format "Scanning %s\n" file))
    (trawl--scan-file file base-dir))
  (trawl--finish))

(defun trawl--tree-files (dir)
  (directory-files-recursively
   dir (rx bos (not (any ".")) (* anything) ".el" eos)))


;;;###autoload
(defun trawl-file (file)
  "Scan FILE, an elisp file."
  (interactive "fTrawl elisp file: ")
  (trawl--scan-files (list file) file (file-name-directory file)))

;;;###autoload
(defun trawl-directory (dir)
  "Scan all *.el files in DIR."
  (interactive "DTrawl directory: ")
  (message "Finding .el files in %s..." dir)
  (let ((files (trawl--tree-files dir)))
    (message "Scanning files...")
    (trawl--scan-files files dir dir)))

;;;###autoload
(defun trawl-current-buffer ()
  "Scan the current buffer.
The buffer must be in emacs-lisp-mode."
  (interactive)
  (unless (eq major-mode 'emacs-lisp-mode)
    (error "Trawl: can only scan elisp code (use emacs-lisp-mode)"))
  (trawl--init (current-buffer) default-directory)
  (save-excursion
    (trawl--scan-current-buffer (buffer-name)))
  (trawl--finish))


(defun trawl-batch ()
  "Scan elisp source files.
Call this function in batch mode with files and directories as
command-line arguments.  Files are scanned; directories are
searched recursively for *.el files to scan."
  (unless noninteractive
    (error "`trawl-batch' is only for use with -batch"))
  (trawl--scan-files (mapcan (lambda (arg)
                               (if (file-directory-p arg)
                                   (trawl--tree-files arg)
                                 (list arg)))
                             command-line-args-left)
                     nil default-directory)
  (setq command-line-args-left nil))

(provide 'trawl)

;;; trawl.el ends here

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 17:03                   ` Mattias Engdegård
@ 2019-06-26 17:09                     ` Eli Zaretskii
  2019-06-26 17:41                       ` Mattias Engdegård
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2019-06-26 17:09 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: cpitclaudel, larsi, 36372

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Wed, 26 Jun 2019 19:03:19 +0200
> Cc: Eli Zaretskii <eliz@gnu.org>, Lars Ingebrigtsen <larsi@gnus.org>,
>         36372@debbugs.gnu.org
> > For the manual, I'd expand a bit on this, and explained how to get the
> > result you thought you will when using non-nil START (by concatenating
> > with the initial substring).
> 
> All right, I added a sentence to that effect.

LGTM, thanks.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH]
  2019-06-26 17:09                     ` Eli Zaretskii
@ 2019-06-26 17:41                       ` Mattias Engdegård
  0 siblings, 0 replies; 17+ messages in thread
From: Mattias Engdegård @ 2019-06-26 17:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Clément Pit-Claudel, Lars Ingebrigtsen, 36372-done

26 juni 2019 kl. 19.09 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> LGTM, thanks.

That's it then. Thanks for the review; pushed.






^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-06-26 17:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-06-25 12:01 bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value Mattias Engdegård
2019-06-25 15:26 ` Eli Zaretskii
2019-06-26  9:31   ` bug#36372: 27.0.50; replace-regexp-in-string skips START first chars in return value [PATCH] Mattias Engdegård
2019-06-26  9:38     ` Robert Pluim
2019-06-26 10:01       ` Mattias Engdegård
2019-06-26 11:11         ` Robert Pluim
2019-06-26 10:22       ` Lars Ingebrigtsen
2019-06-26 12:32         ` Robert Pluim
2019-06-26 13:51           ` Lars Ingebrigtsen
2019-06-26 14:01             ` Drew Adams
2019-06-26 14:51             ` Eli Zaretskii
2019-06-26 15:20               ` Mattias Engdegård
2019-06-26 15:59                 ` Clément Pit-Claudel
2019-06-26 17:03                   ` Mattias Engdegård
2019-06-26 17:09                     ` Eli Zaretskii
2019-06-26 17:41                       ` Mattias Engdegård
2019-06-26 15:17     ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).