* `match-data' set improperly
@ 2003-02-12 3:16 Matthew Swift
2003-02-12 10:53 ` Kim F. Storm
[not found] ` <mailman.1845.1045043916.21513.bug-gnu-emacs@gnu.org>
0 siblings, 2 replies; 8+ messages in thread
From: Matthew Swift @ 2003-02-12 3:16 UTC (permalink / raw)
This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.
Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.
In GNU Emacs 21.2.1 (i386-debian-linux-gnu, X toolkit, Xaw3d scroll bars)
of 2002-11-06 on beth, modified by Debian
configured using `configure i386-debian-linux-gnu --prefix=/usr/local --sharedstatedir=/var/lib --libexecdir=/usr/local/lib --localstatedir=/var/lib --infodir=/usr/local/share/info --mandir=/usr/local/share/man --with-pop=yes --with-x=yes --with-x-toolkit=athena --without-gif'
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: nil
locale-coding-system: nil
default-enable-multibyte-characters: t
Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:
Evaluating the following `let' form gives varying results, but the results
should be consistent. Correct results are always returned when `s' is "The
quick fox...", which matches the regexp `r'. When `s' is "great grey
green...", which does not match `r', the results in my trials have never been
correct: either an 'args-out-of-range' error is raised, or incorrect data is
given, which appears to be the result of applying match-data from matching a
previous value of `s' ("The quick fox...") to the current value of `s'. Sample
results are copied below.
My *guess* is that when `string-match' fails, it either fails to set
`match-data' or sets it incorrectly in one of two (or more) ways.
(let ((r "\\(qu\\)\\(ick\\)")
      ;; \C-x\C-t to swap next two lines
      ;; \M-\C-x to evaluate the `let' form
      (s "The quick fox jumped quickly.")
      (s "great grey green greasy Limpopo River")
      )
  (string-match r s)
  (list
   (match-data)
   (match-string 0 s)
   (match-string 1 s)
   (match-string 2 s)
   (match-string 3 s)
   (match-beginning 0)
   (match-beginning 1)
   (match-beginning 2)
   (match-beginning 3)
   ))
Debugger entered--Lisp error: (args-out-of-range "great grey green greasy Limpopo River" 27326 27327)
match-string(0 "great grey green greasy Limpopo River")
(list (match-data) (match-string 0 s) (match-string 1 s) (match-string 2 s) (match-string 3 s) (match-beginning 0) (match-beginning 1) (match-beginning 2) (match-beginning 3))
(let ((r "\\(qu\\)\\(ick\\)") (s "The quick fox jumped quickly.") (s "great grey green greasy Limpopo River")) (string-match r s) (list (match-data) (match-string 0 s) (match-string 1 s) (match-string 2 s) (match-string 3 s) (match-beginning 0) (match-beginning 1) (match-beginning 2) (match-beginning 3)))
eval((let ((r "\\(qu\\)\\(ick\\)") (s "The quick fox jumped quickly.") (s "great grey green greasy Limpopo River")) (string-match r s) (list (match-data) (match-string 0 s) (match-string 1 s) (match-string 2 s) (match-string 3 s) (match-beginning 0) (match-beginning 1) (match-beginning 2) (match-beginning 3))))
eval-last-sexp-1(nil)
eval-last-sexp(nil)
call-interactively(eval-last-sexp)
Sometimes I get:
((4 9 4 6 6 9) "t gre" "t " "gre" nil 4 4 6 nil)
I get this less often, and I cannot tell you a reliable way to get it. I
evaluate the `let' form, evaluate one or more individual lines
e.g. (match-beginning 2), swap the `s' strings, and evaluate the `let' form
again. Then most of the time I still get the error, but sometimes I get these
incorrect results.
Notice that the match-data is the same as for a successful match of the other
string:
((4 9 4 6 6 9) "quick" "qu" "ick" nil 4 4 6 nil)
Recent input:
<mouse-1> <down-mouse-5> <mouse-5> <double-down-mouse-5>
<double-mouse-5> <down-mouse-1> <mouse-1> C-a C-n <return>
M-1 C-n C-n C-n C-n C-n C-n <return> C-n C-n <return>
M-1 C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n
C-p <return> <help-echo> <help-echo> <down-mouse-1>
<mouse-1> C-a C-SPC C-n C-n C-n C-n C-n C-n C-n C-SPC
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p M-w C-x o q g M-x r e p o r t - e m a c s - b u
g <return>
Recent messages:
nndiary: Reading incoming mail (no new mail)...done
1 -> require: nnmail
1 <- require: nnmail
1 -> require: nnmail
1 <- require: nnmail
Checking new news...done
Loading emacsbug...
1 -> require: sendmail
1 <- require: sendmail
Loading emacsbug...done
* Re: `match-data' set improperly
2003-02-12 3:16 `match-data' set improperly Matthew Swift
@ 2003-02-12 10:53 ` Kim F. Storm
2003-02-12 19:25 ` Matt Swift
[not found] ` <mailman.1845.1045043916.21513.bug-gnu-emacs@gnu.org>
1 sibling, 1 reply; 8+ messages in thread
From: Kim F. Storm @ 2003-02-12 10:53 UTC (permalink / raw)
Cc: bug-gnu-emacs
Matthew Swift <swift@alum.mit.edu> writes:
> Evaluating the following `let' form gives varying results, but the results
> should be consistent. Correct results are always returned when `s' is "The
> quick fox...", which matches the regexp `r'. When `s' is "great grey
> green...", which does not match `r', the results in my trials have never been
> correct: either an 'args-out-of-range' error is raised, or incorrect data is
> given, which appears to be the result of applying match-data from matching a
> previous value of `s' ("The quick fox...") to the current value of `s'. Sample
> results are copied below.
>
> My *guess* is that when `string-match' fails, it either fails to set
> `match-data' or sets it incorrectly in one of two (or more) ways.
My guess is that you are not supposed to use match-data when
string-match fails, so its value is indeed undefined in that case.
What _valid_ values would you expect it to contain after the match failed?
--
Kim F. Storm http://www.cua.dk
* RE: `match-data' set improperly
2003-02-12 10:53 ` Kim F. Storm
@ 2003-02-12 19:25 ` Matt Swift
0 siblings, 0 replies; 8+ messages in thread
From: Matt Swift @ 2003-02-12 19:25 UTC (permalink / raw)
Cc: bug-gnu-emacs
The documentation (TeXinfo and docstrings) is pretty clear that
(match-string N) when submatch N>0 does not match anything is nil, and it is
also clear that (match-string 0) is like the case for N>0 but referring to
the entire match instead of a submatch. There is no suggestion that
match-data is ever undefined, once one match has been done, and the error
message you get is obscure.
Certainly the problem could be simply in documentation. But the valid value
of (match-string 0) after a failure *ought* to be nil, just as it is for,
say, (match-string 4) when there is no submatch 4 or submatch 4 matched
nothing. If I had to check what `string-match' and friends returned every
time before accessing the match-data, it would be a pain in the neck and I
might hallucinate that I was programming in C not Emacs-Lisp.
i.e., ideally,
(string-match "a" "b") => nil
(match-data 'integers) => (nil nil)
(string-match "ab\\(cd\\)?\\(ef\\)?" "abef") => 0
(match-data 'integers) => (0 4 nil nil 2 4)
(string-match "a" "b") => nil
(match-data 'integers) => (nil nil)
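
The behaviour proposed above can be approximated today without changing Emacs itself. What follows is only a sketch: `my-string-match-clearing' is a hypothetical name, not an Emacs function, and it assumes that `set-match-data' treats a nil pair as "group did not match", which is worth confirming on your own Emacs version.

```elisp
;; Hypothetical wrapper; not part of Emacs.
(defun my-string-match-clearing (regexp string &optional start)
  "Like `string-match', but clear the match data when the match fails."
  (let ((result (string-match regexp string start)))
    (unless result
      ;; Assumption: a nil pair marks group 0 as unmatched, and groups
      ;; not mentioned in the list are cleared as well.
      (set-match-data '(nil nil)))
    result))

;; First leave some stale match data behind, then fail a match.
(string-match "\\(qu\\)\\(ick\\)" "The quick fox jumped quickly.")
(my-string-match-clearing "a" "b")
;; After the failed call, `match-beginning' and `match-string' for
;; group 0 are intended to return nil instead of stale positions.
(list (match-beginning 0) (match-string 0 "b"))
```

The cost is one extra call on the failure path; callers that never read the match data after a failed search gain nothing from it.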
I do not see much use in refraining from changing the match data when a match
fails (if that is what is happening, and I'm not sure it is), and programs
that want to do that can implement it easily with `save-match-data', e.g.,
depending on whether one wants to be as careful as save-match-data to
restore match data if there is an error when doing the `string-match',
(defun string-match-or-I-never-existed (r s &optional start)
  (let ((md (match-data))
        (result (string-match r s start)))
    ;; NOTE: the next form relies on the undocumented but observed fact that
    ;; `set-match-data' always returns nil.
    (or result (set-match-data md))))
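
For the more careful variant mentioned above, one that also restores the match data if `string-match' signals an error, a sketch along these lines should do. The name `my-string-match-preserving' is hypothetical, and this version returns the result explicitly rather than relying on what `set-match-data' returns.

```elisp
;; Hypothetical helper; not part of Emacs.
(defun my-string-match-preserving (regexp string &optional start)
  "Like `string-match', but restore the old match data on failure or error."
  (let ((saved (match-data))
        (result nil))
    (unwind-protect
        (setq result (string-match regexp string start))
      ;; Runs on normal exit and on error; RESULT is still nil in both
      ;; the "no match" and the "error" case.
      (unless result
        (set-match-data saved)))
    result))

;; Usage: a successful match, then a failed one that leaves it intact.
(string-match "b" "abc")                  ; matches at index 1
(my-string-match-preserving "z" "abc")    ; no match, returns nil
(match-beginning 0)                       ; still 1, from the first match
```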
[parent not found: <mailman.1845.1045043916.21513.bug-gnu-emacs@gnu.org>]
* Re: `match-data' set improperly
[not found] ` <mailman.1845.1045043916.21513.bug-gnu-emacs@gnu.org>
@ 2003-02-12 16:02 ` Kevin Rodgers
2003-02-13 10:08 ` Richard Stallman
0 siblings, 1 reply; 8+ messages in thread
From: Kevin Rodgers @ 2003-02-12 16:02 UTC (permalink / raw)
Kim F. Storm wrote:
> My guess is that you are not supposed to use match-data when
> string-match fails, so its value is indeed undefined in that case.
>
> What _valid_ values would you expect it to contain after the match failed?
() is a valid list, which is what match-data must return.
I think match-beginning, match-end, and string-match should signal an error
if match-data is nil.
--
<a href="mailto:<kevin.rodgers@ihs.com>">Kevin Rodgers</a>
* Re: `match-data' set improperly
2003-02-12 16:02 ` Kevin Rodgers
@ 2003-02-13 10:08 ` Richard Stallman
0 siblings, 0 replies; 8+ messages in thread
From: Richard Stallman @ 2003-02-13 10:08 UTC (permalink / raw)
Cc: gnu-emacs-bug
    I think match-beginning, match-end, and string-match should signal an error
    if match-data is nil.
This is the sort of thing that is really more risky than it looks like.
You can convince yourself that nothing will break, and that if something does, it had
a bug anyway. But the end result is that it makes for more work.
This might be a change for the better in the long term, despite that;
but we have enough work now, so let's not make more.
Meanwhile, we should really focus on changes that make Emacs more powerful
for users, not on elegance of the programming system.
* `match-data' set improperly
@ 2003-02-13 0:54 Luc Teirlinck
2003-02-13 6:42 ` Matt Swift
0 siblings, 1 reply; 8+ messages in thread
From: Luc Teirlinck @ 2003-02-13 0:54 UTC (permalink / raw)
Cc: 'Kim F. Storm'
Matthew Swift wrote:
    The documentation (TeXinfo and docstrings) is pretty clear that
    (match-string N) when submatch N>0 does not match anything is nil,
    and it is also clear that (match-string 0) is like the case for
    N>0 but referring to the entire match instead of a submatch.
    There is no suggestion that match-data is ever undefined, once one
    match has been done,
The following in the Elisp manual (Chapter: Searching and Matching, Simple
Match Data Access) suggests exactly that to me:
    A search which fails may or may not alter the match data. In the
    past, a failing search did not do this, but we may change it in the
    future.
Sincerely,
Luc.
* RE: `match-data' set improperly
2003-02-13 0:54 Luc Teirlinck
@ 2003-02-13 6:42 ` Matt Swift
2003-02-13 12:25 ` Kim F. Storm
0 siblings, 1 reply; 8+ messages in thread
From: Matt Swift @ 2003-02-13 6:42 UTC (permalink / raw)
Cc: 'Kim F. Storm'
I stand corrected.
Nevertheless I see a strong case for revising both the documentation and the
behavior of the function. Searches fail frequently and are often expected
to fail, and though I acknowledge the relevant information was available, I
will presume to say that reasonably competent, conscientious, and
experienced people can overlook it. This essential information should be in
the docstring and the TeXinfo documentation of `string-match' etc, not
merely in the TeXinfo discussion of these functions.
The second sentence you quote is almost meaningless. "did not do this" --
did not do WHAT? alter it, or not alter it? Likewise, one cannot tell what
might be changed to what in the future.
As I wrote earlier today, I find good reasons to want failed searches to set
the match data in a manner consistent with failed submatches, and I see no
good reason to refrain from doing so. I doubt the efficiency advantage to
declining to set the match-data to (nil nil) is even measurable, and as a
matter of principle, well-defined data can not be less useful than undefined
data.
* Re: `match-data' set improperly
2003-02-13 6:42 ` Matt Swift
@ 2003-02-13 12:25 ` Kim F. Storm
0 siblings, 0 replies; 8+ messages in thread
From: Kim F. Storm @ 2003-02-13 12:25 UTC (permalink / raw)
Cc: bug-gnu-emacs
"Matt Swift" <swift@alum.mit.edu> writes:
> I stand corrected.
>
> Nevertheless I see a strong case for revising both the documentation and the
> behavior of the function.
I have fixed the documentation for match-data, so that it explicitly
states that the return value is undefined if the last search failed.
I don't see a strong reason to change the behaviour; it's worked like this
for many years, so people seem to be able to cope with it.
I think the code is generally easier to understand if there is an
explicit test on the return value of the search command.
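
A minimal sketch of that style, using the regexp and strings from the original report (`my-quick-parts' is a hypothetical helper written for illustration, not an Emacs function):

```elisp
;; Hypothetical helper; not part of Emacs.
(defun my-quick-parts (s)
  "Return the two submatches of \"qu\" and \"ick\" in S, or nil if no match."
  ;; Only touch the match data when `string-match' reports success.
  (when (string-match "\\(qu\\)\\(ick\\)" s)
    (list (match-string 1 s) (match-string 2 s))))

(my-quick-parts "The quick fox jumped quickly.")          ; => ("qu" "ick")
(my-quick-parts "great grey green greasy Limpopo River")  ; => nil
```

Because the accessors are only reached after a successful search, the result does not depend on whatever a failed search leaves in the match data.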
On the other hand, I don't object (but I'm not the one to decide!!) to
changing this behaviour in the way you suggest (i.e. make the various
match-* functions return nil if the last search failed).
Maybe you can write a patch (it can be isolated to search.c I think),
so we can see what's needed?
> The second sentence you quote is almost meaningless. "did not do
> this" -- did not do WHAT? alter it, or not alter it? Likewise, one
> cannot tell what might be changed to what in the future.
At least it clearly (:-)) indicates that you should not rely on the
return value, neither now, nor in the future.
--
Kim F. Storm http://www.cua.dk
Thread overview: 8+ messages
2003-02-12 3:16 `match-data' set improperly Matthew Swift
2003-02-12 10:53 ` Kim F. Storm
2003-02-12 19:25 ` Matt Swift
[not found] ` <mailman.1845.1045043916.21513.bug-gnu-emacs@gnu.org>
2003-02-12 16:02 ` Kevin Rodgers
2003-02-13 10:08 ` Richard Stallman
-- strict thread matches above, loose matches on Subject: below --
2003-02-13 0:54 Luc Teirlinck
2003-02-13 6:42 ` Matt Swift
2003-02-13 12:25 ` Kim F. Storm