From: David Kastrup <dak@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: Discrepancy in definition/use of match-data?
Date: 12 Jun 2004 11:03:53 +0200 [thread overview]
Message-ID: <x54qphxi4m.fsf@lola.goethe.zz> (raw)
In-Reply-To: <87vfhx46lc.fsf@tleepslib.sk.tsukuba.ac.jp>
"Stephen J. Turnbull" <stephen@xemacs.org> writes:
> >>>>> "David" == David Kastrup <dak@gnu.org> writes:
>
> David> How strong were the results from the regression test?
>
> What does that mean? There is one test that specifically checks for
> match-data being preserved across a failed match, and it failed.
Ah, ok. I thought that the regression test might have involved
calling a lot of complex functions to see whether they still worked.
It dod not occur to me that it checked for the particular semantics
explicitly.
> David> What kind and amount of code appears to be affected?
>
> Kind? I know of one specific use in `w3-configuration-data' in
> w3-cfg.el which calls itself recursively, and depends on being able
> to use the top-level match-data if the match in the recursive call
> fails, while using the match-data from the recursive call otherwise.
> Too sneaky to live, I suppose you could say.
Sneakiness is all fine. It is just the question whether it is worth
the price. It would be a good idea to check this price out after the
next release, as said.
> Amount? How should I know? I can say I've tried inserting warning
> code in match_limit and file-name-sans-extension produces the
> warning during the dump phase. I don't know why yet, my
> implementation may be incorrect (the flag is either getting reset
> when it shouldn't, or fails to get set on a successful search---the
> Boyer-Moore code is the most complicated thing I've ever tried to
> deal with).
>
> However, I know that file-handlers can get called there, via
> file-name-directory inter alia. In general, even a simple variable
> reference can call arbitrary Lisp code in XEmacs (because of magic
> Mule handlers) and most likely GNU Emacs (because those handlers
> were introduced for GNU Emacs compatibility).
In which case this particular sort of references better use
save-match-data.
> David> With the current code. But I was also proposing to void
> David> the match-data in the main loop: that would produce
> David> errors more often, and only in situations where the
> David> match-data was completely unpredictable to start with
> David> (and possibly undefined, too).
>
> But that's not true. The match-data was sufficiently predictable
> that split-string only failed when it was voided.
The code in question was the following:
(let (parts (start 0) (len (length string)))
(if (string-match pattern string)
(setq parts (cons (substring string 0 (match-beginning 0)) parts)
start (match-end 0)))
The error occured when the condition was false to start with and setq
not reached. Then we had the following:
(while (and (< start len)
This condition is true for non-empty string
(string-match pattern string (if (> start (match-beginning 0))
start
(1+ start))))
start is still 0, match-beginning has a random value (this is where
the error got flagged). Without an error, we'll get into the false
branch, so the match starts at 1 now. Since the precondition was only
true for non-empty string, starting the match at 1 is valid, will
again fail like the first match, and the loop body gets skipped.
So this time we are lucky: the random contents of match-beginning
would indeed not matter. Would you want to rely on that?
> I think it's likely that that mistake has been made elsewhere.
I doubt that we can rely on its consequences being benign.
> On the other hand, pretty much any time any Lisp code intervenes
> between the return of the matching function and entry to the match
> data access there is an opportunity for a hook or handler to be
> called.
That's what save-match-data is for. Hooks or handlers that intervene
at random places need to use it when they might likely call regexp
code. This is nothing new.
> Probability of hitting it on any given path is low, but potential
> for random annoyance for years to come is high, I fear.
It is obvious that you are not fond of the interface, presumably
preferring a complex return value for matching functions including the
match data. But it is not like it would be feasible to change it,
anyway.
The only addition I can think of is to offer an explicit function
void-match-data that users would be free to call at points when they
know the match-data no longer to be needed. Having such a function
explicitly available might help for debugging use-before-definition
cases like the above.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
next prev parent reply other threads:[~2004-06-12 9:03 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-06-09 15:37 Discrepancy in definition/use of match-data? David Kastrup
2004-06-10 23:01 ` Richard Stallman
2004-06-10 23:56 ` David Kastrup
2004-06-11 8:34 ` Stephen J. Turnbull
2004-06-11 8:54 ` David Kastrup
2004-06-12 6:45 ` Stephen J. Turnbull
2004-06-12 9:03 ` David Kastrup [this message]
2004-06-13 0:01 ` Richard Stallman
2004-06-14 5:06 ` Stephen J. Turnbull
2004-06-14 9:05 ` David Kastrup
2004-06-14 10:05 ` Stephen J. Turnbull
2004-06-16 7:13 ` Stephen J. Turnbull
2004-06-19 3:19 ` Richard Stallman
2004-06-23 9:53 ` Stephen J. Turnbull
2004-06-19 3:19 ` Richard Stallman
2004-06-12 1:51 ` Richard Stallman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=x54qphxi4m.fsf@lola.goethe.zz \
--to=dak@gnu.org \
--cc=emacs-devel@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.