unofficial mirror of emacs-devel@gnu.org 
From: David Kastrup <dak@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: Discrepancy in definition/use of match-data?
Date: 12 Jun 2004 11:03:53 +0200
Message-ID: <x54qphxi4m.fsf@lola.goethe.zz>
In-Reply-To: <87vfhx46lc.fsf@tleepslib.sk.tsukuba.ac.jp>

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> >>>>> "David" == David Kastrup <dak@gnu.org> writes:
> 
>     David> How strong were the results from the regression test?
> 
> What does that mean?  There is one test that specifically checks for
> match-data being preserved across a failed match, and it failed.

Ah, ok.  I thought that the regression test might have involved
calling a lot of complex functions to see whether they still worked.
It did not occur to me that it checked for the particular semantics
explicitly.

>     David> What kind and amount of code appears to be affected?
> 
> Kind?  I know of one specific use in `w3-configuration-data' in
> w3-cfg.el which calls itself recursively, and depends on being able
> to use the top-level match-data if the match in the recursive call
> fails, while using the match-data from the recursive call otherwise.
> Too sneaky to live, I suppose you could say.

Sneakiness is all fine.  The only question is whether it is worth
the price.  As said, it would be a good idea to assess this price
after the next release.

> Amount?  How should I know?  I can say I've tried inserting warning
> code in match_limit and file-name-sans-extension produces the
> warning during the dump phase.  I don't know why yet, my
> implementation may be incorrect (the flag is either getting reset
> when it shouldn't, or fails to get set on a successful search---the
> Boyer-Moore code is the most complicated thing I've ever tried to
> deal with).
> 
> However, I know that file-handlers can get called there, via
> file-name-directory inter alia.  In general, even a simple variable
> reference can call arbitrary Lisp code in XEmacs (because of magic
> Mule handlers) and most likely GNU Emacs (because those handlers
> were introduced for GNU Emacs compatibility).

In which case this particular sort of reference had better use
save-match-data.

>     David> With the current code.  But I was also proposing to void
>     David> the match-data in the main loop: that would produce
>     David> errors more often, and only in situations where the
>     David> match-data was completely unpredictable to start with
>     David> (and possibly undefined, too).
> 
> But that's not true.  The match-data was sufficiently predictable
> that split-string only failed when it was voided.

The code in question was the following:

  (let (parts (start 0) (len (length string)))
    (if (string-match pattern string)
	(setq parts (cons (substring string 0 (match-beginning 0)) parts)
	      start (match-end 0)))

The error occurred when the condition was false to start with and the
setq was not reached.  Then we had the following:

    (while (and (< start len)

This condition is true for any non-empty string.

		(string-match pattern string (if (> start (match-beginning 0))
						 start
					       (1+ start))))

start is still 0, and match-beginning has a random value (this is
where the error got flagged).  Without an error, we'll get into the
false branch, so the match starts at 1 now.  Since the precondition
was only true for a non-empty string, starting the match at 1 is
valid; it will fail again like the first match, and the loop body gets
skipped.

So this time we are lucky: the random contents of match-beginning
would indeed not matter.  Would you want to rely on that?
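
A minimal sketch of one way to avoid relying on that, assuming a
hypothetical helper `my-split-string' (this is not the split-string of
any Emacs release): the match data is read only right after a
successful string-match and copied into local variables.

```elisp
;; Hypothetical helper for illustration only.
(defun my-split-string (string pattern)
  "Split STRING at non-empty matches of regexp PATTERN.
Match data is consulted only immediately after a successful match."
  (let ((parts nil)
        (start 0)                   ; where the current piece begins
        (search 0)                  ; where the next search begins
        (len (length string)))
    (while (and (<= search len)
                (string-match pattern string search))
      ;; The match just succeeded, so the match data is well defined.
      (let ((mb (match-beginning 0))
            (me (match-end 0)))
        (if (= mb me)
            ;; Empty match: step past it instead of looping forever.
            (setq search (1+ mb))
          (push (substring string start mb) parts)
          (setq start me
                search me))))
    ;; Whatever follows the last separator is the final piece.
    (push (substring string start) parts)
    (nreverse parts)))
```

With this sketch, (my-split-string "a,b,c" ",") evaluates to
("a" "b" "c"), and a string without any match comes back as a
one-element list.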

> I think it's likely that that mistake has been made elsewhere.

I doubt that we can rely on its consequences being benign.

> On the other hand, pretty much any time any Lisp code intervenes
> between the return of the matching function and entry to the match
> data access there is an opportunity for a hook or handler to be
> called.

That's what save-match-data is for.  Hooks or handlers that intervene
at random places need to use it when they might likely call regexp
code.  This is nothing new.
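
As an illustration, here is a small sketch with two hypothetical
functions (`my-noisy-handler' and `my-basename-start' are made-up
names): the handler runs a regexp match of its own, but wraps it in
save-match-data so that its caller still sees the match data of the
caller's own search.

```elisp
;; Hypothetical handler that needs to do regexp matching itself.
(defun my-noisy-handler (name)
  "Return the position of the extension in NAME, preserving match data."
  (save-match-data
    (string-match "\\.[^.]*\\'" name)))

;; Hypothetical caller that uses match data after calling the handler.
(defun my-basename-start (file)
  "Return the position of the last slash in FILE, or nil if none."
  (when (string-match "/[^/]*\\'" file)
    ;; Without save-match-data in the handler, this call would
    ;; replace the match data set by the string-match above.
    (my-noisy-handler file)
    (match-beginning 0)))
```

For "/tmp/foo.txt", my-basename-start returns 4, the position of the
last slash, even though the handler matched the extension in between.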

> Probability of hitting it on any given path is low, but potential
> for random annoyance for years to come is high, I fear.

It is obvious that you are not fond of the interface, presumably
preferring a complex return value for matching functions including the
match data.  But it is not like it would be feasible to change it,
anyway.

The only addition I can think of is to offer an explicit function
void-match-data that users would be free to call at points where they
know the match-data is no longer needed.  Having such a function
explicitly available might help for debugging use-before-definition
cases like the above.
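
A rough sketch of what such a function could look like, under the
assumption that (set-match-data nil) clears all match registers, as
it does in recent GNU Emacs as far as I know.  The name
`my-void-match-data' is hypothetical; no function of this kind is
claimed to exist.

```elisp
;; Hypothetical stand-in for the proposed `void-match-data'.
(defun my-void-match-data ()
  "Discard the current match data.
Assumes that `set-match-data' with a nil list resets every register."
  (set-match-data nil))

;; Hypothetical demonstration: match, void, then look at the data.
(defun my-void-demo ()
  "Return (match-beginning 0) after a match has been voided."
  (string-match "b" "abc")
  (my-void-match-data)
  (match-beginning 0))
```

Under that assumption, my-void-demo yields nil instead of a stale
position, so code that feeds the result into arithmetic or a
comparison fails at once rather than continuing with leftover data.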

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
