From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: David Kastrup Newsgroups: gmane.emacs.devel Subject: Re: Discrepancy in definition/use of match-data? Date: 12 Jun 2004 11:03:53 +0200 Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Message-ID: References: <87n03a7as9.fsf@tleepslib.sk.tsukuba.ac.jp> <87vfhx46lc.fsf@tleepslib.sk.tsukuba.ac.jp> NNTP-Posting-Host: deer.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: sea.gmane.org 1087031065 1871 80.91.224.253 (12 Jun 2004 09:04:25 GMT) X-Complaints-To: usenet@sea.gmane.org NNTP-Posting-Date: Sat, 12 Jun 2004 09:04:25 +0000 (UTC) Cc: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Sat Jun 12 11:04:19 2004 Return-path: Original-Received: from quimby.gnus.org ([80.91.224.244]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 1BZ4R1-0004W7-00 for ; Sat, 12 Jun 2004 11:04:19 +0200 Original-Received: from lists.gnu.org ([199.232.76.165]) by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian)) id 1BZ4R1-0004JA-00 for ; Sat, 12 Jun 2004 11:04:19 +0200 Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.33) id 1BZ4Rn-0002pW-Nj for emacs-devel@quimby.gnus.org; Sat, 12 Jun 2004 05:05:07 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33) id 1BZ4Rd-0002ow-RX for emacs-devel@gnu.org; Sat, 12 Jun 2004 05:04:58 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.33) id 1BZ4Rd-0002oZ-37 for emacs-devel@gnu.org; Sat, 12 Jun 2004 05:04:57 -0400 Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.33) id 1BZ4Rc-0002oP-Qz for emacs-devel@gnu.org; Sat, 12 Jun 2004 05:04:56 -0400 Original-Received: from [199.232.76.164] (helo=fencepost.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.34) id 1BZ4Qd-0008Mo-PI for emacs-devel@gnu.org; Sat, 12 Jun 2004 05:03:55 -0400 Original-Received: from localhost ([127.0.0.1] helo=lola.goethe.zz) by fencepost.gnu.org with esmtp (Exim 4.34) id 1BZ4Qc-00066L-Mv; Sat, 12 Jun 2004 05:03:55 -0400 Original-To: "Stephen J. Turnbull" In-Reply-To: <87vfhx46lc.fsf@tleepslib.sk.tsukuba.ac.jp> Original-Lines: 108 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.4 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Xref: main.gmane.org gmane.emacs.devel:24865 X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:24865 "Stephen J. Turnbull" writes: > >>>>> "David" == David Kastrup writes: > > David> How strong were the results from the regression test? > > What does that mean? There is one test that specifically checks for > match-data being preserved across a failed match, and it failed. Ah, ok. I thought that the regression test might have involved calling a lot of complex functions to see whether they still worked. It dod not occur to me that it checked for the particular semantics explicitly. > David> What kind and amount of code appears to be affected? > > Kind? I know of one specific use in `w3-configuration-data' in > w3-cfg.el which calls itself recursively, and depends on being able > to use the top-level match-data if the match in the recursive call > fails, while using the match-data from the recursive call otherwise. > Too sneaky to live, I suppose you could say. Sneakiness is all fine. It is just the question whether it is worth the price. It would be a good idea to check this price out after the next release, as said. > Amount? How should I know? I can say I've tried inserting warning > code in match_limit and file-name-sans-extension produces the > warning during the dump phase. I don't know why yet, my > implementation may be incorrect (the flag is either getting reset > when it shouldn't, or fails to get set on a successful search---the > Boyer-Moore code is the most complicated thing I've ever tried to > deal with). > > However, I know that file-handlers can get called there, via > file-name-directory inter alia. In general, even a simple variable > reference can call arbitrary Lisp code in XEmacs (because of magic > Mule handlers) and most likely GNU Emacs (because those handlers > were introduced for GNU Emacs compatibility). In which case this particular sort of references better use save-match-data. > David> With the current code. But I was also proposing to void > David> the match-data in the main loop: that would produce > David> errors more often, and only in situations where the > David> match-data was completely unpredictable to start with > David> (and possibly undefined, too). > > But that's not true. The match-data was sufficiently predictable > that split-string only failed when it was voided. The code in question was the following: (let (parts (start 0) (len (length string))) (if (string-match pattern string) (setq parts (cons (substring string 0 (match-beginning 0)) parts) start (match-end 0))) The error occured when the condition was false to start with and setq not reached. Then we had the following: (while (and (< start len) This condition is true for non-empty string (string-match pattern string (if (> start (match-beginning 0)) start (1+ start)))) start is still 0, match-beginning has a random value (this is where the error got flagged). Without an error, we'll get into the false branch, so the match starts at 1 now. Since the precondition was only true for non-empty string, starting the match at 1 is valid, will again fail like the first match, and the loop body gets skipped. So this time we are lucky: the random contents of match-beginning would indeed not matter. Would you want to rely on that? > I think it's likely that that mistake has been made elsewhere. I doubt that we can rely on its consequences being benign. > On the other hand, pretty much any time any Lisp code intervenes > between the return of the matching function and entry to the match > data access there is an opportunity for a hook or handler to be > called. That's what save-match-data is for. Hooks or handlers that intervene at random places need to use it when they might likely call regexp code. This is nothing new. > Probability of hitting it on any given path is low, but potential > for random annoyance for years to come is high, I fear. It is obvious that you are not fond of the interface, presumably preferring a complex return value for matching functions including the match data. But it is not like it would be feasible to change it, anyway. The only addition I can think of is to offer an explicit function void-match-data that users would be free to call at points when they know the match-data no longer to be needed. Having such a function explicitly available might help for debugging use-before-definition cases like the above. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum