From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: David Kastrup Newsgroups: gmane.emacs.devel Subject: Re: Discrepancy in definition/use of match-data? Date: 11 Jun 2004 01:56:39 +0200 Sender: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Message-ID: References: NNTP-Posting-Host: deer.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: sea.gmane.org 1086911820 29845 80.91.224.253 (10 Jun 2004 23:57:00 GMT) X-Complaints-To: usenet@sea.gmane.org NNTP-Posting-Date: Thu, 10 Jun 2004 23:57:00 +0000 (UTC) Cc: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Fri Jun 11 01:56:54 2004 Return-path: Original-Received: from quimby.gnus.org ([80.91.224.244]) by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 1BYZPh-0004Gr-00 for ; Fri, 11 Jun 2004 01:56:53 +0200 Original-Received: from lists.gnu.org ([199.232.76.165]) by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian)) id 1BYZPh-00035M-00 for ; Fri, 11 Jun 2004 01:56:53 +0200 Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.33) id 1BYZQP-0007ur-RP for emacs-devel@quimby.gnus.org; Thu, 10 Jun 2004 19:57:37 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33) id 1BYZQM-0007ug-Vd for emacs-devel@gnu.org; Thu, 10 Jun 2004 19:57:35 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.33) id 1BYZQM-0007uH-0A for emacs-devel@gnu.org; Thu, 10 Jun 2004 19:57:34 -0400 Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.33) id 1BYZQL-0007uE-TD for emacs-devel@gnu.org; Thu, 10 Jun 2004 19:57:33 -0400 Original-Received: from [199.232.76.164] (helo=fencepost.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.34) id 1BYZPc-0004X8-0E for emacs-devel@gnu.org; Thu, 10 Jun 2004 19:56:48 -0400 Original-Received: from localhost ([127.0.0.1] helo=lola.goethe.zz) by fencepost.gnu.org with esmtp (Exim 4.34) id 1BYZPU-0003Ph-Ty; Thu, 10 Jun 2004 19:56:41 -0400 Original-To: rms@gnu.org In-Reply-To: Original-Lines: 95 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.4 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+emacs-devel=quimby.gnus.org@gnu.org Xref: main.gmane.org gmane.emacs.devel:24812 X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:24812 Richard Stallman writes: > I think the fix would be to have match-limit return Qnil instead of > flagging an error for the condition > > n >= search_regs.num_regs > > Correct? > > I agree with you. I have thought a bit about the context in which I have seen this error condition flagged in the last few days. The first example was inside of split-string (the 21.4.13 XEmacs version) when running inside of a process filter: the routine was buggy since it could call (match-beginning 0) even when there were no preceeding _successful_ matches. However, since the current API will not change the match data after an unsuccessful match, in most uses this bug was masked by some old existing data in (match-beginning 0). So only the filter routine (which starts out with a fresh copy of void match-data) provided a sufficient context for flagging the bug. In particular, such a bug is very evasive in debugging, since the situation of pristinely void match-data will then typically be absent. In consequence, this bug persisted for half a year _after_ it was observed. If I change the test as indicated above, the bug would not even get flagged anymore, not even in a process filter, and would go completely undetected. Not good. Now the obvious solution to that would be to make unsuccessful matches void the match-data, too. I myself have no recollection of any discussions about this, but Stephen Turnbull was pretty vocal about having proposed something like this, but that apparently the idiom (while (search-forward "Something")) do something with (match-beginning 0) was given as a reason why an unsuccessful match should not clear the match-data. But actually, the documentation of the match-data function claims: Return value is undefined if the last search failed. Invalidating the match-data after unsuccessful matches would quite increase the probability of catching such bugs, while being consistent with the current documentation of match-data. However, if this suggestion has been turned down with the above example, then this "idiom" needs to get searched and replaced. It is my personal opinion that this idiom is not worth the loss of bug detection, and it is inconsistent with the current match-data documentation, anyway. But it would probably be a bad idea to change this right now in the feature freeze, possibly breaking things relying on that quite undocumented behavior. So my proposal is the following plan: Before next release: match-data gets voided upon entry of a filter or sentinel, like it is being done now. With void match-data, match-beginning and so on flag an error irrespective of their argument. The match-data is only touched by a successful match. Once a match has been successful, match-beginning and so on will not flag errors for positive arguments, but return nil (as documented). In order to have a better chance of catching such use-before-valid-match situations, it might be a good idea also to void the match-data in the main loop. After the next release, I would like to have unsuccessful matches also void the match-data, making the use of the above quoted idiom illegal. It is consistent with the current documentation of the functions, and it gives much better possibilities of actually catching bugs that at the moment only trigger errors in obscure, badly debuggable places like output filters. We should do this change right after the next release so that Emacs core developers and those for add-on packages have enough time testing it, finding the problematic code and fixing it for the next release after that. What I'll do right now is just changing the condition that it will not flag an error for greater-than-encountered match-data indices, except for the case of completely void match-data where I'll still flag an error. So my current change will flag fewer errors than before (and in particular leave out false alarms like in locate.el). This change does not merit discussion before application: it only fixes things without breaking existing idioms. My other proposals (such as voiding the match-data in the main loop) would warrant other opinions, however. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum