* Discrepancy in definition/use of match-data?
@ 2004-06-09 15:37 David Kastrup
  2004-06-10 23:01 ` Richard Stallman
  0 siblings, 1 reply; 16+ messages in thread
From: David Kastrup @ 2004-06-09 15:37 UTC

We have the following excerpt here:

static Lisp_Object
match_limit (num, beginningp)
     Lisp_Object num;
     int beginningp;
{
  register int n;

  CHECK_NUMBER (num);
  n = XINT (num);
  if (n < 0 || n >= search_regs.num_regs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    args_out_of_range (num, make_number (search_regs.num_regs));
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  if (search_regs.num_regs <= 0
      || search_regs.start[n] < 0)
    return Qnil;
  return (make_number ((beginningp) ? search_regs.start[n]
                                    : search_regs.end[n]));
}

DEFUN ("match-beginning", Fmatch_beginning, Smatch_beginning, 1, 1, 0,
       doc: /* Return position of start of text matched by last search.
SUBEXP, a number, specifies which parenthesized expression in the last
  regexp.
Value is nil if SUBEXPth pair didn't match, or there were less than
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  SUBEXP pairs.
  ^^^^^^^^^^^^^

Same for match-end and probably match-string.  But if there was no
previous match with at least the same number of SUBEXPs in the history
of the Emacs session, we will get an args_out_of_range error.

I think the fix would be to have match-limit return Qnil instead of
flagging an error for the condition

n >= search_regs.num_regs

Correct?

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
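Editorial sketch (not part of the original message): the discrepancy described above can be modelled in a few lines of standalone C. Everything here is an assumption made for illustration. `struct regs_model`, the `LIMIT_NIL` and `LIMIT_ERROR` sentinels, and both function names are hypothetical stand-ins, not Emacs source, and the real `match_limit` returns Lisp objects rather than ints.

```c
/* Standalone model of match_limit; none of this is Emacs source.  */

/* Sentinel results.  Real positions are assumed to be >= 0.  */
enum { LIMIT_NIL = -1, LIMIT_ERROR = -2 };

/* Hypothetical stand-in for search_regs.  */
struct regs_model
{
  int num_regs;      /* register pairs recorded by the last successful match */
  const int *start;  /* start[i] < 0: subexpression i did not match */
  const int *end;
};

/* Behaviour of the quoted excerpt: an index at or beyond num_regs
   signals an error (modelled here as LIMIT_ERROR).  */
int
match_limit_current (const struct regs_model *r, int n, int beginningp)
{
  if (n < 0 || n >= r->num_regs)
    return LIMIT_ERROR;
  if (r->num_regs <= 0 || r->start[n] < 0)
    return LIMIT_NIL;
  return beginningp ? r->start[n] : r->end[n];
}

/* Behaviour proposed in the message: only a negative index is an
   error; an index at or beyond num_regs yields nil, as the docstring
   promises.  */
int
match_limit_proposed (const struct regs_model *r, int n, int beginningp)
{
  if (n < 0)
    return LIMIT_ERROR;
  if (n >= r->num_regs || r->start[n] < 0)
    return LIMIT_NIL;
  return beginningp ? r->start[n] : r->end[n];
}
```

With two recorded register pairs, asking for subexpression 5 gives `LIMIT_ERROR` from the first function and `LIMIT_NIL` from the second, which is the gap between code and docstring that the message points out.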
* Re: Discrepancy in definition/use of match-data?
  2004-06-09 15:37 Discrepancy in definition/use of match-data? David Kastrup
@ 2004-06-10 23:01 ` Richard Stallman
  2004-06-10 23:56 ` David Kastrup
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Stallman @ 2004-06-10 23:01 UTC
  Cc: emacs-devel

    I think the fix would be to have match-limit return Qnil instead of
    flagging an error for the condition

    n >= search_regs.num_regs

    Correct?

I agree with you.
* Re: Discrepancy in definition/use of match-data?
  2004-06-10 23:01 ` Richard Stallman
@ 2004-06-10 23:56 ` David Kastrup
  2004-06-11 8:34 ` Stephen J. Turnbull
  2004-06-12 1:51 ` Richard Stallman
  0 siblings, 2 replies; 16+ messages in thread
From: David Kastrup @ 2004-06-10 23:56 UTC
  Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     I think the fix would be to have match-limit return Qnil instead of
>     flagging an error for the condition
>
>     n >= search_regs.num_regs
>
>     Correct?
>
> I agree with you.

I have thought a bit about the context in which I have seen this error
condition flagged in the last few days.

The first example was inside of split-string (the 21.4.13 XEmacs
version) when running inside of a process filter: the routine was
buggy since it could call (match-beginning 0) even when there were no
preceding _successful_ matches.  However, since the current API will
not change the match data after an unsuccessful match, in most uses
this bug was masked by some old existing data in (match-beginning 0).
So only the filter routine (which starts out with a fresh copy of void
match-data) provided a sufficient context for flagging the bug.

In particular, such a bug is very evasive in debugging, since the
situation of pristinely void match-data will then typically be absent.
In consequence, this bug persisted for half a year _after_ it was
observed.

If I change the test as indicated above, the bug would not even get
flagged anymore, not even in a process filter, and would go completely
undetected.  Not good.

Now the obvious solution to that would be to make unsuccessful matches
void the match-data, too.  I myself have no recollection of any
discussions about this, but Stephen Turnbull was pretty vocal about
having proposed something like this, but that apparently the idiom

(while (search-forward "Something"))
do something with (match-beginning 0)

was given as a reason why an unsuccessful match should not clear the
match-data.

But actually, the documentation of the match-data function claims:

    Return value is undefined if the last search failed.

Invalidating the match-data after unsuccessful matches would quite
increase the probability of catching such bugs, while being consistent
with the current documentation of match-data.  However, if this
suggestion has been turned down with the above example, then this
"idiom" needs to get searched and replaced.  It is my personal opinion
that this idiom is not worth the loss of bug detection, and it is
inconsistent with the current match-data documentation, anyway.

But it would probably be a bad idea to change this right now in the
feature freeze, possibly breaking things relying on that quite
undocumented behavior.

So my proposal is the following plan:

Before next release: match-data gets voided upon entry of a filter or
sentinel, like it is being done now.  With void match-data,
match-beginning and so on flag an error irrespective of their
argument.  The match-data is only touched by a successful match.  Once
a match has been successful, match-beginning and so on will not flag
errors for positive arguments, but return nil (as documented).

In order to have a better chance of catching such
use-before-valid-match situations, it might be a good idea also to
void the match-data in the main loop.

After the next release, I would like to have unsuccessful matches also
void the match-data, making the use of the above quoted idiom illegal.
It is consistent with the current documentation of the functions, and
it gives much better possibilities of actually catching bugs that at
the moment only trigger errors in obscure, badly debuggable places
like output filters.

We should do this change right after the next release so that Emacs
core developers and those for add-on packages have enough time testing
it, finding the problematic code and fixing it for the next release
after that.

What I'll do right now is just change the condition so that it will
not flag an error for greater-than-encountered match-data indices,
except for the case of completely void match-data where I'll still
flag an error.  So my current change will flag fewer errors than
before (and in particular leave out false alarms like in locate.el).

This change does not merit discussion before application: it only
fixes things without breaking existing idioms.  My other proposals
(such as voiding the match-data in the main loop) would warrant other
opinions, however.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
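Editorial sketch (not part of the original message): the interim behaviour described in this message has three outcomes, which a small standalone C model can make explicit. All names below (`struct match_regs_model`, `enum limit_status`, `match_limit_interim`) are hypothetical, and the sketch assumes that "void" match-data is represented by a register count of zero and that a negative index remains an error. Neither assumption is confirmed by the thread.

```c
#include <stddef.h>

/* Possible outcomes of a match-beginning/match-end style lookup.  */
enum limit_status { STATUS_POSITION, STATUS_NIL, STATUS_ERROR };

/* Hypothetical stand-in for search_regs.  */
struct match_regs_model
{
  int num_regs;      /* assumed: 0 means void, no successful match yet */
  const int *start;  /* start[i] < 0: subexpression i did not match */
  const int *end;
};

/* Interim rule from the message:
   - void match-data: error, irrespective of the argument;
   - index beyond what the last match recorded: nil, no error;
   - otherwise the recorded start or end position.  */
enum limit_status
match_limit_interim (const struct match_regs_model *r, int n,
                     int beginningp, int *pos)
{
  if (r->num_regs <= 0)
    return STATUS_ERROR;          /* use before any valid match */
  if (n < 0)
    return STATUS_ERROR;          /* assumed to stay an error */
  if (n >= r->num_regs || r->start[n] < 0)
    return STATUS_NIL;
  if (pos != NULL)
    *pos = beginningp ? r->start[n] : r->end[n];
  return STATUS_POSITION;
}
```

The point of keeping the first check is the one made in the message: a caller that reads match positions before any successful match is still caught, while a too-large index after a successful match no longer raises a false alarm.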
* Re: Discrepancy in definition/use of match-data?
  2004-06-10 23:56 ` David Kastrup
@ 2004-06-11 8:34 ` Stephen J. Turnbull
  2004-06-11 8:54 ` David Kastrup
  2004-06-12 1:51 ` Richard Stallman
  1 sibling, 1 reply; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-11 8:34 UTC
  Cc: emacs-devel

>>>>> "David" == David Kastrup <dak@gnu.org> writes:

    David> Now the obvious solution to that would be to make
    David> unsuccessful matches void the match-data, too.  I myself
    David> have no recollection of any discussions about this, but
    David> Stephen Turnbull was pretty vocal about having proposed
    David> something like this,

To be precise, I tried it in a workspace and was immediately slapped
down by a regression test failure.

    David> What I'll do right now is just changing the condition that
    David> it will not flag an error for greater-than-encountered
    David> match-data indices, except for the case of completely void
    David> match-data where I'll still flag an error.

I would suggest improving the error message if at all possible, and
documenting this prominently, as it is likely to happen very rarely,
and then only in asynchronous calls.

--
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
* Re: Discrepancy in definition/use of match-data?
  2004-06-11 8:34 ` Stephen J. Turnbull
@ 2004-06-11 8:54 ` David Kastrup
  2004-06-12 6:45 ` Stephen J. Turnbull
  0 siblings, 1 reply; 16+ messages in thread
From: David Kastrup @ 2004-06-11 8:54 UTC
  Cc: emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> >>>>> "David" == David Kastrup <dak@gnu.org> writes:
>
>     David> Now the obvious solution to that would be to make
>     David> unsuccessful matches void the match-data, too.  I myself
>     David> have no recollection of any discussions about this, but
>     David> Stephen Turnbull was pretty vocal about having proposed
>     David> something like this,
>
> To be precise, I tried it in a workspace and was immediately slapped
> down by a regression test failure.

That in itself does not constitute a decision.  It suggests that such
a change should probably not be made lightly, and not shortly before a
release.  But it does not preclude a long-term strategy of change if
one can agree on the desirability: one would start by actively
declaring this usage deprecated (if it ever was supposed to be allowed
in the first place).  At what time one actually clamps down the code
is unrelated to the question of whether it is desirable to do so.

How strong were the results from the regression test?  What kind and
amount of code appears to be affected?

>     David> What I'll do right now is just changing the condition
>     David> that it will not flag an error for
>     David> greater-than-encountered match-data indices, except for
>     David> the case of completely void match-data where I'll still
>     David> flag an error.
>
> I would suggest improving the error message if at all possible, and
> documenting this prominently, as it is likely to happen very rarely,
> and then only in asynchronous calls.

With the current code.
But I was also proposing to void the match-data in the main loop: that
would produce errors more often, and only in situations where the
match-data was completely unpredictable to start with (and possibly
undefined, too).

I have no clue about the involved complexity: if this leads to a
danger of, say, recursive-edit or debug not working like before, one
should postpone till after release.

Anyway, I do not know enough about the error handling in C to propose
better error handling or messages.  I do agree that the currently
generated message is dissatisfactorily obtuse.  In particular, the
context flagged in the traceback is the caller, and not the particular
function itself, so it is guesswork to figure out just _what_ function
triggered the "out of range" error.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Discrepancy in definition/use of match-data?
  2004-06-11 8:54 ` David Kastrup
@ 2004-06-12 6:45 ` Stephen J. Turnbull
  2004-06-12 9:03 ` David Kastrup
  2004-06-13 0:01 ` Richard Stallman
  0 siblings, 2 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-12 6:45 UTC
  Cc: emacs-devel

>>>>> "David" == David Kastrup <dak@gnu.org> writes:

    David> How strong were the results from the regression test?

What does that mean?  There is one test that specifically checks for
match-data being preserved across a failed match, and it failed.

    David> What kind and amount of code appears to be affected?

Kind?  I know of one specific use in `w3-configuration-data' in
w3-cfg.el which calls itself recursively, and depends on being able to
use the top-level match-data if the match in the recursive call fails,
while using the match-data from the recursive call otherwise.  Too
sneaky to live, I suppose you could say.

Amount?  How should I know?  I can say I've tried inserting warning
code in match_limit and file-name-sans-extension produces the warning
during the dump phase.  I don't know why yet, my implementation may be
incorrect (the flag is either getting reset when it shouldn't, or
fails to get set on a successful search---the Boyer-Moore code is the
most complicated thing I've ever tried to deal with).

However, I know that file-handlers can get called there, via
file-name-directory inter alia.  In general, even a simple variable
reference can call arbitrary Lisp code in XEmacs (because of magic
Mule handlers) and most likely GNU Emacs (because those handlers were
introduced for GNU Emacs compatibility).

    David> With the current code.  But I was also proposing to void
    David> the match-data in the main loop: that would produce errors
    David> more often, and only in situations where the match-data was
    David> completely unpredictable to start with (and possibly
    David> undefined, too).

But that's not true.
The match-data was sufficiently predictable that split-string only
failed when it was voided.  I think it's likely that that mistake has
been made elsewhere.

On the other hand, pretty much any time any Lisp code intervenes
between the return of the matching function and entry to the match
data access there is an opportunity for a hook or handler to be
called.  Probability of hitting it on any given path is low, but
potential for random annoyance for years to come is high, I fear.

--
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
* Re: Discrepancy in definition/use of match-data?
  2004-06-12 6:45 ` Stephen J. Turnbull
@ 2004-06-12 9:03 ` David Kastrup
  2004-06-13 0:01 ` Richard Stallman
  1 sibling, 0 replies; 16+ messages in thread
From: David Kastrup @ 2004-06-12 9:03 UTC
  Cc: emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> >>>>> "David" == David Kastrup <dak@gnu.org> writes:
>
>     David> How strong were the results from the regression test?
>
> What does that mean?  There is one test that specifically checks for
> match-data being preserved across a failed match, and it failed.

Ah, ok.  I thought that the regression test might have involved
calling a lot of complex functions to see whether they still worked.
It did not occur to me that it checked for the particular semantics
explicitly.

>     David> What kind and amount of code appears to be affected?
>
> Kind?  I know of one specific use in `w3-configuration-data' in
> w3-cfg.el which calls itself recursively, and depends on being able
> to use the top-level match-data if the match in the recursive call
> fails, while using the match-data from the recursive call otherwise.
> Too sneaky to live, I suppose you could say.

Sneakiness is all fine.  It is just the question whether it is worth
the price.  It would be a good idea to check this price out after the
next release, as said.

> Amount?  How should I know?  I can say I've tried inserting warning
> code in match_limit and file-name-sans-extension produces the
> warning during the dump phase.  I don't know why yet, my
> implementation may be incorrect (the flag is either getting reset
> when it shouldn't, or fails to get set on a successful search---the
> Boyer-Moore code is the most complicated thing I've ever tried to
> deal with).
>
> However, I know that file-handlers can get called there, via
> file-name-directory inter alia.
> In general, even a simple variable reference can call arbitrary
> Lisp code in XEmacs (because of magic Mule handlers) and most likely
> GNU Emacs (because those handlers were introduced for GNU Emacs
> compatibility).

In which case this particular sort of reference had better use
save-match-data.

>     David> With the current code.  But I was also proposing to void
>     David> the match-data in the main loop: that would produce
>     David> errors more often, and only in situations where the
>     David> match-data was completely unpredictable to start with
>     David> (and possibly undefined, too).
>
> But that's not true.  The match-data was sufficiently predictable
> that split-string only failed when it was voided.

The code in question was the following:

  (let (parts (start 0) (len (length string)))
    (if (string-match pattern string)
        (setq parts (cons (substring string 0 (match-beginning 0)) parts)
              start (match-end 0)))

The error occurred when the condition was false to start with and setq
not reached.  Then we had the following:

    (while (and (< start len)

This condition is true for non-empty string

                (string-match pattern string
                              (if (> start (match-beginning 0))
                                  start
                                (1+ start))))

start is still 0, match-beginning has a random value (this is where
the error got flagged).  Without an error, we'll get into the false
branch, so the match starts at 1 now.  Since the precondition was only
true for non-empty string, starting the match at 1 is valid, will
again fail like the first match, and the loop body gets skipped.

So this time we are lucky: the random contents of match-beginning
would indeed not matter.  Would you want to rely on that?

> I think it's likely that that mistake has been made elsewhere.

I doubt that we can rely on its consequences being benign.

> On the other hand, pretty much any time any Lisp code intervenes
> between the return of the matching function and entry to the match
> data access there is an opportunity for a hook or handler to be
> called.

That's what save-match-data is for.  Hooks or handlers that intervene
at random places need to use it when they might likely call regexp
code.  This is nothing new.

> Probability of hitting it on any given path is low, but potential
> for random annoyance for years to come is high, I fear.

It is obvious that you are not fond of the interface, presumably
preferring a complex return value for matching functions including the
match data.  But it is not like it would be feasible to change it,
anyway.

The only addition I can think of is to offer an explicit function
void-match-data that users would be free to call at points when they
know the match-data no longer to be needed.  Having such a function
explicitly available might help for debugging use-before-definition
cases like the above.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Discrepancy in definition/use of match-data?
  2004-06-12 6:45 ` Stephen J. Turnbull
  2004-06-12 9:03 ` David Kastrup
@ 2004-06-13 0:01 ` Richard Stallman
  2004-06-14 5:06 ` Stephen J. Turnbull
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Stallman @ 2004-06-13 0:01 UTC
  Cc: dak, emacs-devel

    In general, even a simple variable reference can call arbitrary
    Lisp code in XEmacs (because of magic Mule handlers) and most
    likely GNU Emacs (because those handlers were introduced for GNU
    Emacs compatibility).

I have never heard of "magic Mule handlers", and I am pretty sure
there is nothing like this in Emacs.  If we had them, we would make
them save and restore the match data.

    On the other hand, pretty much any time any Lisp code intervenes
    between the return of the matching function and entry to the match
    data access there is an opportunity for a hook or handler to be
    called.

That is rather an exaggeration.  Most of the Lisp functions described
in the manual cannot call any hook.  Asynchronous activities only
happen inside functions that can wait, which means, only certain
primitives.  Likewise for file name handlers.  And these normally save
the match data anyway.
* Re: Discrepancy in definition/use of match-data?
  2004-06-13 0:01 ` Richard Stallman
@ 2004-06-14 5:06 ` Stephen J. Turnbull
  2004-06-14 9:05 ` David Kastrup
  2004-06-16 7:13 ` Stephen J. Turnbull
  0 siblings, 2 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-14 5:06 UTC
  Cc: dak, emacs-devel

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

    rms> sjt writes:

        On the other hand, pretty much any time any Lisp code
        intervenes between the return of the matching function and
        entry to the match data access there is an opportunity for a
        hook or handler to be called.

    rms> That is rather an exaggeration.

Exaggeration, sure.  But it contains a kernel of truth.

    rms> Most of the Lisp functions described in the manual cannot
    rms> call any hook.

True, but not relevant unless you know which ones they are.  Offhand,
I don't, and the docstrings/source comments are less than 100%
reliable.

Anyway, I've implemented a last-match-succeeded flag (for debug usage
only) and check it in match_limits and Fmatch_data (when XEmacs is
configured for error-checking).  If we catch anything that looks
relevant to GNU Emacs I'll report it here.

--
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
* Re: Discrepancy in definition/use of match-data?
  2004-06-14 5:06 ` Stephen J. Turnbull
@ 2004-06-14 9:05 ` David Kastrup
  2004-06-14 10:05 ` Stephen J. Turnbull
  2004-06-16 7:13 ` Stephen J. Turnbull
  1 sibling, 1 reply; 16+ messages in thread
From: David Kastrup @ 2004-06-14 9:05 UTC
  Cc: rms, emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> >>>>> "rms" == Richard Stallman <rms@gnu.org> writes:
>
>     rms> sjt writes:
>
>         On the other hand, pretty much any time any Lisp code intervenes
>         between the return of the matching function and entry to the match
>         data access there is an opportunity for a hook or handler to be
>         called.
>
>     rms> That is rather an exaggeration.
>
> Exaggeration, sure.  But it contains a kernel of truth.

Those things that run in hooks/handlers/whatever need to get wrapped
in save-match-data, anyway.

>     rms> Most of the Lisp functions described in the manual cannot
>     rms> call any hook.
>
> True, but not relevant unless you know which ones they are.
> Offhand, I don't, and the docstrings/source comments are less than
> 100% reliable.

Where the docstrings are not reliable, they need to get amended,
obviously.

> Anyway, I've implemented a last-match-succeeded flag (for debug
> usage only) and check it in match_limits and Fmatch_data (when
> XEmacs is configured for error-checking).  If we catch anything that
> looks relevant to GNU Emacs I'll report it here.

Anyway, the args-out-of-range error that now gets thrown gives a
basically completely irrelevant second value (with regard to
identifying the error.  Actually, the first value is also irrelevant
unless it is negative).

Could one replace that second value with the function causing the
error instead, or would that be inconsistent with any known error
handlers?  Or should one throw a different error altogether in the
case where match-data is called without a valid match ever having been
done?  Which one?

As I said already, I have no clue about the error handling
conventions.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: Discrepancy in definition/use of match-data?
  2004-06-14 9:05 ` David Kastrup
@ 2004-06-14 10:05 ` Stephen J. Turnbull
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-14 10:05 UTC
  Cc: Stephen J. Turnbull, rms, emacs-devel

>>>>> "David" == David Kastrup <dak@gnu.org> writes:

    David> Anyway, the args-out-of-range error that now gets thrown
    David> gives a basically completely irrelevant second value

True.

    David> Or should one throw a different error altogether in the
    David> case where match-data is called without a valid match ever
    David> having been done?

That's the solution I chose, except I haven't made it an error yet.
When I do, would use invalid-state.

--
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
* Re: Discrepancy in definition/use of match-data?
  2004-06-14 5:06 ` Stephen J. Turnbull
  2004-06-14 9:05 ` David Kastrup
@ 2004-06-16 7:13 ` Stephen J. Turnbull
  2004-06-19 3:19 ` Richard Stallman
  2004-06-19 3:19 ` Richard Stallman
  1 sibling, 2 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-16 7:13 UTC

>>>>> "sjt" == Stephen J Turnbull <stephen@xemacs.org> writes:

    sjt> Anyway, I've implemented a last-match-succeeded flag (for
    sjt> debug usage only) and check it in match_limits and
    sjt> Fmatch_data (when XEmacs is configured for error-checking).

I've changed this so that Fmatch_data saves the state of that flag
with the registers, and Fstore_match_data restores it.  This is bogus
if somebody uses an explicit call to match-data, but so far I've only
seen explicit use of match-data in macros like save-match-data.
Unfortunately external packages occasionally define similar macros,
making it difficult to suppress hundreds of bogus warnings from
functions that use those macros.

    sjt> If we catch anything that looks relevant to GNU Emacs I'll
    sjt> report it here.

OK, I have tripped this in three functions so far.  All of them look
relevant to GNU Emacs.

isearch.el (isearch-repeat): both calls to match-{beginning,end}
font-lock.el (font-lock-fontify-keywords-region): all calls to
match-{beginning,end}
(font-lock-fontify-anchored-keywords): all calls to match-{beginning,end}

The calls in isearch.el are part of logic I don't understand yet.  It
looks like a zero-length match is being used to determine whether the
isearch-success flag is lying.  It should be possible to fix this, but
I don't know when I'll have time to do so.

The calls in font-lock are used to fontify text.  They are guarded by
a proper test for success in the case of a MATCHER which is a regexp
as far as I can tell.  So I suspect that in fact a MATCHER which is a
function is being passed in, and it succeeds even though the regexp
match failed.
This is probably cc-mode, and possibly emacs-lisp-mode too.

In all cases guarding the calls to match-{beginning,end} with a test
on last-match-succeeded seems to leave behavior unchanged, except that
font-lock seems perceptibly faster for both C and Lisp.  (I haven't
tried to measure that yet, though.)

Again, it's unlikely I'll have time to dig into this soon, so I'm
reporting it now.

--
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
* Re: Discrepancy in definition/use of match-data?
  2004-06-16 7:13 ` Stephen J. Turnbull
@ 2004-06-19 3:19 ` Richard Stallman
  2004-06-23 9:53 ` Stephen J. Turnbull
  2004-06-19 3:19 ` Richard Stallman
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Stallman @ 2004-06-19 3:19 UTC
  Cc: emacs-devel

    OK, I have tripped this in three functions so far.  All of them look
    relevant to GNU Emacs.

    isearch.el (isearch-repeat): both calls to match-{beginning,end}

It is checking whether the last match for the search string was empty.
I think this change should make it work without using the match-data.
Does it work?

*** isearch.el	06 Jun 2004 09:56:16 -0400	1.228
--- isearch.el	18 Jun 2004 22:10:53 -0400
***************
*** 999,1005 ****
  
        (if (equal isearch-string "")
            (setq isearch-success t)
!         (if (and isearch-success (equal (match-end 0) (match-beginning 0))
                   (not isearch-just-started))
              ;; If repeating a search that found
              ;; an empty string, ensure we advance.
--- 999,1006 ----
  
        (if (equal isearch-string "")
            (setq isearch-success t)
!         (if (and isearch-success
!                  (equal (point) isearch-other-end)
                   (not isearch-just-started))
              ;; If repeating a search that found
              ;; an empty string, ensure we advance.
* Re: Discrepancy in definition/use of match-data?
  2004-06-19 3:19 ` Richard Stallman
@ 2004-06-23 9:53 ` Stephen J. Turnbull
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-23 9:53 UTC
  Cc: emacs-devel

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

        OK, I have tripped this in three functions so far.  All of
        them look relevant to GNU Emacs.

        isearch.el (isearch-repeat): both calls to match-{beginning,end}

    rms> It is checking whether the last match for the search string
    rms> was empty.  I think this change should make it work without
    rms> using the match-data.  Does it work?

Your change eliminates the match-data access, of course, but I just
realized as I tried to test it that I really don't have a feel for
what "work" means.  I don't use regexp-isearch very often, and regexps
that match the null string almost never.

As far as I can tell in a few days' testing, string searching and the
simple regexps that I commonly use are producing no surprises, and
"artificial" regexps that should match the empty string do, while
other matchers do not.  Repeat regexp searches for ".*" behave the
same with both implementations, including wrapping around bob and eob.

So I would say it works.  Well enough to install in our beta tree,
anyway.

*** isearch.el	06 Jun 2004 09:56:16 -0400	1.228
--- isearch.el	18 Jun 2004 22:10:53 -0400
***************
*** 999,1005 ****
  
        (if (equal isearch-string "")
            (setq isearch-success t)
!         (if (and isearch-success (equal (match-end 0) (match-beginning 0))
                   (not isearch-just-started))
              ;; If repeating a search that found
              ;; an empty string, ensure we advance.
--- 999,1006 ----
  
        (if (equal isearch-string "")
            (setq isearch-success t)
!         (if (and isearch-success
!                  (equal (point) isearch-other-end)
                   (not isearch-just-started))
              ;; If repeating a search that found
              ;; an empty string, ensure we advance.

--
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
* Re: Discrepancy in definition/use of match-data?
  2004-06-16 7:13 ` Stephen J. Turnbull
  2004-06-19 3:19 ` Richard Stallman
@ 2004-06-19 3:19 ` Richard Stallman
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Stallman @ 2004-06-19 3:19 UTC
  Cc: emacs-devel

    font-lock.el (font-lock-fontify-keywords-region): all calls to
    match-{beginning,end}
    (font-lock-fontify-anchored-keywords): all calls to match-{beginning,end}

Is it caused by this code?

(defun c-find-invalid-doc-markup (regexp limit)
  ;; Used to fontify invalid markup in doc comments after the correct
  ;; ones have been fontified: Find the first occurence of REGEXP
  ;; between the point and LIMIT that only is fontified with
  ;; `c-doc-face-name'.  If a match is found then submatch 0 surrounds
  ;; the first char and t is returned, otherwise nil is returned.
  (let (start)
    (while (if (re-search-forward regexp limit t)
               (not (eq (get-text-property
                         (setq start (match-beginning 0)) 'face)
                        c-doc-face-name))
             (setq start nil)))
    (when start
      (store-match-data (list (copy-marker start)
                              (copy-marker (1+ start))))
      t)))
* Re: Discrepancy in definition/use of match-data?
  2004-06-10 23:56 ` David Kastrup
  2004-06-11 8:34 ` Stephen J. Turnbull
@ 2004-06-12 1:51 ` Richard Stallman
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Stallman @ 2004-06-12 1:51 UTC
  Cc: emacs-devel

    Now the obvious solution to that would be to make unsuccessful matches
    void the match-data, too.

That might be a good thing to do, except that doing it now is likely
to delay the release.

    So my proposal is the following plan:

    Before next release: match-data gets voided upon entry of a filter or
    sentinel, like it is being done now.  With void match-data,
    match-beginning and so on flag an error irrespective of their
    argument.  The match-data is only touched by a successful match.  Once
    a match has been successful, match-beginning and so on will not flag
    errors for positive arguments, but return nil (as documented).

    In order to have a better chance of catching such
    use-before-valid-match situations, it might be a good idea also to
    void the match-data in the main loop.

Voiding in the main loop was exactly the idea that occurred to me.
So let's do all of this now.

    After the next release, I would like to have unsuccessful matches also
    void the match-data, making the use of the above quoted idiom illegal.

I agree.  (But please, let's say "cause an error", not "illegal".
Nobody is going to be jailed for doing this.)

    We should do this change right after the next release so that Emacs
    core developers and those for add-on packages have enough time testing
    it, finding the problematic code and fixing it for the next release
    after that.

We could do it right now in the Unicode branch.  That way, we'd assure
it will get into the following major release, but detection of code
that needs changing could start right away.