Discrepancy in definition/use of match-data?

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

* Discrepancy in definition/use of match-data?
@ 2004-06-09 15:37 David Kastrup
  2004-06-10 23:01 ` Richard Stallman
  0 siblings, 1 reply; 16+ messages in thread
From: David Kastrup @ 2004-06-09 15:37 UTC (permalink / raw)



We have the following excerpt here:

static Lisp_Object
match_limit (num, beginningp)
     Lisp_Object num;
     int beginningp;
{
  register int n;

  CHECK_NUMBER (num);
  n = XINT (num);
  if (n < 0 || n >= search_regs.num_regs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    args_out_of_range (num, make_number (search_regs.num_regs));
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  if (search_regs.num_regs <= 0
      || search_regs.start[n] < 0)
    return Qnil;
  return (make_number ((beginningp) ? search_regs.start[n]
		                    : search_regs.end[n]));
}

DEFUN ("match-beginning", Fmatch_beginning, Smatch_beginning, 1, 1, 0,
       doc: /* Return position of start of text matched by last search.
SUBEXP, a number, specifies which parenthesized expression in the last
  regexp.
Value is nil if SUBEXPth pair didn't match, or there were less than
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  SUBEXP pairs.
  ^^^^^^^^^^^^^

Same for match-end and probably match-string.  But if there was no
previous match with at least the same number of SUBEXPs in the
history of the Emacs session, we will get an args_out_of_range error.

I think the fix would be to have match-limit return Qnil instead of
flagging an error for the condition

  n >= search_regs.num_regs

Correct?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-09 15:37 Discrepancy in definition/use of match-data? David Kastrup
@ 2004-06-10 23:01 ` Richard Stallman
  2004-06-10 23:56   ` David Kastrup
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Stallman @ 2004-06-10 23:01 UTC (permalink / raw)
  Cc: emacs-devel

    I think the fix would be to have match-limit return Qnil instead of
    flagging an error for the condition

      n >= search_regs.num_regs

    Correct?

I agree with you.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-10 23:01 ` Richard Stallman
@ 2004-06-10 23:56   ` David Kastrup
  2004-06-11  8:34     ` Stephen J. Turnbull
  2004-06-12  1:51     ` Richard Stallman
  0 siblings, 2 replies; 16+ messages in thread
From: David Kastrup @ 2004-06-10 23:56 UTC (permalink / raw)
  Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     I think the fix would be to have match-limit return Qnil instead of
>     flagging an error for the condition
> 
>       n >= search_regs.num_regs
> 
>     Correct?
> 
> I agree with you.

I have thought a bit about the context in which I have seen this error
condition flagged in the last few days.  The first example was inside
of split-string (the 21.4.13 XEmacs version) when running inside of a
process filter: the routine was buggy since it could call
(match-beginning 0) even when there were no preceeding _successful_
matches.  However, since the current API will not change the match
data after an unsuccessful match, in most uses this bug was masked by
some old existing data in (match-beginning 0).  So only the filter
routine (which starts out with a fresh copy of void match-data)
provided a sufficient context for flagging the bug.  In particular,
such a bug is very evasive in debugging, since the situation of
pristinely void match-data will then typically be absent.

In consequence, this bug persisted for half a year _after_ it was
observed.  If I change the test as indicated above, the bug would not
even get flagged anymore, not even in a process filter, and would go
completely undetected.  Not good.

Now the obvious solution to that would be to make unsuccessful
matches void the match-data, too.  I myself have no recollection of
any discussions about this, but Stephen Turnbull was pretty vocal
about having proposed something like this, but that apparently the
idiom

(while (search-forward "Something"))
do something with (match-beginning 0)

was given as a reason why an unsuccessful match should not clear the
match-data.  But actually, the documentation of the match-data
function claims:

    Return value is undefined if the last search failed.

Invalidating the match-data after unsuccessful matches would quite
increase the probability of catching such bugs, while being consistent
with the current documentation of match-data.  However, if this
suggestion has been turned down with the above example, then this
"idiom" needs to get searched and replaced.  It is my personal opinion
that this idiom is not worth the loss of bug detection, and it is
inconsistent with the current match-data documentation, anyway.

But it would probably be a bad idea to change this right now in the
feature freeze, possibly breaking things relying on that quite
undocumented behavior.

So my proposal is the following plan:

Before next release:
match-data gets voided upon entry of a filter or sentinel, like it is
being done now.  With void match-data, match-beginning and so on flag
an error irrespective of their argument.  The match-data is only
touched by a successful match.  Once a match has been successful,
match-beginning and so on will not flag errors for positive
arguments, but return nil (as documented).

In order to have a better chance of catching such
use-before-valid-match situations, it might be a good idea also to
void the match-data in the main loop.

After the next release, I would like to have unsuccessful matches
also void the match-data, making the use of the above quoted idiom
illegal.  It is consistent with the current documentation of the
functions, and it gives much better possibilities of actually
catching bugs that at the moment only trigger errors in obscure,
badly debuggable places like output filters.

We should do this change right after the next release so that Emacs
core developers and those for add-on packages have enough time testing
it, finding the problematic code and fixing it for the next release
after that.

What I'll do right now is just changing the condition that it will not
flag an error for greater-than-encountered match-data indices, except
for the case of completely void match-data where I'll still flag an
error.  So my current change will flag fewer errors than before (and
in particular leave out false alarms like in locate.el).

This change does not merit discussion before application: it only
fixes things without breaking existing idioms.  My other proposals
(such as voiding the match-data in the main loop) would warrant other
opinions, however.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-10 23:56   ` David Kastrup
@ 2004-06-11  8:34     ` Stephen J. Turnbull
  2004-06-11  8:54       ` David Kastrup
  2004-06-12  1:51     ` Richard Stallman
  1 sibling, 1 reply; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-11  8:34 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "David" == David Kastrup <dak@gnu.org> writes:

    David> Now the obvious solution to that would be to make
    David> unsuccessful matches void the match-data, too.  I myself
    David> have no recollection of any discussions about this, but
    David> Stephen Turnbull was pretty vocal about having proposed
    David> something like this,

To be precise, I tried it in a workspace and was immediately slapped
down by a regression test failure.

    David> What I'll do right now is just changing the condition that
    David> it will not flag an error for greater-than-encountered
    David> match-data indices, except for the case of completely void
    David> match-data where I'll still flag an error.

I would suggest improving the error message if at all possible, and
documenting this prominently, as it is likely to happen very rarely,
and then only in asynchronous calls.


-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-11  8:34     ` Stephen J. Turnbull
@ 2004-06-11  8:54       ` David Kastrup
  2004-06-12  6:45         ` Stephen J. Turnbull
  0 siblings, 1 reply; 16+ messages in thread
From: David Kastrup @ 2004-06-11  8:54 UTC (permalink / raw)
  Cc: emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> >>>>> "David" == David Kastrup <dak@gnu.org> writes:
> 
>     David> Now the obvious solution to that would be to make
>     David> unsuccessful matches void the match-data, too.  I myself
>     David> have no recollection of any discussions about this, but
>     David> Stephen Turnbull was pretty vocal about having proposed
>     David> something like this,
> 
> To be precise, I tried it in a workspace and was immediately slapped
> down by a regression test failure.

That in itself does not constitute a decision.  It suggests that such
a change should probably not be made lightly, and not shortly before a
release.  But it does not preclude a long-term strategy of change if
one can agree on the desirability: one would start by actively
declaring this usage deprecated (if it ever was supposed to be allowed
in the first place).  At what time one actually clamps down the code
is unrelated to the question of whether it is desirable to do so.

How strong were the results from the regression test?  What kind and
amount of code appears to be affected?

>     David> What I'll do right now is just changing the condition
>     David> that it will not flag an error for
>     David> greater-than-encountered match-data indices, except for
>     David> the case of completely void match-data where I'll still
>     David> flag an error.
> 
> I would suggest improving the error message if at all possible, and
> documenting this prominently, as it is likely to happen very rarely,
> and then only in asynchronous calls.

With the current code.  But I was also proposing to void the
match-data in the main loop: that would produce errors more often, and
only in situations where the match-data was completely unpredictable
to start with (and possibly undefined, too).  I have no clue about
the involved complexity: if this leads to a danger of, say,
recursive-edit or debug not working like before, one should postpone
till after release.

Anyway, I do not know enough about the error handling in C to propose
better error handling or messages.  I do agree that the currently
generated message is dissatisfactorily obtuse.  In particular, since
the context flagged in the traceback is the caller, and not the
particular function itself, so it is guesswork to figure out just
_what_ function triggered the "out of range" error.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-11  8:54       ` David Kastrup
@ 2004-06-12  6:45         ` Stephen J. Turnbull
  2004-06-12  9:03           ` David Kastrup
  2004-06-13  0:01           ` Richard Stallman
  0 siblings, 2 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-12  6:45 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "David" == David Kastrup <dak@gnu.org> writes:

    David> How strong were the results from the regression test?

What does that mean?  There is one test that specifically checks for
match-data being preserved across a failed match, and it failed.

    David> What kind and amount of code appears to be affected?

Kind?  I know of one specific use in `w3-configuration-data' in
w3-cfg.el which calls itself recursively, and depends on being able to
use the top-level match-data if the match in the recursive call fails,
while using the match-data from the recursive call otherwise.  Too
sneaky to live, I suppose you could say.

Amount?  How should I know?  I can say I've tried inserting warning
code in match_limit and file-name-sans-extension produces the warning
during the dump phase.  I don't know why yet, my implementation may be
incorrect (the flag is either getting reset when it shouldn't, or
fails to get set on a successful search---the Boyer-Moore code is the
most complicated thing I've ever tried to deal with).

However, I know that file-handlers can get called there, via
file-name-directory inter alia.  In general, even a simple variable
reference can call arbitrary Lisp code in XEmacs (because of magic
Mule handlers) and most likely GNU Emacs (because those handlers were
introduced for GNU Emacs compatibility).

    David> With the current code.  But I was also proposing to void
    David> the match-data in the main loop: that would produce errors
    David> more often, and only in situations where the match-data was
    David> completely unpredictable to start with (and possibly
    David> undefined, too).

But that's not true.  The match-data was sufficiently predictable that
split-string only failed when it was voided.  I think it's likely that
that mistake has been made elsewhere.

On the other hand, pretty much any time any Lisp code intervenes
between the return of the matching function and entry to the match
data access there is an opportunity for a hook or handler to be
called.  Probability of hitting it on any given path is low, but
potential for random annoyance for years to come is high, I fear.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-12  6:45         ` Stephen J. Turnbull
@ 2004-06-12  9:03           ` David Kastrup
  2004-06-13  0:01           ` Richard Stallman
  1 sibling, 0 replies; 16+ messages in thread
From: David Kastrup @ 2004-06-12  9:03 UTC (permalink / raw)
  Cc: emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> >>>>> "David" == David Kastrup <dak@gnu.org> writes:
> 
>     David> How strong were the results from the regression test?
> 
> What does that mean?  There is one test that specifically checks for
> match-data being preserved across a failed match, and it failed.

Ah, ok.  I thought that the regression test might have involved
calling a lot of complex functions to see whether they still worked.
It dod not occur to me that it checked for the particular semantics
explicitly.

>     David> What kind and amount of code appears to be affected?
> 
> Kind?  I know of one specific use in `w3-configuration-data' in
> w3-cfg.el which calls itself recursively, and depends on being able
> to use the top-level match-data if the match in the recursive call
> fails, while using the match-data from the recursive call otherwise.
> Too sneaky to live, I suppose you could say.

Sneakiness is all fine.  It is just the question whether it is worth
the price.  It would be a good idea to check this price out after the
next release, as said.

> Amount?  How should I know?  I can say I've tried inserting warning
> code in match_limit and file-name-sans-extension produces the
> warning during the dump phase.  I don't know why yet, my
> implementation may be incorrect (the flag is either getting reset
> when it shouldn't, or fails to get set on a successful search---the
> Boyer-Moore code is the most complicated thing I've ever tried to
> deal with).
> 
> However, I know that file-handlers can get called there, via
> file-name-directory inter alia.  In general, even a simple variable
> reference can call arbitrary Lisp code in XEmacs (because of magic
> Mule handlers) and most likely GNU Emacs (because those handlers
> were introduced for GNU Emacs compatibility).

In which case this particular sort of references better use
save-match-data.

>     David> With the current code.  But I was also proposing to void
>     David> the match-data in the main loop: that would produce
>     David> errors more often, and only in situations where the
>     David> match-data was completely unpredictable to start with
>     David> (and possibly undefined, too).
> 
> But that's not true.  The match-data was sufficiently predictable
> that split-string only failed when it was voided.

The code in question was the following:

  (let (parts (start 0) (len (length string)))
    (if (string-match pattern string)
	(setq parts (cons (substring string 0 (match-beginning 0)) parts)
	      start (match-end 0)))

The error occured when the condition was false to start with and setq
not reached.  Then we had the following:

    (while (and (< start len)

This condition is true for non-empty string

		(string-match pattern string (if (> start (match-beginning 0))
						 start
					       (1+ start))))

start is still 0, match-beginning has a random value (this is where
the error got flagged).  Without an error, we'll get into the false
branch, so the match starts at 1 now.  Since the precondition was only
true for non-empty string, starting the match at 1 is valid, will
again fail like the first match, and the loop body gets skipped.

So this time we are lucky: the random contents of match-beginning
would indeed not matter.  Would you want to rely on that?

> I think it's likely that that mistake has been made elsewhere.

I doubt that we can rely on its consequences being benign.

> On the other hand, pretty much any time any Lisp code intervenes
> between the return of the matching function and entry to the match
> data access there is an opportunity for a hook or handler to be
> called.

That's what save-match-data is for.  Hooks or handlers that intervene
at random places need to use it when they might likely call regexp
code.  This is nothing new.

> Probability of hitting it on any given path is low, but potential
> for random annoyance for years to come is high, I fear.

It is obvious that you are not fond of the interface, presumably
preferring a complex return value for matching functions including the
match data.  But it is not like it would be feasible to change it,
anyway.

The only addition I can think of is to offer an explicit function
void-match-data that users would be free to call at points when they
know the match-data no longer to be needed.  Having such a function
explicitly available might help for debugging use-before-definition
cases like the above.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-12  6:45         ` Stephen J. Turnbull
  2004-06-12  9:03           ` David Kastrup
@ 2004-06-13  0:01           ` Richard Stallman
  2004-06-14  5:06             ` Stephen J. Turnbull
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Stallman @ 2004-06-13  0:01 UTC (permalink / raw)
  Cc: dak, emacs-devel

      In general, even a simple variable
    reference can call arbitrary Lisp code in XEmacs (because of magic
    Mule handlers) and most likely GNU Emacs (because those handlers were
    introduced for GNU Emacs compatibility).

I have never heard of "magic Mule handlers", and I am pretty sure there
is nothing like this in Emacs.  If we had them, we would make them save
and restore the match data.

    On the other hand, pretty much any time any Lisp code intervenes
    between the return of the matching function and entry to the match
    data access there is an opportunity for a hook or handler to be
    called.

That is rather an exaggeration.  Most of the Lisp functions described
in the manual cannot call any hook.  Asynchronous activities only happen
inside functions that can wait, which means, only certain primitives.
Likewise for file name handlers.  And these normally save the match
data anyway.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-13  0:01           ` Richard Stallman
@ 2004-06-14  5:06             ` Stephen J. Turnbull
  2004-06-14  9:05               ` David Kastrup
  2004-06-16  7:13               ` Stephen J. Turnbull
  0 siblings, 2 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-14  5:06 UTC (permalink / raw)
  Cc: dak, emacs-devel

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

    rms> sjt writes:

    On the other hand, pretty much any time any Lisp code intervenes
    between the return of the matching function and entry to the match
    data access there is an opportunity for a hook or handler to be
    called.

    rms> That is rather an exaggeration.

Exaggeration, sure.  But it contains a kernel of truth.

    rms> Most of the Lisp functions described in the manual cannot
    rms> call any hook.

True, but not relevant unless you know which ones they are.  Offhand,
I don't, and the docstrings/source comments are less than 100%
reliable.

Anyway, I've implemented a last-match-succeeded flag (for debug usage
only) and check it in match_limits and Fmatch_data (when XEmacs is
configured for error-checking).  If we catch anything that looks
relevant to GNU Emacs I'll report it here.


-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-14  5:06             ` Stephen J. Turnbull
@ 2004-06-14  9:05               ` David Kastrup
  2004-06-14 10:05                 ` Stephen J. Turnbull
  2004-06-16  7:13               ` Stephen J. Turnbull
  1 sibling, 1 reply; 16+ messages in thread
From: David Kastrup @ 2004-06-14  9:05 UTC (permalink / raw)
  Cc: rms, emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> >>>>> "rms" == Richard Stallman <rms@gnu.org> writes:
> 
>     rms> sjt writes:
> 
>     On the other hand, pretty much any time any Lisp code intervenes
>     between the return of the matching function and entry to the match
>     data access there is an opportunity for a hook or handler to be
>     called.
> 
>     rms> That is rather an exaggeration.
> 
> Exaggeration, sure.  But it contains a kernel of truth.

Those things that run in hooks/handlers/whatever need to get wrapped
in save-match-data, anyway.

>     rms> Most of the Lisp functions described in the manual cannot
>     rms> call any hook.
> 
> True, but not relevant unless you know which ones they are.

> Offhand, I don't, and the docstrings/source comments are less than
> 100% reliable.

Where the docstrings are not reliable, they need to get amended,
obviously.

> Anyway, I've implemented a last-match-succeeded flag (for debug
> usage only) and check it in match_limits and Fmatch_data (when
> XEmacs is configured for error-checking).  If we catch anything that
> looks relevant to GNU Emacs I'll report it here.

Anyway, the args-out-of-range error that now gets thrown gives a
basically completely irrelevant second value (with regard to
identifying the error.  Actually, the first value is also irrelevant
unless it is negative).  Could one replace that second value with the
function causing the error instead, or would that be inconsistent with
any known error handlers?

Or should one throw a different error altogether in the case where
match-data is called without a valid match ever having been done?
Which one?  As I said already, I have no clue about the error handling
conventions.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-14  9:05               ` David Kastrup
@ 2004-06-14 10:05                 ` Stephen J. Turnbull
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-14 10:05 UTC (permalink / raw)
  Cc: Stephen J. Turnbull, rms, emacs-devel

>>>>> "David" == David Kastrup <dak@gnu.org> writes:

    David> Anyway, the args-out-of-range error that now gets thrown
    David> gives a basically completely irrelevant second value

True.

    David> Or should one throw a different error altogether in the
    David> case where match-data is called without a valid match ever
    David> having been done?

That's the solution I chose, except I haven't made it an error yet.
When I do, would use invalid-state.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-14  5:06             ` Stephen J. Turnbull
  2004-06-14  9:05               ` David Kastrup
@ 2004-06-16  7:13               ` Stephen J. Turnbull
  2004-06-19  3:19                 ` Richard Stallman
  2004-06-19  3:19                 ` Richard Stallman
  1 sibling, 2 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-16  7:13 UTC (permalink / raw)


>>>>> "sjt" == Stephen J Turnbull <stephen@xemacs.org> writes:

    sjt> Anyway, I've implemented a last-match-succeeded flag (for
    sjt> debug usage only) and check it in match_limits and
    sjt> Fmatch_data (when XEmacs is configured for error-checking).

I've changed this so that Fmatch_data saves the state of that flag
with the registers, and Fstore_match_data restores it.  This is bogus
if somebody uses an explicit call to match-data, but so far I've only
seen explicit use of match-data in macros like save-match-data.
Unfortunately external packages occasionally define similar macros,
making it difficult to suppress hundreds of bogus warnings from
functions that use those macros.

    sjt> If we catch anything that looks relevant to GNU Emacs I'll
    sjt> report it here.

OK, I have tripped this in three functions so far.  All of them look
relevant to GNU Emacs.

isearch.el (isearch-repeat):  both calls to match-{beginning,end}
font-lock.el (font-lock-fontify-keywords-region):
                              all calls to match-{beginning,end}
             (font-lock-fontify-anchored-keywords):
                              all calls to match-{beginning,end}

The calls in isearch.el are part of logic I don't understand yet.  It
looks like a zero-length match is being used to determine whether the
isearch-success flag is lying.  It should be possible to fix this, but
I don't know when I'll have time to do so.

The calls in font-lock are used to fontify text.  They are guarded by a
proper test for success in the case of a MATCHER which is a regexp as
far as I can tell.  So I suspect that in fact a MATCHER which is a
function is being passed in, and it succeeds even though the regexp
match failed.  This is probably cc-mode, and possibly emacs-lisp-mode
too.

In all cases guarding the calls to match-{beginning,end} with a test
on last-match-succeeded seems to leave behavior unchanged, except that
font-lock seems perceptibly faster for both C and Lisp.  (I haven't
tried to measure that yet, though.)  Again, it's unlikely I'll have
time to dig into this soon, so I'm reporting it now.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-16  7:13               ` Stephen J. Turnbull
@ 2004-06-19  3:19                 ` Richard Stallman
  2004-06-23  9:53                   ` Stephen J. Turnbull
  2004-06-19  3:19                 ` Richard Stallman
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Stallman @ 2004-06-19  3:19 UTC (permalink / raw)
  Cc: emacs-devel

    OK, I have tripped this in three functions so far.  All of them look
    relevant to GNU Emacs.

    isearch.el (isearch-repeat):  both calls to match-{beginning,end}

It is checking whether the last match for the search string was empty.
I think this change should make it work without using the match-data.
Does it work?

*** isearch.el	06 Jun 2004 09:56:16 -0400	1.228
--- isearch.el	18 Jun 2004 22:10:53 -0400	
***************
*** 999,1005 ****
  
    (if (equal isearch-string "")
        (setq isearch-success t)
!     (if (and isearch-success (equal (match-end 0) (match-beginning 0))
  	     (not isearch-just-started))
  	;; If repeating a search that found
  	;; an empty string, ensure we advance.
--- 999,1006 ----
  
    (if (equal isearch-string "")
        (setq isearch-success t)
!     (if (and isearch-success
! 	     (equal (point) isearch-other-end)
  	     (not isearch-just-started))
  	;; If repeating a search that found
  	;; an empty string, ensure we advance.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-19  3:19                 ` Richard Stallman
@ 2004-06-23  9:53                   ` Stephen J. Turnbull
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen J. Turnbull @ 2004-06-23  9:53 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

        OK, I have tripped this in three functions so far.  All
    of them look relevant to GNU Emacs.

        isearch.el (isearch-repeat): both calls to match-{beginning,end}

    rms> It is checking whether the last match for the search string
    rms> was empty.  I think this change should make it work without
    rms> using the match-data.  Does it work?

Your change eliminates the match-data access, of course, but I just
realized as I tried to test it that I really don't have a feel for
what "work" means.  I don't use regexp-isearch very often, and regexps
that match the null string almost never.  As far as I can tell in a
few days testing string searching and the simple regexps that I
commonly use are producing no surprises, and "artificial" regexps that
should match the empty string do, while other matchers do not.  Repeat
regexp searches for ".*" behave the same with both implementations,
including wrapping around bob and eob.

So I would say it works.  Well enough to install in our beta tree,
anyway.

*** isearch.el	06 Jun 2004 09:56:16 -0400	1.228
--- isearch.el	18 Jun 2004 22:10:53 -0400	
***************
*** 999,1005 ****
  
    (if (equal isearch-string "")
        (setq isearch-success t)
!     (if (and isearch-success (equal (match-end 0) (match-beginning 0))
  	     (not isearch-just-started))
  	;; If repeating a search that found
  	;; an empty string, ensure we advance.
--- 999,1006 ----
  
    (if (equal isearch-string "")
        (setq isearch-success t)
!     (if (and isearch-success
! 	     (equal (point) isearch-other-end)
  	     (not isearch-just-started))
  	;; If repeating a search that found
  	;; an empty string, ensure we advance.


-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-16  7:13               ` Stephen J. Turnbull
  2004-06-19  3:19                 ` Richard Stallman
@ 2004-06-19  3:19                 ` Richard Stallman
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Stallman @ 2004-06-19  3:19 UTC (permalink / raw)
  Cc: emacs-devel

    font-lock.el (font-lock-fontify-keywords-region):
				  all calls to match-{beginning,end}
		 (font-lock-fontify-anchored-keywords):
				  all calls to match-{beginning,end}

Is it caused by this code?

(defun c-find-invalid-doc-markup (regexp limit)
  ;; Used to fontify invalid markup in doc comments after the correct
  ;; ones have been fontified: Find the first occurence of REGEXP
  ;; between the point and LIMIT that only is fontified with
  ;; `c-doc-face-name'.  If a match is found then submatch 0 surrounds
  ;; the first char and t is returned, otherwise nil is returned.
  (let (start)
    (while (if (re-search-forward regexp limit t)
	       (not (eq (get-text-property
			 (setq start (match-beginning 0)) 'face)
			c-doc-face-name))
	     (setq start nil)))
    (when start
      (store-match-data (list (copy-marker start)
			      (copy-marker (1+ start))))
      t)))

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Discrepancy in definition/use of match-data?
  2004-06-10 23:56   ` David Kastrup
  2004-06-11  8:34     ` Stephen J. Turnbull
@ 2004-06-12  1:51     ` Richard Stallman
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Stallman @ 2004-06-12  1:51 UTC (permalink / raw)
  Cc: emacs-devel

    Now the obvious solution to that would be to make unsuccessful
    matches void the match-data, too.

That might be a good thing to do, except that doing it now is likely
to delay the release.

    So my proposal is the following plan:

    Before next release:
    match-data gets voided upon entry of a filter or sentinel, like it is
    being done now.  With void match-data, match-beginning and so on flag
    an error irrespective of their argument.  The match-data is only
    touched by a successful match.  Once a match has been successful,
    match-beginning and so on will not flag errors for positive
    arguments, but return nil (as documented).

    In order to have a better chance of catching such
    use-before-valid-match situations, it might be a good idea also to
    void the match-data in the main loop.

Voiding in the main loop was exactly the idea that occurred to me.
So let's do all of this now.

    After the next release, I would like to have unsuccessful matches
    also void the match-data, making the use of the above quoted idiom
    illegal.

I agree.  (But please, let's say "cause an error", not "illegal".
Nobody is going to be jailed for doing this.)

    We should do this change right after the next release so that Emacs
    core developers and those for add-on packages have enough time testing
    it, finding the problematic code and fixing it for the next release
    after that.

We could do it right now in the Unicode branch.  That way, we'd assure
it will get into the following major release, but detection of code
that needs changing could start right away.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2004-06-23  9:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-09 15:37 Discrepancy in definition/use of match-data? David Kastrup
2004-06-10 23:01 ` Richard Stallman
2004-06-10 23:56   ` David Kastrup
2004-06-11  8:34     ` Stephen J. Turnbull
2004-06-11  8:54       ` David Kastrup
2004-06-12  6:45         ` Stephen J. Turnbull
2004-06-12  9:03           ` David Kastrup
2004-06-13  0:01           ` Richard Stallman
2004-06-14  5:06             ` Stephen J. Turnbull
2004-06-14  9:05               ` David Kastrup
2004-06-14 10:05                 ` Stephen J. Turnbull
2004-06-16  7:13               ` Stephen J. Turnbull
2004-06-19  3:19                 ` Richard Stallman
2004-06-23  9:53                   ` Stephen J. Turnbull
2004-06-19  3:19                 ` Richard Stallman
2004-06-12  1:51     ` Richard Stallman

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.