unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
@ 2011-10-06  9:13 Štěpán Němec
  2011-10-06 12:57 ` Stefan Monnier
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Štěpán Němec @ 2011-10-06  9:13 UTC (permalink / raw)
  To: 9681

Quoting from <http://permalink.gmane.org/gmane.emacs.gnus.user/15052>:

===========

> What am I doing wrong?

Nothing, I think :-). I personally don't use fancy splitting, but a
deeper look at (at least Gnus 5.13's) code seems to locate the culprit
in Emacs' *backward* regular expression "non-greedity": Position point
at the end of "bugzilla.gdm", C-u C-r "\w+" - et voilà, only one
character is matched.

============

If this curious inconsistency of `re-search-backward' with
`re-search-forward' is intentional (which I hope it is not), it should
be documented, but I couldn't find anything in the manuals or
docstrings.

-- 
Štěpán

* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2011-10-06  9:13 bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character) Štěpán Němec
@ 2011-10-06 12:57 ` Stefan Monnier
  2011-10-06 18:48   ` Štěpán Němec
  2011-10-06 19:46 ` Johan Bockgård
  2012-03-16 15:49 ` Jack Duthen
  2 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2011-10-06 12:57 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: 9681

> If this curious inconsistency of `re-search-backward' with
> `re-search-forward' is intentional (which I hope it is not), it should
> be documented, but I couldn't find anything in the manuals or
> docstrings.

re-search-* stops at the first character position that has a match.
And then it chooses the longest match at that position.
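
A minimal sketch of what this means in practice, using a temporary
buffer that holds only "foobar". The helper names `demo-backward-match'
and `demo-forward-match' are made up for illustration and are not part
of Emacs:

```elisp
(defun demo-backward-match ()
  "Return what `re-search-backward' matches for \\w+ from the end of \"foobar\"."
  (with-temp-buffer
    (insert "foobar")
    ;; Point is now at the end of the buffer, just after "r".
    ;; The search walks backward; the first start position with a
    ;; match is the one just before "r", and matching goes forward.
    (re-search-backward "\\w+")
    (match-string 0)))

(defun demo-forward-match ()
  "Return what `re-search-forward' matches for \\w+ from the start of \"foobar\"."
  (with-temp-buffer
    (insert "foobar")
    (goto-char (point-min))
    (re-search-forward "\\w+")
    (match-string 0)))

(demo-backward-match) ; => "r"
(demo-forward-match)  ; => "foobar"
```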


        Stefan

* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2011-10-06 12:57 ` Stefan Monnier
@ 2011-10-06 18:48   ` Štěpán Němec
  2011-10-07 13:02     ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Štěpán Němec @ 2011-10-06 18:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 9681

[Stefan: sorry for the two replies. I forgot to Cc the bug list in my
first reply, and I've since changed my mind on some of the points; see
below.]

On Thu, Oct 06, 2011 at 08:57:09AM -0400, Stefan Monnier wrote:
> > If this curious inconsistency of `re-search-backward' with
> > `re-search-forward' is intentional (which I hope it is not), it should
> > be documented, but I couldn't find anything in the manuals or
> > docstrings.
>
> re-search-* stops at the first character position that has a match.
> And then it chooses the longest match at that position.

Thanks, but I'm not sure I understand what you mean here. Naturally, the
longest match for `re-search-backward' should be backward, not forward.
That is, using your wording above: when searching _backward_ for \w+ in
"foobar|", where "|" is point, the "first character position that has a
match" might be "r", but that is hardly the longest match.

If I'm the only one who considers this behaviour broken (by design?[1]),
which I very much doubt, it definitely needs to at least be documented,
as I'm certainly not the only one who is very surprised by this
behaviour. In my opinion it should be fixed, though.

[1] Cf. e.g. ?\w\+ in Vim, which does the right thing.

--
Štěpán

* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2011-10-06  9:13 bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character) Štěpán Němec
  2011-10-06 12:57 ` Stefan Monnier
@ 2011-10-06 19:46 ` Johan Bockgård
  2012-03-16 15:49 ` Jack Duthen
  2 siblings, 0 replies; 10+ messages in thread
From: Johan Bockgård @ 2011-10-06 19:46 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: 9681

Štěpán Němec <stepnem@gmail.com> writes:

> If this curious inconsistency of `re-search-backward' with
> `re-search-forward' is intentional (which I hope it is not), it should
> be documented, but I couldn't find anything in the manuals or
> docstrings.

Then you must not have looked very hard.

* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2011-10-06 18:48   ` Štěpán Němec
@ 2011-10-07 13:02     ` Stefan Monnier
  2011-10-07 13:19       ` Štěpán Němec
  2011-10-11  2:03       ` Kenichi Handa
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Monnier @ 2011-10-07 13:02 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: 9681

>> re-search-* stops at the first character position that has a match.
>> And then it chooses the longest match at that position.
> Thanks, but I'm not sure I understand what you mean here. Naturally, the
> longest match for `re-search-backward' should be backward, not forward,

Ah, yes, sorry for being unclear: the search for a match goes backward,
but the matching itself goes forward.

The docstring of re-search-backward is more clear about that:

   The match found is the one starting last in the buffer
   and yet ending before the origin of the search.

> If I'm the only one who considers this behaviour broken (by design?[1]),

It's not the ideal behavior, admittedly.  It's even more obvious in
`looking-back'.  But fixing it would require the implementation of
a backward regexp matcher.
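
A small sketch of the `looking-back' case, again on a temporary buffer
holding "foobar". The helper name `demo-looking-back-start' is
hypothetical; the optional GREEDY argument of `looking-back' is the
documented way to ask it to extend the match backward:

```elisp
(defun demo-looking-back-start (greedy)
  "Return where the \\w+ match before point starts in \"foobar\".
GREEDY is passed on to `looking-back'."
  (with-temp-buffer
    (insert "foobar")
    ;; Point is at the end of the buffer; buffer positions start at 1.
    (and (looking-back "\\w+" (point-min) greedy)
         (match-beginning 0))))

(demo-looking-back-start nil) ; => 6, only "r" is matched
(demo-looking-back-start t)   ; => 1, the whole of "foobar"
```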


        Stefan

* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2011-10-07 13:02     ` Stefan Monnier
@ 2011-10-07 13:19       ` Štěpán Němec
  2011-10-11  2:03       ` Kenichi Handa
  1 sibling, 0 replies; 10+ messages in thread
From: Štěpán Němec @ 2011-10-07 13:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 9681

On Fri, Oct 07, 2011 at 09:02:18AM -0400, Stefan Monnier wrote:
> >> re-search-* stops at the first character position that has a match.
> >> And then it chooses the longest match at that position.
> > Thanks, but I'm not sure I understand what you mean here. Naturally, the
> > longest match for `re-search-backward' should be backward, not forward,
> 
> Ah, yes, sorry for being unclear: the search for a match goes backward,
> but the matching itself goes forward.
> 
> The docstring of re-search-backward is more clear about that:
> 
>    The match found is the one starting last in the buffer
>    and yet ending before the origin of the search.

I suppose that is more clear if you already know the behaviour, but I
didn't understand it that way, either. I think it should at least add
that the match is still forward, not backward, and that it might not
behave as expected for regexps containing constructs like * and +.
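
To illustrate the kind of care needed, here is a sketch comparing an
unanchored regexp with one anchored at a word start (\<). It assumes a
temporary buffer holding "baz foobar"; `demo-anchored-backward' is a
made-up name used only for this example:

```elisp
(defun demo-anchored-backward (regexp)
  "Search backward for REGEXP from the end of \"baz foobar\".
Return the text that was matched."
  (with-temp-buffer
    (insert "baz foobar")
    ;; Point is at the end of the buffer, just after "r".
    (re-search-backward regexp)
    (match-string 0)))

;; Unanchored: the first start position with a match is just before "r".
(demo-anchored-backward "\\w+")    ; => "r"
;; Anchored at a word start: no position inside the word can match,
;; so the search keeps going back to the "f".
(demo-anchored-backward "\\<\\w+") ; => "foobar"
```

Anchoring only helps when the start of the desired match can be
described in the regexp itself, which is not always possible.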
 
> > If I'm the only one who considers this behaviour broken (by design?[1]),
> 
> It's not the ideal behavior, admittedly.  It's even more obvious in
> `looking-back'.  But fixing it would require the implementation of
> a backward regexp matcher.

Yeah, as I said above (and as is obvious in the message quoted in the
bug report), the set of regexps usable with `re-search-backward' seems
to be quite limited, and one has to be very careful when using it (and
even some developers apparently fail at that).

So, again: it definitely needs better documentation, and IMO it also
needs fixing.

-- 
Štěpán

* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2011-10-07 13:02     ` Stefan Monnier
  2011-10-07 13:19       ` Štěpán Němec
@ 2011-10-11  2:03       ` Kenichi Handa
  2011-10-11  3:56         ` Stefan Monnier
  1 sibling, 1 reply; 10+ messages in thread
From: Kenichi Handa @ 2011-10-11  2:03 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 9681, stepnem

In article <jwvr52pj706.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> It's not the ideal behavior, admittedly.  It's even more obvious in
> `looking-back'.  But fixing it would require the implementation of
> a backward regexp matcher.

FYI, in Mule (the version before integrating into Emacs), we
implemented such a feature by doing these:

  o Write a regular expression compiler in Elisp which
    generates both forward-matching and backward-matching
    compiled patterns.

  o Modify regex.c to accept the above patterns and do
    backward matching if necessary.

I vaguely remember that I discussed this feature with RMS
when we were going to integrate Mule's multilingual feature
into Emacs, and it was rejected because it was not related to the
multilingual feature.  And actually, that feature had been
used very rarely.

---
Kenichi Handa
handa@m17n.org

* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2011-10-11  2:03       ` Kenichi Handa
@ 2011-10-11  3:56         ` Stefan Monnier
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Monnier @ 2011-10-11  3:56 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: 9681, stepnem

>> It's not the ideal behavior, admittedly.  It's even more obvious in
>> `looking-back'.  But fixing it would require the implementation of
>> a backward regexp matcher.
> FYI, in Mule (the version before integrating into Emacs), we
> implemented such a feature by doing these:

I actually think it would be nice to have such a thing, but I also think
it'd be more important to have a non-backtracking regexp matcher.

> multilingual feature.  And actually, that feature had been
> used very rarely.

Indeed, it's not often needed, but those few cases can be significant.


        Stefan

* bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2011-10-06  9:13 bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character) Štěpán Němec
  2011-10-06 12:57 ` Stefan Monnier
  2011-10-06 19:46 ` Johan Bockgård
@ 2012-03-16 15:49 ` Jack Duthen
  2012-03-16 16:29   ` bug#11025: Fwd: " Jack Duthen
  2 siblings, 1 reply; 10+ messages in thread
From: Jack Duthen @ 2012-03-16 15:49 UTC (permalink / raw)
  To: 9681

On Thu, Oct 06, 2011 at 08:57:09AM -0400, Stefan Monnier wrote:
> re-search-* stops at the first character position that has a match.
> And then it chooses the longest match at that position.

Stepan wrote:
> So, again: it definitely needs better documentation,
> and IMO it also needs fixing.

Hi!

For my own imenu-prev-index-position-function, I needed
a backward regexp search which would match something like ".+"
the way one (like Stepan) can expect rather than the way it actually
does (as described by Stefan).

So, I just wrote a function to do that.

The way it handles the COUNT argument is not as good as one could want
but, as I almost never use it, I don't care.
It's not very efficient (it rescans the buffer from the beginning) but,
since I can't notice the time it takes when used from the "*Rescan*"
menu item and since I can't imagine a better algorithm, it's OK for me.

(defun jd-re-search-backward (regexp &optional bound noerror count)
  "Search backward for REGEXP, preferring the longest backward match.
BOUND, NOERROR and COUNT are passed to `re-search-backward'."
  (let ((orig-point (point)) bom)
    (when (re-search-backward regexp bound noerror count)
      (setq bom (point)) ; fallback; normally overwritten by the loop below
      (goto-char (point-min))
      (while (re-search-forward regexp orig-point 'noerror)
        ;; remember the last beginning of match
        (setq bom (match-beginning 0)))
      (goto-char bom)
      ;; set match data (erased by the last failing search) and return t
      (looking-at regexp))))
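
For comparison, a sketch of a different approach that avoids rescanning
from the beginning of the buffer: take the ordinary backward match and
step its start back one character at a time while the regexp still
matches up to the same end. This is similar in spirit to the GREEDY
argument of `looking-back'. The name `my-re-search-backward-greedy' is
hypothetical, COUNT is not handled, and the loop stops at the first
position where the extension fails, so it is only an approximation of a
true backward matcher:

```elisp
(defun my-re-search-backward-greedy (regexp &optional bound noerror)
  "Search backward for REGEXP, then stretch the match start backward.
Hypothetical helper for illustration; not part of Emacs.
Return the start of the match, or nil if nothing matched."
  (when (re-search-backward regexp bound noerror)
    (let ((beg (match-beginning 0))
          (end (match-end 0))
          (limit (or bound (point-min)))
          ;; A shy group plus \\' forces a match to end exactly at END
          ;; once the buffer is narrowed to end there.
          (anchored (concat "\\(?:" regexp "\\)\\'")))
      (save-restriction
        (narrow-to-region (point-min) end)
        ;; Step the start back one character at a time while the
        ;; regexp still matches from there up to END.
        (while (and (> beg limit)
                    (save-excursion
                      (goto-char (1- beg))
                      (looking-at anchored)))
          (setq beg (1- beg)))
        (goto-char beg)
        ;; Re-establish the match data for the final, longest match.
        (looking-at anchored))
      beg)))
```

With point at the end of a buffer containing "baz foobar", calling
(my-re-search-backward-greedy "\\w+") leaves point before "f" and makes
(match-string 0) return "foobar".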

HTH
)jack(

* bug#11025: Fwd: bug#9681: Broken behaviour of re-search-backward (.+ matching only a single character)
  2012-03-16 15:49 ` Jack Duthen
@ 2012-03-16 16:29   ` Jack Duthen
  0 siblings, 0 replies; 10+ messages in thread
From: Jack Duthen @ 2012-03-16 16:29 UTC (permalink / raw)
  To: 11025

Oops!  It's a feature, not a bug...
Sorry for the noise.

I just found the same problem addressed in bug#9681, and I now
understand the current behavior of re-search-backward.
See my reply there.
Sincerely,
)jack(





