all messages for Emacs-related lists mirrored at yhetil.org
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Andreas Schwab <schwab@linux-m68k.org>
Cc: Alan Mackenzie <acm@muc.de>, 18577@debbugs.gnu.org
Subject: bug#18577: Regexp I-search: [(error Stack overflow in regexp matcher)]
Date: Sun, 28 Sep 2014 13:35:30 -0400
Message-ID: <jwv38bbsme9.fsf-monnier+emacsbugs@gnu.org>
In-Reply-To: <87d2afud0p.fsf@igel.home> (Andreas Schwab's message of "Sun, 28 Sep 2014 14:48:22 +0200")

>> Is this a defect in my regexp or in the regexp engine?
> It is fundamental to the way regexp matching works.

To clarify: it is fundamental to the way *our* regexp engine works.

As long as the regexp doesn't use backrefs, it can in principle be matched
efficiently, without backtracking.  Of course using \(..\) (as opposed
to using \(?:..\)) can also make the problem harder since the various
different (but largely equivalent) ways to match might need to be
distinguishable via match-data.
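
To illustrate the match-data point, here is a small sketch in Python rather
than Emacs Lisp (an assumption made only so it is easy to run; Python's
(...) and (?:...) play the role of \(..\) and \(?:..\), and its `re' engine
is not Emacs's).  The toy pattern and the variable names are made up for
the illustration.  All three patterns match the same text, but the
capturing ones also have to report how the text was split between groups:

```python
import re

text = "aaa"

# Two capturing groups: the engine must remember where each group matched.
capturing_greedy = re.fullmatch(r"(a*)(a*)", text)
capturing_lazy = re.fullmatch(r"(a*?)(a*)", text)

# Non-capturing groups: only the overall match is observable.
non_capturing = re.fullmatch(r"(?:a*)(?:a*)", text)

print(capturing_greedy.groups())  # ('aaa', '')
print(capturing_lazy.groups())    # ('', 'aaa')
print(non_capturing.groups())     # ()
```

The overall match is "aaa" in every case; only the group contents differ,
which is the extra information a matcher has to keep track of when \(..\)
is used.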

But even tho your regexp doesn't use backrefs, and even if you replace
all \(..\) with \(?:..\), your regexp will still cause problems because
our regexp engine does not try to optimize these kinds of cases.

So you have to do it by hand.

>> If the former, how could I rewrite the regexp so that it would not hit
>> these problems?

Maybe something like:

/\*\(<insidecomment>\)*\*+/

where <insidecomment> is something like

   [^'*]\|\*+\([^/'*]\|'<afterquote>\)\|'<afterquote>

where <afterquote> is something like

   \([^'*]\|\*+[^/'*]\)*'
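
A rough way to see how the pieces above fit together is to substitute them
mechanically.  The sketch below does that in Python (an assumption for
convenience: \(..\) is written (?:..) and \| is written |; the names
AFTERQUOTE, INSIDECOMMENT, COMMENT_V1 and is_comment_v1 are invented here).
It only checks which strings the composed regexp accepts; it says nothing
about how Emacs's matcher uses its stack.

```python
import re

# <afterquote>: rest of a quoted stretch, up to and including the closing '.
AFTERQUOTE = r"(?:[^'*]|\*+[^/'*])*'"

# <insidecomment>: one step inside the comment body.
INSIDECOMMENT = rf"[^'*]|\*+(?:[^/'*]|'{AFTERQUOTE})|'{AFTERQUOTE}"

# /\*\(<insidecomment>\)*\*+/
COMMENT_V1 = re.compile(rf"/\*(?:{INSIDECOMMENT})*\*+/")


def is_comment_v1(text):
    """Return True if TEXT as a whole matches the composed regexp."""
    return COMMENT_V1.fullmatch(text) is not None


print(is_comment_v1("/* plain */"))    # True
print(is_comment_v1("/* a 'b' c */"))  # True
print(is_comment_v1("/* 'open */"))    # False: the quote is never closed
```

Building the pattern from named fragments keeps each placeholder in one
place, so a change to <afterquote> is picked up everywhere it is used.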

Tho this will still push a backtrack point for every character.
Maybe better would be something like

/\*[^'*]*\(<insidecomment>\)*\*+/

where <insidecomment> is something like

   \(\*+[^/'*]\|\**'<afterquote>\)[^'*]*

where <afterquote> is still something like

   \([^'*]\|\*+[^/'*]\)*'

so that we should only push a backtrack point when we see a * or a ' in
the comment.
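
The second variant can be assembled the same way.  This is again only a
sketch in Python syntax (an assumption: (?:..) for \(..\), | for \|; the
names AFTERQUOTE, INSIDECOMMENT_V2, COMMENT_V2 and is_comment_v2 are
invented for the example).  It checks which strings are accepted and does
not measure backtrack points, since Python's engine is a different one.

```python
import re

# <afterquote>: unchanged from the first variant.
AFTERQUOTE = r"(?:[^'*]|\*+[^/'*])*'"

# <insidecomment>: starts at a * or a ', then swallows a run of plain
# characters in one go with the trailing [^'*]*.
INSIDECOMMENT_V2 = rf"(?:\*+[^/'*]|\**'{AFTERQUOTE})[^'*]*"

# /\*[^'*]*\(<insidecomment>\)*\*+/
COMMENT_V2 = re.compile(rf"/\*[^'*]*(?:{INSIDECOMMENT_V2})*\*+/")


def is_comment_v2(text):
    """Return True if TEXT as a whole matches the composed regexp."""
    return COMMENT_V2.fullmatch(text) is not None


samples = ["/* plain */", "/* a * b **/", "/* a 'b' c */",
           "/* 'open */", "/* no end"]
results = {s: is_comment_v2(s) for s in samples}
print(results)
```

The first three samples are accepted and the last two rejected.  The outer
group now iterates once per * or ' rather than once per character, which is
what the restructuring is meant to achieve.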


        Stefan




