From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Thien-Thi Nguyen <ttn@gnuvola.org>
Cc: 13802@debbugs.gnu.org
Subject: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Sun, 24 Feb 2013 21:04:21 -0500
Message-ID: <jwv4nh1dv13.fsf-monnier+emacs@gnu.org>
In-Reply-To: <87wqtyrry6.fsf@zigzag.favinet> (Thien-Thi Nguyen's message of "Sun, 24 Feb 2013 10:17:53 +0100")
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
> (re-search-forward "\
> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
Hmm... I don't see any obvious reason for a stack overflow unless the
text has some very long lines or a lot of space between elements.
> One idea (untested) is to replace the ".+" (used to match the charset)
> with a more specific pattern. Perhaps "[^<>]+" or "\\sw+"?
I don't think that would help. To avoid such an overflow, you need to
reduce the backtracking, i.e. reduce the number of cases where two
options are possible according to the simplistic regexp-optimizer.
The \s<CHAR> pattern is actually very poor in this respect, because the
optimizer can't know anything about the chars that it matches (since
that depends on text-properties).
The flip side is that replacing \\s- with [ \t\n] might help (this way,
the optimizer will see that the + repetition does not need backtracking
since a char cannot both match a loop iteration and the "after the
loop" content).
Similarly, using [^;'\"]+ instead of \\sw+ would help, and maybe replacing
.+ with [^'\"\n]+ would help as well.
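
Putting the three substitutions above together, the search could look
roughly like the sketch below. The name `my-meta-content-type-regexp'
is hypothetical, and this has not been checked against the message that
triggered the overflow; it only shows what the rewritten pattern would
look like.

```elisp
;; Sketch only: `my-meta-content-type-regexp' is a hypothetical name,
;; not something defined in mm-decode.el.
(defconst my-meta-content-type-regexp
  (concat
   ;; [ \t\n] instead of \\s-: explicit chars the optimizer can reason about.
   "<meta[ \t\n]+http-equiv=[\"']?content-type[\"']?[ \t\n]+"
   ;; [^;'\"]+ instead of \\sw+ for the text/ subtype.
   "content=[\"']text/\\([^;'\"]+\\)"
   ;; [^'\"\n]+ instead of .+ for the charset.
   "\\(?:;[ \t\n]*charset=\\([^'\"\n]+\\)\\)?[\"'][^>]*>")
  "Variant of the meta content-type regexp using explicit char sets.")

;; Example use on a small buffer.
(with-temp-buffer
  (insert "<meta http-equiv=\"Content-Type\" "
          "content=\"text/html; charset=utf-8\">")
  (goto-char (point-min))
  (let ((case-fold-search t))
    (when (re-search-forward my-meta-content-type-regexp nil t)
      ;; Group 1 is the text/ subtype, group 2 the charset (if present).
      (list (match-string 1) (match-string 2)))))
```

Each repetition here is followed by a char that the repeated set
excludes, which is the property described above as letting the
optimizer skip backtracking.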
> Thinking more systematically, maybe Emacs should add a condition
> ‘stack-overflow/regexp’ (or something like that) such that code can
> ‘condition-case’ for it and try a fallback path.
In reality, such an overflow should only ever happen if you have backrefs
in your regexp.
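
For reference, the fallback idea quoted above can be approximated
without a dedicated condition. The sketch below assumes the overflow is
signaled as a plain `error' whose message is the "Stack overflow in
regexp matcher" text from the report; the function name
`my-re-search-with-fallback' and its arguments are hypothetical.

```elisp
;; Sketch only: `my-re-search-with-fallback' is a hypothetical helper.
;; Assumption: the overflow arrives as a plain `error' carrying the
;; message "Stack overflow in regexp matcher".
(defun my-re-search-with-fallback (regexp fallback-regexp)
  "Search forward for REGEXP; on matcher overflow, try FALLBACK-REGEXP.
Any other error is re-signaled unchanged."
  (let ((start (point)))
    (condition-case err
        (re-search-forward regexp nil t)
      (error
       (if (string-match-p "Stack overflow in regexp matcher"
                           (error-message-string err))
           (progn
             ;; Restart from where the failed search began.
             (goto-char start)
             (re-search-forward fallback-regexp nil t))
         ;; Not an overflow: pass the original error along.
         (signal (car err) (cdr err)))))))
```

Matching on the message string is fragile, which is part of what the
proposed dedicated condition would avoid.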
Stefan