* bug#13802: stack overflow in mm-add-meta-html-tag
@ 2013-02-24 9:17 Thien-Thi Nguyen
2013-02-25 0:20 ` Juri Linkov
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Thien-Thi Nguyen @ 2013-02-24 9:17 UTC (permalink / raw)
To: 13802
[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]
I see a "Stack overflow in regexp matcher" error traceable back to
lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
(re-search-forward "\
<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
To allow the user (not me) to continue, i kludged the form to be:
(ignore-errors
(re-search-forward "..." nil t))
that is, wrapping w/ ‘ignore-errors’. Is there a better solution?
One idea (untested) is to replace the ".+" (used to match the charset)
with a more specific pattern. Perhaps "[^<>]+" or "\\sw+"?
Thinking more systematically, maybe Emacs should add a condition
‘stack-overflow/regexp’ (or something like that) such that code can
‘condition-case’ for it and try a fallback path.
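Here is a narrower alternative to the `ignore-errors' kludge. It is a sketch only. It assumes the overflow is signaled as a plain `error' whose message begins with the text quoted above, because there is no dedicated condition. Both function names are hypothetical. Only that one error is swallowed; anything else is re-signaled, so an invalid regexp, for example, still surfaces:

```elisp
;; Sketch only: assumes the overflow arrives as a plain `error' whose
;; message starts with "Stack overflow in regexp matcher".
;; `my-regexp-overflow-p' and `my-re-search-forward-safely' are hypothetical.
(defun my-regexp-overflow-p (err)
  "Return non-nil if the error object ERR looks like a regexp stack overflow."
  (string-prefix-p "Stack overflow in regexp matcher"
                   (error-message-string err)))

(defun my-re-search-forward-safely (regexp &optional bound)
  "Like (re-search-forward REGEXP BOUND t), but return nil on a regexp
stack overflow instead of signaling.  Other errors are re-signaled."
  (condition-case err
      (re-search-forward regexp bound t)
    (error
     (if (my-regexp-overflow-p err)
         nil                            ; treat the overflow as "no match"
       (signal (car err) (cdr err))))))
```

Callers still need a fallback path for the nil case. This only keeps unrelated errors from being hidden the way `ignore-errors' hides them.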
--
Thien-Thi Nguyen ..................................... GPG key: 4C807502
. NB: ttn at glug dot org is not me .
. (and has not been since 2007 or so) .
. ACCEPT NO SUBSTITUTES .
........... please send technical questions to mailing lists ...........
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* bug#13802: stack overflow in mm-add-meta-html-tag
2013-02-24 9:17 bug#13802: stack overflow in mm-add-meta-html-tag Thien-Thi Nguyen
@ 2013-02-25 0:20 ` Juri Linkov
2014-01-31 0:38 ` Lars Ingebrigtsen
2013-02-25 2:04 ` Stefan Monnier
2013-07-06 16:11 ` Lars Ingebrigtsen
2 siblings, 1 reply; 7+ messages in thread
From: Juri Linkov @ 2013-02-25 0:20 UTC (permalink / raw)
To: Thien-Thi Nguyen; +Cc: 13802
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>
> (re-search-forward "\
> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
>
> To allow the user (not me) to continue, i kludged the form to be:
>
> (ignore-errors
> (re-search-forward "..." nil t))
>
> that is, wrapping w/ ‘ignore-errors’. Is there a better solution?
`sgml-html-meta-auto-coding-function' uses a similar regexp
that doesn't fail with stack overflow. You could get some ideas
from this regexp and sync the regexp in `mm-add-meta-html-tag' with it.
* bug#13802: stack overflow in mm-add-meta-html-tag
2013-02-24 9:17 bug#13802: stack overflow in mm-add-meta-html-tag Thien-Thi Nguyen
2013-02-25 0:20 ` Juri Linkov
@ 2013-02-25 2:04 ` Stefan Monnier
2013-07-06 16:11 ` Lars Ingebrigtsen
2 siblings, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 2013-02-25 2:04 UTC (permalink / raw)
To: Thien-Thi Nguyen; +Cc: 13802
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
> (re-search-forward "\
> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
Hmm... I don't see any obvious reason for a stack overflow unless the
text has some very long lines or a lot of space between elements.
> One idea (untested) is to replace the ".+" (used to match the charset)
> with a more specific pattern. Perhaps "[^<>]+" or "\\sw+"?
I don't think that would help. To avoid such overflow, you need to
reduce the backtracking, i.e. reduce the number of cases where two
options are possible according to the simplistic regexp-optimizer.
The \s<CHAR> pattern is actually very poor in this respect, because the
optimizer can't know anything about the chars that this matches (since
it depends on text-properties).
The flip side is that replacing \\s- with [ \t\n] might help (this way,
the optimizer will see that the + repetition does not need backtracking
since a char cannot both match a loop iteration and the "after the
loop" content).
Similarly using [^;'\"]+ instead of \\sw+ would help, and maybe replacing
.+ with [^'\"\n]+ would help as well.
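The sketch below applies the three substitutions suggested above to the regexp quoted from `mm-add-meta-html-tag'. It is untested against the original failing message. The constant and function names are hypothetical, and this is not what Gnus shipped. Every `+' or `*' loop is now followed by a character that the loop's own class cannot match. Per the explanation above, that should let the optimizer skip pushing a backtrack point for each character:

```elisp
;; Hypothetical names.  Substitutions applied to the quoted regexp:
;;   \\s-  ->  [ \t\n]     \\sw+  ->  [^;'\"]+     .+  ->  [^'\"\n]+
(defconst my-mm-meta-content-type-regexp
  (concat "<meta[ \t\n]+http-equiv=[\"']?content-type[\"']?[ \t\n]+"
          "content=[\"']text/\\([^;'\"]+\\)"
          "\\(?:;[ \t\n]*charset=\\([^'\"\n]+\\)\\)?"
          "[\"'][^>]*>")
  "Match an HTML Content-Type meta tag.
Group 1 is the text/ subtype, group 2 the charset (if any).")

(defun my-mm-find-meta-content-type ()
  "Search forward from point for a Content-Type meta tag.
Return (SUBTYPE . CHARSET), or nil if there is none."
  (let ((case-fold-search t))           ; so "Content-Type" matches too
    (when (re-search-forward my-mm-meta-content-type-regexp nil t)
      (cons (match-string 1) (match-string 2)))))
```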
> Thinking more systematically, maybe Emacs should add a condition
> ‘stack-overflow/regexp’ (or something like that) such that code can
> ‘condition-case’ for it and try a fallback path.
In reality, such overflow should only ever happen if you have backrefs
in your regexp.
Stefan
* bug#13802: stack overflow in mm-add-meta-html-tag
2013-02-24 9:17 bug#13802: stack overflow in mm-add-meta-html-tag Thien-Thi Nguyen
2013-02-25 0:20 ` Juri Linkov
2013-02-25 2:04 ` Stefan Monnier
@ 2013-07-06 16:11 ` Lars Ingebrigtsen
2 siblings, 0 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2013-07-06 16:11 UTC (permalink / raw)
To: Thien-Thi Nguyen; +Cc: 13802
Thien-Thi Nguyen <ttn@gnuvola.org> writes:
> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>
> (re-search-forward "\
> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
Do you know what text it is that triggers this bug?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
* bug#13802: stack overflow in mm-add-meta-html-tag
2013-02-25 0:20 ` Juri Linkov
@ 2014-01-31 0:38 ` Lars Ingebrigtsen
2014-01-31 6:10 ` Thien-Thi Nguyen
2016-03-01 5:58 ` Lars Ingebrigtsen
0 siblings, 2 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2014-01-31 0:38 UTC (permalink / raw)
To: Juri Linkov; +Cc: Thien-Thi Nguyen, 13802
Juri Linkov <juri@jurta.org> writes:
>> I see a "Stack overflow in regexp matcher" error traceable back to
>> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>>
>> (re-search-forward "\
>> <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>> text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
>>
>> To allow the user (not me) to continue, i kludged the form to be:
>>
>> (ignore-errors
>> (re-search-forward "..." nil t))
>>
>> that is, wrapping w/ ‘ignore-errors’. Is there a better solution?
>
> `sgml-html-meta-auto-coding-function' uses a similar regexp
> that doesn't fail with stack overflow. You could get some ideas
> from this regexp and sync the regexp in `mm-add-meta-html-tag' with it.
I've adapted the regexp from that function in the patch below, but since
I don't have a test case, I'm not really sure about committing it.
Thien-Thi, could you post the message that triggers this error, or the
relevant bits of it?
diff --git a/lisp/mm-decode.el b/lisp/mm-decode.el
index 17c8fb1..eaf9de4 100644
--- a/lisp/mm-decode.el
+++ b/lisp/mm-decode.el
@@ -1405,9 +1405,7 @@ Return t if meta tag is added or replaced."
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">" charset))
(let ((case-fold-search t))
(goto-char (point-min))
- (if (re-search-forward "\
-<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
-text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\([^\"'>]+\\)\\)?[^>]*>" nil t)
+ (if (re-search-forward "<meta\\s-+\\http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/\\(\\sw+\\)\\(?:;\\s-*?charset=[\"']?\\(.+?\\)\\)[\"'\\s-/>]" nil t)
(if (and (not force-charset)
(match-beginning 2)
(string-match "\\`html\\'" (match-string 1)))
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
* bug#13802: stack overflow in mm-add-meta-html-tag
2014-01-31 0:38 ` Lars Ingebrigtsen
@ 2014-01-31 6:10 ` Thien-Thi Nguyen
2016-03-01 5:58 ` Lars Ingebrigtsen
1 sibling, 0 replies; 7+ messages in thread
From: Thien-Thi Nguyen @ 2014-01-31 6:10 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 13802
[-- Attachment #1: Type: text/plain, Size: 1175 bytes --]
() Lars Ingebrigtsen <larsi@gnus.org>
() Thu, 30 Jan 2014 16:38:49 -0800
> Juri Linkov <juri@jurta.org> writes:
>> `sgml-html-meta-auto-coding-function' uses a similar regexp that
>> doesn't fail with stack overflow. You could get some ideas from
>> this regexp and sync the regexp in `mm-add-meta-html-tag' with it.
> I've adapted the regexp from that function in the patch below, but
> since I don't have a test case, I'm not really sure about committing
> it.
> Thien-Thi, could you post the message that triggers this error, or
> the relevant bits of it?
I'd like to, but no longer have immediate access to that particular
message -- it might take a day or two to excavate (if at all). However,
i do remember it was all on one line (no newlines, machine generated).
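The offending message was a single long machine-generated line. One way to hunt for a reproducer is to feed the quoted regexp a synthetic long line and see whether the search signals. The helpers below are hypothetical and only report what happens. Whether a given length actually overflows depends on the Emacs version and its regexp stack limit, so no outcome is claimed here:

```elisp
;; Hypothetical probe: run REGEXP over TEXT and report the outcome.
(defun my-regexp-overflow-probe (regexp text)
  "Search TEXT for REGEXP case-insensitively.
Return `matched', `no-match', or the error message string."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t))
      (condition-case err
          (if (re-search-forward regexp nil t) 'matched 'no-match)
        (error (error-message-string err))))))

;; The regexp as quoted in the report (note the greedy charset `.+').
(defconst my-original-meta-regexp
  (concat "<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']"
          "text/\\(\\sw+\\)\\(?:;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>"))

(defun my-long-line-sample (n)
  "Return a meta tag whose charset value runs on, unterminated, for N
chars on one line, so `.+' has a long stretch to backtrack over."
  (concat "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
          (make-string n ?x)))
```

For example, (my-regexp-overflow-probe my-original-meta-regexp (my-long-line-sample 100000)) returns either `no-match' or an error message. Running a hardened regexp on the same input would make a candidate regression test.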
> diff [...]
> - ORIGINAL-HAIRY-REGEXP
> + ANOTHER-HAIRY-REGEXP
Maybe this would be a good time to substitute a symbolic regexp?
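In Emacs, the natural symbolic form is the `rx' macro. The sketch below assumes the character-class substitutions Stefan proposed earlier in the thread, and the constant name is hypothetical. Here `rx' expands at load time into an ordinary regexp string, so the search call itself does not change:

```elisp
(require 'rx)

;; Hypothetical name.  Each repetition is followed by something its own
;; character class cannot match, so there is little to backtrack over.
(defconst my-mm-meta-content-type-rx
  (rx "<meta" (one-or-more (any " \t\n"))
      "http-equiv=" (opt (any "\"'")) "content-type" (opt (any "\"'"))
      (one-or-more (any " \t\n"))
      "content=" (any "\"'") "text/"
      (group (one-or-more (not (any ";'\""))))          ; group 1: subtype
      (opt ";" (zero-or-more (any " \t\n")) "charset="
           (group (one-or-more (not (any "'\"\n")))))   ; group 2: charset
      (any "\"'") (zero-or-more (not (any ">"))) ">"))
```

Bind `case-fold-search' to t around the search, as `mm-add-meta-html-tag' already does, so that "Content-Type" matches.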
--
Thien-Thi Nguyen
GPG key: 4C807502
(if you're human and you know it)
read my lisp: (responsep (questions 'technical)
(not (via 'mailing-list)))
=> nil
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
* bug#13802: stack overflow in mm-add-meta-html-tag
2014-01-31 0:38 ` Lars Ingebrigtsen
2014-01-31 6:10 ` Thien-Thi Nguyen
@ 2016-03-01 5:58 ` Lars Ingebrigtsen
1 sibling, 0 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-01 5:58 UTC (permalink / raw)
To: Juri Linkov; +Cc: 13802, Thien-Thi Nguyen
Lars Ingebrigtsen <larsi@gnus.org> writes:
> I've adapted the regexp from that function in the patch below, but since
> I don't have a test case, I'm not really sure about committing it.
>
> Thien-Thi, could you post the message that triggers this error, or the
> relevant bits of it?
[...]
> - (if (re-search-forward "\
> -<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> -text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\([^\"'>]+\\)\\)?[^>]*>" nil t)
> + (if (re-search-forward "<meta\\s-+\\http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/\\(\\sw+\\)\\(?:;\\s-*?charset=[\"']?\\(.+?\\)\\)[\"'\\s-/>]" nil t)
> (if (and (not force-charset)
Since we have no test case for this, and I haven't seen any other
reports in this area, I'm not applying my patch, and I'm closing this
report. If you see this again, please reopen.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Thread overview: 7+ messages
2013-02-24 9:17 bug#13802: stack overflow in mm-add-meta-html-tag Thien-Thi Nguyen
2013-02-25 0:20 ` Juri Linkov
2014-01-31 0:38 ` Lars Ingebrigtsen
2014-01-31 6:10 ` Thien-Thi Nguyen
2016-03-01 5:58 ` Lars Ingebrigtsen
2013-02-25 2:04 ` Stefan Monnier
2013-07-06 16:11 ` Lars Ingebrigtsen