unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#13802: stack overflow in mm-add-meta-html-tag
@ 2013-02-24  9:17 Thien-Thi Nguyen
  2013-02-25  0:20 ` Juri Linkov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Thien-Thi Nguyen @ 2013-02-24  9:17 UTC (permalink / raw)
  To: 13802

I see a "Stack overflow in regexp matcher" error traceable back to
lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:

  (re-search-forward "\
  <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
  text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)

To allow the user (not me) to continue, i kludged the form to be:

  (ignore-errors
    (re-search-forward "..." nil t))

that is, wrapping w/ ‘ignore-errors’.  Is there a better solution?

One idea (untested) is to replace the ".+" (used to match the charset)
with a more specific pattern.  Perhaps "[^<>]+" or "\\sw+"?

Thinking more systematically, maybe Emacs should add a condition
‘stack-overflow/regexp’ (or something like that) such that code can
‘condition-case’ for it and try a fallback path.
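
Until such a dedicated condition exists, a fallback path can be roughly
approximated by catching the generic `error' condition and checking its
message text.  The following is only a minimal sketch: the helper name
`my/re-search-forward-or-fallback' and its FALLBACK-REGEXP argument are
hypothetical and not part of Gnus or Emacs, and matching on the message
string is an assumption that holds only as long as the matcher keeps
reporting the error with this exact wording.

```elisp
(defun my/re-search-forward-or-fallback (regexp fallback-regexp)
  "Search forward for REGEXP; on regexp stack overflow retry FALLBACK-REGEXP.
Hypothetical helper for illustration; not part of Gnus or Emacs.
Return the end position of the match, or nil if nothing matched."
  (let ((start (point)))
    (condition-case err
        (re-search-forward regexp nil t)
      (error
       (if (string-match-p "Stack overflow in regexp matcher"
                           (error-message-string err))
           ;; Overflow: rewind and retry with the simpler pattern.
           (progn
             (goto-char start)
             (re-search-forward fallback-regexp nil t))
         ;; Any other error is re-signaled unchanged.
         (signal (car err) (cdr err)))))))
```

Unlike a bare ‘ignore-errors’, this does not silently swallow unrelated
errors such as an invalid regexp.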

-- 
Thien-Thi Nguyen ..................................... GPG key: 4C807502
.                  NB: ttn at glug dot org is not me                   .
.                 (and has not been since 2007 or so)                  .
.                        ACCEPT NO SUBSTITUTES                         .
........... please send technical questions to mailing lists ...........


* bug#13802: stack overflow in mm-add-meta-html-tag
  2013-02-24  9:17 bug#13802: stack overflow in mm-add-meta-html-tag Thien-Thi Nguyen
@ 2013-02-25  0:20 ` Juri Linkov
  2014-01-31  0:38   ` Lars Ingebrigtsen
  2013-02-25  2:04 ` Stefan Monnier
  2013-07-06 16:11 ` Lars Ingebrigtsen
  2 siblings, 1 reply; 7+ messages in thread
From: Juri Linkov @ 2013-02-25  0:20 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: 13802

> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>
>   (re-search-forward "\
>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
>
> To allow the user (not me) to continue, i kludged the form to be:
>
>   (ignore-errors
>     (re-search-forward "..." nil t))
>
> that is, wrapping w/ ‘ignore-errors’.  Is there a better solution?

`sgml-html-meta-auto-coding-function' uses a similar regexp
that doesn't fail with stack overflow.  You could get some ideas
from this regexp and sync the regexp in `mm-add-meta-html-tag' with it.






* bug#13802: stack overflow in mm-add-meta-html-tag
  2013-02-24  9:17 bug#13802: stack overflow in mm-add-meta-html-tag Thien-Thi Nguyen
  2013-02-25  0:20 ` Juri Linkov
@ 2013-02-25  2:04 ` Stefan Monnier
  2013-07-06 16:11 ` Lars Ingebrigtsen
  2 siblings, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 2013-02-25  2:04 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: 13802

> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:

>   (re-search-forward "\
>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)

Hmm... I don't see any obvious reason for a stack overflow unless the
text has some very long lines or a lot of space between elements.

> One idea (untested) is to replace the ".+" (used to match the charset)
> with a more specific pattern.  Perhaps "[^<>]+" or "\\sw+"?

I don't think that would help.  To avoid such overflow, you need to
reduce the backtracking, i.e. reduce the number of cases where two
options are possible according to the simplistic regexp-optimizer.
The \s<CHAR> pattern is actually very poor in this respect, because the
optimizer can't know anything about the chars that this matches (since
it depends on text-properties).
The flip side is that replacing \\s- with [ \t\n] might help (this way,
the optimizer will see that the + repetition does not need backtracking
since a char cannot both match a loop iteration and the "after the
loop" content).
Similarly using [^;'\"]+ instead of \\sw+ would help, and maybe replacing
.+ with [^'\"\n]+ would help as well.
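
Applied to the regexp quoted above, these substitutions would give
something like the following.  This is an untested-against-the-failing-
message sketch: the constant name `my/meta-content-type-regexp' is
hypothetical, and whether the rewrite actually avoids the overflow is
an assumption, since no triggering text is available.

```elisp
(defconst my/meta-content-type-regexp
  (concat
   ;; [ \t\n] replaces \\s- so the matcher can see which chars end the loop.
   "<meta[ \t\n]+http-equiv=[\"']?content-type[\"']?[ \t\n]+content=[\"']"
   ;; [^;'\"]+ replaces \\sw+ for the media subtype (group 1).
   "text/\\([^;'\"]+\\)"
   ;; [^'\"\n]+ replaces .+ for the charset (group 2).
   "\\(?:;[ \t\n]*charset=\\([^'\"\n]+\\)\\)?"
   "[\"'][^>]*>")
  "Hypothetical variant of the regexp in `mm-add-meta-html-tag'.
It uses explicit character sets instead of syntax classes.")
```

As in the original code, it is meant to be used with `case-fold-search'
bound to t.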

> Thinking more systematically, maybe Emacs should add a condition
> ‘stack-overflow/regexp’ (or something like that) such that code can
> ‘condition-case’ for it and try a fallback path.

In reality, such overflow should only ever happen if you have backrefs
in your regexp.


        Stefan






* bug#13802: stack overflow in mm-add-meta-html-tag
  2013-02-24  9:17 bug#13802: stack overflow in mm-add-meta-html-tag Thien-Thi Nguyen
  2013-02-25  0:20 ` Juri Linkov
  2013-02-25  2:04 ` Stefan Monnier
@ 2013-07-06 16:11 ` Lars Ingebrigtsen
  2 siblings, 0 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2013-07-06 16:11 UTC (permalink / raw)
  To: Thien-Thi Nguyen; +Cc: 13802

Thien-Thi Nguyen <ttn@gnuvola.org> writes:

> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>
>   (re-search-forward "\
>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)

Do you know what text it is that triggers this bug?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/






* bug#13802: stack overflow in mm-add-meta-html-tag
  2013-02-25  0:20 ` Juri Linkov
@ 2014-01-31  0:38   ` Lars Ingebrigtsen
  2014-01-31  6:10     ` Thien-Thi Nguyen
  2016-03-01  5:58     ` Lars Ingebrigtsen
  0 siblings, 2 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2014-01-31  0:38 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Thien-Thi Nguyen, 13802

Juri Linkov <juri@jurta.org> writes:

>> I see a "Stack overflow in regexp matcher" error traceable back to
>> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:
>>
>>   (re-search-forward "\
>>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)
>>
>> To allow the user (not me) to continue, i kludged the form to be:
>>
>>   (ignore-errors
>>     (re-search-forward "..." nil t))
>>
>> that is, wrapping w/ ‘ignore-errors’.  Is there a better solution?
>
> `sgml-html-meta-auto-coding-function' uses a similar regexp
> that doesn't fail with stack overflow.  You could get some ideas
> from this regexp and sync the regexp in `mm-add-meta-html-tag' with it.

I've adapted the regexp from that function in the patch below, but since
I don't have a test case, I'm not really sure about committing it.

Thien-Thi, could you post the message that triggers this error, or the
relevant bits of it?

diff --git a/lisp/mm-decode.el b/lisp/mm-decode.el
index 17c8fb1..eaf9de4 100644
--- a/lisp/mm-decode.el
+++ b/lisp/mm-decode.el
@@ -1405,9 +1405,7 @@ Return t if meta tag is added or replaced."
 <meta http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">" charset))
       (let ((case-fold-search t))
 	(goto-char (point-min))
-	(if (re-search-forward "\
-<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
-text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\([^\"'>]+\\)\\)?[^>]*>" nil t)
+	(if (re-search-forward "<meta\\s-+\\http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/\\(\\sw+\\)\\(?:;\\s-*?charset=[\"']?\\(.+?\\)\\)[\"'\\s-/>]" nil t)
 	    (if (and (not force-charset)
 		     (match-beginning 2)
 		     (string-match "\\`html\\'" (match-string 1)))


-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/






* bug#13802: stack overflow in mm-add-meta-html-tag
  2014-01-31  0:38   ` Lars Ingebrigtsen
@ 2014-01-31  6:10     ` Thien-Thi Nguyen
  2016-03-01  5:58     ` Lars Ingebrigtsen
  1 sibling, 0 replies; 7+ messages in thread
From: Thien-Thi Nguyen @ 2014-01-31  6:10 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 13802

() Lars Ingebrigtsen <larsi@gnus.org>
() Thu, 30 Jan 2014 16:38:49 -0800

   Juri Linkov <juri@jurta.org> writes:

   > `sgml-html-meta-auto-coding-function' uses a similar regexp that
   > doesn't fail with stack overflow.  You could get some ideas from
   > this regexp and sync the regexp in `mm-add-meta-html-tag' with it.

   I've adapted the regexp from that function in the patch below, but
   since I don't have a test case, I'm not really sure about committing
   it.

   Thien-Thi, could you post the message that triggers this error, or
   the relevant bits of it?

I'd like to, but no longer have immediate access to that particular
message -- it might take a day or two to excavate (if at all).  However,
i do remember it was all on one line (no newlines, machine generated).

   diff [...]
   - ORIGINAL-HAIRY-REGEXP
   + ANOTHER-HAIRY-REGEXP

Maybe this would be a good time to substitute a symbolic regexp?
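
For illustration, the original regexp could be written symbolically
with `rx' roughly as below.  This is a sketch only: the constant name
`my/meta-content-type-rx' is hypothetical, and the form is meant to
mirror the original pattern (including its greedy ".+"), so it is not
expected to behave any differently with respect to the overflow.

```elisp
(require 'rx)

(defconst my/meta-content-type-rx
  (rx "<meta" (+ (syntax whitespace))
      "http-equiv=" (? (any "\"'")) "content-type" (? (any "\"'"))
      (+ (syntax whitespace))
      "content=" (any "\"'")
      ;; Group 1: media subtype, e.g. "html".
      "text/" (group (+ (syntax word)))
      ;; Optional charset parameter; group 2 is the charset name.
      (? ";" (* (syntax whitespace)) "charset=" (group (+ nonl)))
      (any "\"'") (* (not (any ">"))) ">")
  "Hypothetical `rx' rendering of the regexp in `mm-add-meta-html-tag'.")
```

The value is an ordinary regexp string, so it can be passed to
‘re-search-forward’ in place of the literal.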

-- 
Thien-Thi Nguyen
   GPG key: 4C807502
   (if you're human and you know it)
      read my lisp: (responsep (questions 'technical)
                               (not (via 'mailing-list)))
                     => nil


* bug#13802: stack overflow in mm-add-meta-html-tag
  2014-01-31  0:38   ` Lars Ingebrigtsen
  2014-01-31  6:10     ` Thien-Thi Nguyen
@ 2016-03-01  5:58     ` Lars Ingebrigtsen
  1 sibling, 0 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2016-03-01  5:58 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 13802, Thien-Thi Nguyen

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I've adapted the regexp from that function in the patch below, but since
> I don't have a test case, I'm not really sure about committing it.
>
> Thien-Thi, could you post the message that triggers this error, or the
> relevant bits of it?

[...]

> -	(if (re-search-forward "\
> -<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
> -text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\([^\"'>]+\\)\\)?[^>]*>" nil t)
> +	(if (re-search-forward "<meta\\s-+\\http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/\\(\\sw+\\)\\(?:;\\s-*?charset=[\"']?\\(.+?\\)\\)[\"'\\s-/>]" nil t)
>  	    (if (and (not force-charset)

Since we have no test case for this, and I haven't seen any other
reports in this area, I'm not applying my patch, and I'm closing this
report.  If you see this again, please reopen.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
