From: Gregory Heytings <gregory@heytings.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Eli Zaretskii <eliz@gnu.org>, 61514@debbugs.gnu.org, mah@everybody.org
Subject: bug#61514: 30.0.50; sadistically long xml line hangs emacs
Date: Mon, 20 Feb 2023 11:28:44 +0000 [thread overview]
Message-ID: <a32486a84ddfd2f1989c@heytings.org> (raw)
In-Reply-To: <jwv5ybxou1d.fsf-monnier+emacs@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 2385 bytes --]
>> Looking at the history of that variable, which is in fact a
>> compile-time constant, I see that it was initially (May 1995) set to
>> 200000. A few months later (Nov 1995) it was set to 20000, and reduced
>> again (apparently because of bug reports) to 8000 and to 4000 (both in
>> Jun 1996). Two months later it was again set to 20000 (Aug 1996), and
>> a year later to 40000 (Dec 1997). It kept that value since then. As
>> these changes (and this bug report) demonstrate, it is not possible to
>> give that variable a "one size fits all" value.
>
> Note that the stack is allocated with `SAFE_ALLOCA` and used to be
> allocated with just `alloca`. So the constant was probably reduced
> (back in the 90s) in response to reports of segfaults due to C stack
> overflows.
>
Indeed. But now that we use SAFE_ALLOCA, we fallback to malloc when there
is not enough room for an alloca, so the constant seems even more
arbitrary.
>
> Nowadays we should be hopefully(?) safe from such segfaults since
> `SAFE_ALLOCA` only uses `alloca` for smallish allocations.
>
That's not the case in regex-emacs.c: REGEX_USE_SAFE_ALLOCA sets sa_avail
to emacs_re_safe_alloca (~6 MiB) instead of its default MAX_ALLOCA value
(16 KiB).
>
> This really needs a comment (at least one referring to this bug report).
> I think the idea is that we hope the regexp will need at most one stack
> entry per character, so the above means that we're willing to limit the
> regexp search to about 1kB of text, which sounds fair given it's
> supposed to match just a single XML attribute.
>
Indeed, thanks!
>> + DEFVAR_INT ("regexp-max-failures", Vregexp_max_failures,
>> + doc: /* Maximum number of failures points in a regexp search. */);
>> + Vregexp_max_failures = max_regexp_max_failures;
>
> This name is misleading. It suggests it's talking about how many times
> we fail, whereas the reality is that it's about the number of pending
> branches in the search space (which the source code calls "failure
> points" because it's info to be used in case the current branch fails to
> match). It could also be described as the number of "pending
> continuations" or "stacked failure continuations" or some wording like
> that.
>
> But for the var name itself, how 'bout `regexp-max-backtracking-depth`?
>
Indeed again, and thanks again!
Updated patch attached.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Make-the-backtracking-depth-of-regexp-searches-modif.patch --]
[-- Type: text/x-diff; name=Make-the-backtracking-depth-of-regexp-searches-modif.patch, Size: 7625 bytes --]
From 4523387ac645d8d6daab07114e29d9386a02450a Mon Sep 17 00:00:00 2001
From: Gregory Heytings <gregory@heytings.org>
Date: Mon, 20 Feb 2023 11:18:30 +0000
Subject: [PATCH] Make the backtracking depth of regexp searches modifiable
* src/search.c (syms_of_search) <regexp-max-backtracking-depth>:
New variable, replacing the constant variable
'emacs_re_max_failures'. Initialize it with the constant
'max_regexp_max_backtracking_depth'.
* src/regex-emacs.h: Replace the external definition of
'emacs_re_max_failures' with the constant
'max_regexp_max_backtracking_depth'.
* src/regex-emacs.c (GROW_FAIL_STACK): Use the new variable
instead of the constant. Reset it to its maximum value when
necessary.
* src/emacs.c (main): Use the new constant
'max_regexp_max_backtracking_depth' in the calculations.
* lisp/nxml/xmltok.el (xmltok-scan-attributes): Bind
'regexp-max-backtracking-depth' to a small value, and add a
comment. Fixes bug#61514.
---
lisp/nxml/xmltok.el | 7 ++++++-
src/emacs.c | 8 ++++----
src/regex-emacs.c | 26 +++++++++++---------------
src/regex-emacs.h | 7 +++++--
src/search.c | 13 +++++++++++++
5 files changed, 39 insertions(+), 22 deletions(-)
diff --git a/lisp/nxml/xmltok.el b/lisp/nxml/xmltok.el
index c36d225c7c9..8201b773e0f 100644
--- a/lisp/nxml/xmltok.el
+++ b/lisp/nxml/xmltok.el
@@ -731,7 +731,12 @@ xmltok-scan-after-comment-open
(defun xmltok-scan-attributes ()
(let ((recovering nil)
- (atts-needing-normalization nil))
+ (atts-needing-normalization nil)
+ ;; Limit the backtracking depth of regexp searches, to fail
+ ;; with a "Stack overflow in regexp matcher" error instead of
+ ;; inflooping in looking-at. This assumes that XML attributes
+ ;; do not use more than about 1 KB characters. See bug#61514.
+ (regexp-max-backtracking-depth 1000))
(while (cond ((or (looking-at (xmltok-attribute regexp))
;; use non-greedy group
(when (looking-at (concat "[^<>\n]+?"
diff --git a/src/emacs.c b/src/emacs.c
index 214e2e2a296..d0dca3f03ec 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1499,9 +1499,9 @@ main (int argc, char **argv)
rlim_t lim = rlim.rlim_cur;
/* Approximate the amount regex-emacs.c needs per unit of
- emacs_re_max_failures, then add 33% to cover the size of the
- smaller stacks that regex-emacs.c successively allocates and
- discards on its way to the maximum. */
+ max_regexp_max_backtracking_depth, then add 33% to cover the
+ size of the smaller stacks that regex-emacs.c successively
+ allocates and discards on its way to the maximum. */
int min_ratio = 20 * sizeof (char *);
int ratio = min_ratio + min_ratio / 3;
@@ -1514,7 +1514,7 @@ main (int argc, char **argv)
if (try_to_grow_stack)
{
- rlim_t newlim = emacs_re_max_failures * ratio + extra;
+ rlim_t newlim = max_regexp_max_backtracking_depth * ratio + extra;
/* Round the new limit to a page boundary; this is needed
for Darwin kernel 15.4.0 (see Bug#23622) and perhaps
diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 2dca0d16ad9..931db980e39 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -868,17 +868,6 @@ print_double_string (re_char *where, re_char *string1, ptrdiff_t size1,
space, so it is not a hard limit. */
#define INIT_FAILURE_ALLOC 20
-/* Roughly the maximum number of failure points on the stack. Would be
- exactly that if failure always used TYPICAL_FAILURE_SIZE items.
- This is a variable only so users of regex can assign to it; we never
- change it ourselves. We always multiply it by TYPICAL_FAILURE_SIZE
- before using it, so it should probably be a byte-count instead. */
-/* Note that 4400 was enough to cause a crash on Alpha OSF/1,
- whose default stack limit is 2mb. In order for a larger
- value to work reliably, you have to try to make it accord
- with the process stack limit. */
-ptrdiff_t emacs_re_max_failures = 40000;
-
union fail_stack_elt
{
re_char *pointer;
@@ -912,7 +901,7 @@ #define INIT_FAIL_STACK() \
/* Double the size of FAIL_STACK, up to a limit
- which allows approximately 'emacs_re_max_failures' items.
+ which allows approximately 'Vregexp_max_backtracking_depth' items.
Return 1 if succeeds, and 0 if either ran out of memory
allocating space for it or it was already too large.
@@ -926,16 +915,23 @@ #define INIT_FAIL_STACK() \
#define FAIL_STACK_GROWTH_FACTOR 4
#define GROW_FAIL_STACK(fail_stack) \
- (((fail_stack).size >= emacs_re_max_failures * TYPICAL_FAILURE_SIZE) \
+ ((Vregexp_max_backtracking_depth = \
+ Vregexp_max_backtracking_depth <= 0 \
+ || Vregexp_max_backtracking_depth \
+ > max_regexp_max_backtracking_depth \
+ ? max_regexp_max_backtracking_depth \
+ : Vregexp_max_backtracking_depth), \
+ ((fail_stack).size \
+ >= Vregexp_max_backtracking_depth * TYPICAL_FAILURE_SIZE) \
? 0 \
: ((fail_stack).stack \
= REGEX_REALLOCATE ((fail_stack).stack, \
(fail_stack).avail * sizeof (fail_stack_elt_t), \
- min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE, \
+ min (Vregexp_max_backtracking_depth * TYPICAL_FAILURE_SIZE, \
((fail_stack).size * FAIL_STACK_GROWTH_FACTOR)) \
* sizeof (fail_stack_elt_t)), \
((fail_stack).size \
- = (min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE, \
+ = (min (Vregexp_max_backtracking_depth * TYPICAL_FAILURE_SIZE, \
((fail_stack).size * FAIL_STACK_GROWTH_FACTOR)))), \
1))
diff --git a/src/regex-emacs.h b/src/regex-emacs.h
index 1bc973363e9..9ccc4177487 100644
--- a/src/regex-emacs.h
+++ b/src/regex-emacs.h
@@ -49,8 +49,11 @@ #define EMACS_REGEX_H 1
TODO: turn into an actual function parameter. */
extern Lisp_Object re_match_object;
-/* Roughly the maximum number of failure points on the stack. */
-extern ptrdiff_t emacs_re_max_failures;
+/* Maximum value for Vregexp_max_backtracking_depth. This is roughly
+ the maximum allowed number of failure points on the stack. It
+ would be exactly that if failure always used TYPICAL_FAILURE_SIZE
+ items. */
+#define max_regexp_max_backtracking_depth 40000
/* Amount of memory that we can safely stack allocate. */
extern ptrdiff_t emacs_re_safe_alloca;
diff --git a/src/search.c b/src/search.c
index 0bb52c03eef..fc5d7c2b8e2 100644
--- a/src/search.c
+++ b/src/search.c
@@ -3431,6 +3431,19 @@ syms_of_search (void)
is to bind it with `let' around a small expression. */);
Vinhibit_changing_match_data = Qnil;
+ DEFVAR_INT ("regexp-max-backtracking-depth", Vregexp_max_backtracking_depth,
+ doc: /* Maximum backtracking depth in a regexp search.
+
+When the number of pending branches in the search space reaches that
+threshold, a regexp search fails with a "Stack overflow in regexp
+matcher". Roughly speaking, this is the number of characters to which
+a regexp search is limited, with a complex enough regexp.
+
+Note that this variable will be reset to its default value if it is
+set to a non-positive value, or to a higher value than its default
+value. */);
+ Vregexp_max_backtracking_depth = max_regexp_max_backtracking_depth;
+
defsubr (&Slooking_at);
defsubr (&Sposix_looking_at);
defsubr (&Sstring_match);
--
2.39.0
next prev parent reply other threads:[~2023-02-20 11:28 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-14 21:02 bug#61514: 30.0.50; sadistically long xml line hangs emacs Mark A. Hershberger via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-14 22:05 ` Gregory Heytings
2023-02-15 1:04 ` Mark A. Hershberger
2023-02-15 8:39 ` Gregory Heytings
2023-02-15 10:24 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 10:41 ` Gregory Heytings
2023-02-15 10:52 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 10:59 ` Gregory Heytings
2023-02-15 11:52 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 12:11 ` Gregory Heytings
2023-02-15 12:54 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 13:31 ` Gregory Heytings
2023-02-15 13:56 ` Eli Zaretskii
2023-02-15 12:20 ` Dmitry Gutov
2023-02-15 13:58 ` Gregory Heytings
2023-02-15 14:17 ` Eli Zaretskii
2023-02-15 14:34 ` Gregory Heytings
2023-02-18 16:22 ` Eli Zaretskii
2023-02-18 17:06 ` Mark A. Hershberger
2023-02-18 17:58 ` Eli Zaretskii
2023-02-18 23:06 ` Gregory Heytings
2023-02-19 0:46 ` Gregory Heytings
2023-02-19 6:42 ` Eli Zaretskii
2023-02-19 23:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-19 23:48 ` Gregory Heytings
2023-02-19 23:58 ` Gregory Heytings
2023-02-20 2:05 ` Gregory Heytings
2023-02-20 4:24 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 11:28 ` Gregory Heytings [this message]
2023-02-20 12:33 ` Eli Zaretskii
2023-02-20 12:31 ` Eli Zaretskii
2023-02-20 12:40 ` Gregory Heytings
2023-02-20 13:14 ` Eli Zaretskii
2023-02-20 14:17 ` Gregory Heytings
2023-02-20 0:14 ` Gregory Heytings
2023-02-20 12:32 ` Eli Zaretskii
2023-02-19 23:48 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 12:19 ` Eli Zaretskii
2023-02-20 13:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 13:54 ` Eli Zaretskii
2023-02-20 14:59 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 15:56 ` Gregory Heytings
2023-02-20 16:47 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 17:14 ` Gregory Heytings
2023-02-20 17:34 ` Gregory Heytings
2023-02-20 18:49 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 19:11 ` Gregory Heytings
2023-02-20 19:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 19:37 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 20:13 ` Gregory Heytings
2023-02-21 12:05 ` Eli Zaretskii
2023-02-21 12:37 ` Gregory Heytings
2023-02-21 13:07 ` Eli Zaretskii
2023-02-21 14:38 ` Gregory Heytings
2023-02-21 14:48 ` Eli Zaretskii
2023-02-21 15:25 ` Gregory Heytings
2023-02-21 15:44 ` Gregory Heytings
2023-02-21 16:58 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-03-18 10:59 ` Gregory Heytings
2023-03-18 11:10 ` Eli Zaretskii
2023-03-18 15:06 ` Gregory Heytings
2023-03-19 2:39 ` mah via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-21 13:24 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-21 13:35 ` Gregory Heytings
2023-02-20 20:01 ` Eli Zaretskii
2023-02-21 2:23 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-21 9:39 ` Gregory Heytings
2023-02-21 12:44 ` Eli Zaretskii
2023-02-20 17:04 ` Gregory Heytings
2023-02-20 14:06 ` Gregory Heytings
2023-02-20 14:16 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 14:24 ` Gregory Heytings
2023-02-20 15:02 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-19 23:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-20 12:41 ` Eli Zaretskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a32486a84ddfd2f1989c@heytings.org \
--to=gregory@heytings.org \
--cc=61514@debbugs.gnu.org \
--cc=eliz@gnu.org \
--cc=mah@everybody.org \
--cc=monnier@iro.umontreal.ca \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).