unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: npostavs@users.sourceforge.net
To: Eli Zaretskii <eliz@gnu.org>
Cc: 24751@debbugs.gnu.org
Subject: bug#24751: 26.0.50; Regex stack overflow not detected properly (gets "Variable binding depth exceeds max-specpdl-size")
Date: Mon, 02 Jan 2017 13:30:26 -0500	[thread overview]
Message-ID: <8737h171rx.fsf@users.sourceforge.net> (raw)
In-Reply-To: <83r34lfpsl.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 02 Jan 2017 17:24:26 +0200")

[-- Attachment #1: Type: text/plain, Size: 402 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: npostavs@users.sourceforge.net
>> Cc: 24751@debbugs.gnu.org
>> Date: Sun, 01 Jan 2017 23:49:46 -0500
>> 
>> Everything you've said makes sense after your last message, but I'm
>> still unable to put it together and come up with a revised comment.
>> Could you make a suggestion?
>
> How about the below?

Looks good, I've incorporated it into the patch:


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 2151 bytes --]

From 188801f8e017f0702cbb24390e4f88b3d0da18ff Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sat, 5 Nov 2016 16:51:53 -0400
Subject: [PATCH v3 1/2] Fix computation of regex stack limit

The regex stack limit was being computed as the number of stack entries,
whereas it was being compared with the current size as measured in
bytes.  This could cause indefinite looping when nearing the stack limit
if re_max_failures happened not to be a multiple of sizeof
fail_stack_elt_t (Bug #24751).

* src/regex.c (GROW_FAIL_STACK): Compute both current stack size and
limit as numbers of stack entries.
---
 src/regex.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/regex.c b/src/regex.c
index ae3fde8..a2d2c52 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -1319,23 +1319,20 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
 #define FAIL_STACK_GROWTH_FACTOR 4
 
 #define GROW_FAIL_STACK(fail_stack)					\
-  (((fail_stack).size * sizeof (fail_stack_elt_t)			\
-    >= re_max_failures * TYPICAL_FAILURE_SIZE)				\
+  (((fail_stack).size >= re_max_failures * TYPICAL_FAILURE_SIZE)        \
    ? 0									\
    : ((fail_stack).stack						\
       = REGEX_REALLOCATE_STACK ((fail_stack).stack,			\
 	  (fail_stack).size * sizeof (fail_stack_elt_t),		\
-	  min (re_max_failures * TYPICAL_FAILURE_SIZE,			\
-	       ((fail_stack).size * sizeof (fail_stack_elt_t)		\
-		* FAIL_STACK_GROWTH_FACTOR))),				\
+          min (re_max_failures * TYPICAL_FAILURE_SIZE,                  \
+               ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR))          \
+          * sizeof (fail_stack_elt_t)),                                 \
 									\
       (fail_stack).stack == NULL					\
       ? 0								\
       : ((fail_stack).size						\
-	 = (min (re_max_failures * TYPICAL_FAILURE_SIZE,		\
-		 ((fail_stack).size * sizeof (fail_stack_elt_t)		\
-		  * FAIL_STACK_GROWTH_FACTOR))				\
-	    / sizeof (fail_stack_elt_t)),				\
+         = (min (re_max_failures * TYPICAL_FAILURE_SIZE,                \
+                 ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR))),      \
 	 1)))
 
 
-- 
2.9.3


[-- Attachment #3: patch --]
[-- Type: text/plain, Size: 10273 bytes --]

From 76656125620b442c4895c8460a0fe7c5298b3fa6 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Sun, 1 Jan 2017 14:09:13 -0500
Subject: [PATCH v3 2/2] Use expanded stack during regex matches

While the stack is increased in main(), to allow the regex stack
allocation to use alloca we also need to modify regex.c to actually take
advantage of the increased stack, and not limit stack allocations to
SAFE_ALLOCA bytes.

* src/regex.c (MATCH_MAY_ALLOCATE): Remove obsolete comment about
allocations in signal handlers which no longer happens and correct
description about when and why MATCH_MAY_ALLOCATE should be defined.
(emacs_re_safe_alloca): New variable.
(REGEX_USE_SAFE_ALLOCA): Use it as the limit of stack allocation instead
of MAX_ALLOCA.
(emacs_re_max_failures): Rename from `re_max_failures' to avoid
confusion with glibc's `re_max_failures'.
* src/emacs.c (main): Increase the amount of fixed 'extra' bytes we add
to the stack.  Instead of changing emacs_re_max_failures based on the
new stack size, just change emacs_re_safe_alloca; emacs_re_max_failures
remains constant regardless, since if we run out stack space SAFE_ALLOCA
will fall back to heap allocation.

Co-authored-by: Eli Zaretskii <eliz@gnu.org>
---
 src/emacs.c | 22 +++++++++++--------
 src/regex.c | 73 ++++++++++++++++++++++++++++++++++++-------------------------
 src/regex.h |  7 +++++-
 3 files changed, 62 insertions(+), 40 deletions(-)

diff --git a/src/emacs.c b/src/emacs.c
index ae29e9a..28b395c 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -831,14 +831,16 @@ main (int argc, char **argv)
       rlim_t lim = rlim.rlim_cur;
 
       /* Approximate the amount regex.c needs per unit of
-	 re_max_failures, then add 33% to cover the size of the
+	 emacs_re_max_failures, then add 33% to cover the size of the
 	 smaller stacks that regex.c successively allocates and
 	 discards on its way to the maximum.  */
-      int ratio = 20 * sizeof (char *);
-      ratio += ratio / 3;
+      int min_ratio = 20 * sizeof (char *);
+      int ratio = min_ratio + min_ratio / 3;
 
-      /* Extra space to cover what we're likely to use for other reasons.  */
-      int extra = 200000;
+      /* Extra space to cover what we're likely to use for other
+         reasons.  For example, a typical GC might take 30K stack
+         frames.  */
+      int extra = (30 * 1000) * 50;
 
       bool try_to_grow_stack = true;
 #ifndef CANNOT_DUMP
@@ -847,7 +849,7 @@ main (int argc, char **argv)
 
       if (try_to_grow_stack)
 	{
-	  rlim_t newlim = re_max_failures * ratio + extra;
+	  rlim_t newlim = emacs_re_max_failures * ratio + extra;
 
 	  /* Round the new limit to a page boundary; this is needed
 	     for Darwin kernel 15.4.0 (see Bug#23622) and perhaps
@@ -869,9 +871,11 @@ main (int argc, char **argv)
 		lim = newlim;
 	    }
 	}
-
-      /* Don't let regex.c overflow the stack.  */
-      re_max_failures = lim < extra ? 0 : min (lim - extra, SIZE_MAX) / ratio;
+      /* If the stack is big enough, let regex.c more of it before
+         falling back to heap allocation.  */
+      emacs_re_safe_alloca = max
+        (min (lim - extra, SIZE_MAX) * (min_ratio / ratio),
+         MAX_ALLOCA);
     }
 #endif /* HAVE_SETRLIMIT and RLIMIT_STACK and not CYGWIN */
 
diff --git a/src/regex.c b/src/regex.c
index a2d2c52..5ebcda1 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -430,9 +430,12 @@ init_syntax_once (void)
 \f
 /* Should we use malloc or alloca?  If REGEX_MALLOC is not defined, we
    use `alloca' instead of `malloc'.  This is because using malloc in
-   re_search* or re_match* could cause memory leaks when C-g is used in
-   Emacs; also, malloc is slower and causes storage fragmentation.  On
-   the other hand, malloc is more portable, and easier to debug.
+   re_search* or re_match* could cause memory leaks when C-g is used
+   in Emacs (note that SAFE_ALLOCA could also call malloc, but does so
+   via `record_xmalloc' which uses `unwind_protect' to ensure the
+   memory is freed even in case of non-local exits); also, malloc is
+   slower and causes storage fragmentation.  On the other hand, malloc
+   is more portable, and easier to debug.
 
    Because we sometimes use alloca, some routines have to be macros,
    not functions -- `alloca'-allocated space disappears at the end of the
@@ -447,7 +450,13 @@ init_syntax_once (void)
 #else /* not REGEX_MALLOC  */
 
 # ifdef emacs
-#  define REGEX_USE_SAFE_ALLOCA USE_SAFE_ALLOCA
+/* This may be adjusted in main(), if the stack is successfully grown.  */
+ptrdiff_t emacs_re_safe_alloca = MAX_ALLOCA;
+/* Like USE_SAFE_ALLOCA, but use emacs_re_safe_alloca.  */
+#  define REGEX_USE_SAFE_ALLOCA                                        \
+  ptrdiff_t sa_avail = emacs_re_safe_alloca;                           \
+  ptrdiff_t sa_count = SPECPDL_INDEX (); bool sa_must_free = false
+
 #  define REGEX_SAFE_FREE() SAFE_FREE ()
 #  define REGEX_ALLOCATE SAFE_ALLOCA
 # else
@@ -1195,24 +1204,28 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
     gettext_noop ("Range striding over charsets") /* REG_ERANGEX  */
   };
 \f
-/* Avoiding alloca during matching, to placate r_alloc.  */
-
-/* Define MATCH_MAY_ALLOCATE unless we need to make sure that the
-   searching and matching functions should not call alloca.  On some
-   systems, alloca is implemented in terms of malloc, and if we're
-   using the relocating allocator routines, then malloc could cause a
-   relocation, which might (if the strings being searched are in the
-   ralloc heap) shift the data out from underneath the regexp
-   routines.
-
-   Here's another reason to avoid allocation: Emacs
-   processes input from X in a signal handler; processing X input may
-   call malloc; if input arrives while a matching routine is calling
-   malloc, then we're scrod.  But Emacs can't just block input while
-   calling matching routines; then we don't notice interrupts when
-   they come in.  So, Emacs blocks input around all regexp calls
-   except the matching calls, which it leaves unprotected, in the
-   faith that they will not malloc.  */
+/* Whether to allocate memory during matching.  */
+
+/* Define MATCH_MAY_ALLOCATE to allow the searching and matching
+   functions allocate memory for the failure stack and registers.
+   Normally should be defined, because otherwise searching and
+   matching routines will have much smaller memory resources at their
+   disposal, and therefore might fail to handle complex regexps.
+   Therefore undefine MATCH_MAY_ALLOCATE only in the following
+   exceptional situations:
+
+   . When running on a system where memory is at premium.
+   . When alloca cannot be used at all, perhaps due to bugs in
+     its implementation, or its being unavailable, or due to a
+     very small stack size.  This requires to define REGEX_MALLOC
+     to use malloc instead, which in turn could lead to memory
+     leaks if search is interrupted by a signal.  (For these
+     reasons, defining REGEX_MALLOC when building Emacs
+     automatically undefines MATCH_MAY_ALLOCATE, but outside
+     Emacs you may not care about memory leaks.)  If you want to
+     prevent the memory leaks, undefine MATCH_MAY_ALLOCATE.
+   . When code that calls the searching and matching functions
+     cannot allow memory allocation, for whatever reasons.  */
 
 /* Normally, this is fine.  */
 #define MATCH_MAY_ALLOCATE
@@ -1249,9 +1262,9 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
    whose default stack limit is 2mb.  In order for a larger
    value to work reliably, you have to try to make it accord
    with the process stack limit.  */
-size_t re_max_failures = 40000;
+size_t emacs_re_max_failures = 40000;
 # else
-size_t re_max_failures = 4000;
+size_t emacs_re_max_failures = 4000;
 # endif
 
 union fail_stack_elt
@@ -1304,7 +1317,7 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
 
 
 /* Double the size of FAIL_STACK, up to a limit
-   which allows approximately `re_max_failures' items.
+   which allows approximately `emacs_re_max_failures' items.
 
    Return 1 if succeeds, and 0 if either ran out of memory
    allocating space for it or it was already too large.
@@ -1319,19 +1332,19 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
 #define FAIL_STACK_GROWTH_FACTOR 4
 
 #define GROW_FAIL_STACK(fail_stack)					\
-  (((fail_stack).size >= re_max_failures * TYPICAL_FAILURE_SIZE)        \
+  (((fail_stack).size >= emacs_re_max_failures * TYPICAL_FAILURE_SIZE)        \
    ? 0									\
    : ((fail_stack).stack						\
       = REGEX_REALLOCATE_STACK ((fail_stack).stack,			\
 	  (fail_stack).size * sizeof (fail_stack_elt_t),		\
-          min (re_max_failures * TYPICAL_FAILURE_SIZE,                  \
+          min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE,                  \
                ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR))          \
           * sizeof (fail_stack_elt_t)),                                 \
 									\
       (fail_stack).stack == NULL					\
       ? 0								\
       : ((fail_stack).size						\
-         = (min (re_max_failures * TYPICAL_FAILURE_SIZE,                \
+         = (min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE,                \
                  ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR))),      \
 	 1)))
 
@@ -3638,9 +3651,9 @@ regex_compile (const_re_char *pattern, size_t size,
   {
     int num_regs = bufp->re_nsub + 1;
 
-    if (fail_stack.size < re_max_failures * TYPICAL_FAILURE_SIZE)
+    if (fail_stack.size < emacs_re_max_failures * TYPICAL_FAILURE_SIZE)
       {
-	fail_stack.size = re_max_failures * TYPICAL_FAILURE_SIZE;
+	fail_stack.size = emacs_re_max_failures * TYPICAL_FAILURE_SIZE;
 	falk_stack.stack = realloc (fail_stack.stack,
 				    fail_stack.size * sizeof *falk_stack.stack);
       }
diff --git a/src/regex.h b/src/regex.h
index 34c9929..1d439de 100644
--- a/src/regex.h
+++ b/src/regex.h
@@ -186,7 +186,12 @@
 #endif
 
 /* Roughly the maximum number of failure points on the stack.  */
-extern size_t re_max_failures;
+extern size_t emacs_re_max_failures;
+
+#ifdef emacs
+/* Amount of memory that we can safely stack allocate.  */
+extern ptrdiff_t emacs_re_safe_alloca;
+#endif
 
 \f
 /* Define combinations of the above bits for the standard possibilities.
-- 
2.9.3


  reply	other threads:[~2017-01-02 18:30 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-21  3:54 bug#24751: 26.0.50; Regex stack overflow not detected properly (gets "Variable binding depth exceeds max-specpdl-size") npostavs
2016-11-04  8:22 ` Eli Zaretskii
2016-11-05 19:34   ` npostavs
2016-11-06 15:45     ` Eli Zaretskii
2016-11-13  5:39       ` npostavs
2016-11-13 16:12         ` Eli Zaretskii
2016-11-15  3:08           ` npostavs
2016-11-15 16:12             ` Eli Zaretskii
2016-11-16  1:06               ` npostavs
2016-11-16 16:25                 ` Eli Zaretskii
2016-11-16 23:25                   ` npostavs
2016-11-17 16:21                     ` Eli Zaretskii
2016-11-19 10:02                       ` Eli Zaretskii
2017-01-01 18:33                       ` npostavs
2017-01-01 18:41                         ` Eli Zaretskii
2017-01-01 18:57                           ` npostavs
2017-01-01 20:06                             ` Eli Zaretskii
2017-01-02  4:49                       ` npostavs
2017-01-02 15:24                         ` Eli Zaretskii
2017-01-02 18:30                           ` npostavs [this message]
2017-01-02 19:22                             ` Eli Zaretskii
2017-01-08 23:49                               ` npostavs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8737h171rx.fsf@users.sourceforge.net \
    --to=npostavs@users.sourceforge.net \
    --cc=24751@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).