unofficial mirror of bug-gnu-emacs@gnu.org 
From: npostavs@users.sourceforge.net
To: Eli Zaretskii <eliz@gnu.org>
Cc: sam.halliday@gmail.com, 24358@debbugs.gnu.org
Subject: bug#24358: 25.1.50; re-search-forward errors with "Variable binding depth exceeds max-specpdl-size"
Date: Thu, 20 Oct 2016 00:31:50 -0400
Message-ID: <87zilztzd5.fsf@users.sourceforge.net>
In-Reply-To: <83insov1zr.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 19 Oct 2016 17:37:28 +0300")

[-- Attachment #1: Type: text/plain, Size: 899 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:
>
>> +#ifdef emacs
>> +#define STR_BASE_PTR(obj)                       \
>> +    (BUFFERP (obj)? XBUFFER (obj)->text->beg :  \
>> +     STRINGP (obj)? SDATA (obj) :               \
>> +     NULL)
>

[...]

> the only test in the macro should be STRINGP.
>

Hmm, not sure I feel comfortable being that implicit.  I kept this macro
the same except for using (NILP (obj)? current_buffer->...) instead of
BUFFERP and XBUFFER.

>
> Btw, note that regex.c already has macros PTR_TO_OFFSET and
> POS_AS_IN_BUFFER which you can use.

AFAICT these are not useful for this: they give offsets relative to
string1 (or string2), which would not help to compute the new value for
string1 (or string2, etc...).  Since I was looking at it, I've also
added a comment about the trick of punning the boolean result into
buffer or string base index.

------

Here is the new patch:


[-- Attachment #2: patch v2 --]
[-- Type: text/plain, Size: 12538 bytes --]

From 92700753ac38b947dd2a725478b07dd7ef229c3a Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Wed, 19 Oct 2016 20:23:50 -0400
Subject: [PATCH v2] Fix handling of allocation in regex matching

`re_match_2_internal' uses pointers to the lisp objects that it
searches.  Since it may call malloc when growing the "fail stack", these
pointers may be invalidated while searching, resulting in memory
corruption (Bug #24358).

To fix this, we check the pointer that the lisp object (as specified by
re_match_object) points to before and after growing the stack, and
update existing pointers accordingly.

* src/regex.c (STR_BASE_PTR): New macro.
(ENSURE_FAIL_STACK, re_search_2): Use it to convert pointers into
offsets before possible malloc call, and back into pointers again
afterwards.
(POS_AS_IN_BUFFER): Add explanatory comment about punning trick.
* src/search.c (search_buffer): Instead of storing search location as
pointers, store them as offsets and recompute the corresponding address
for each call to `re_search_2'.
(string_match_1, fast_string_match_internal, fast_looking_at):
* src/dired.c (directory_files_internal): Set `re_match_object' to Qnil
after calling `re_search' or `re_match_2'.
* src/regex.h (re_match_object): Mention new usage in commentary.
---
 src/dired.c  |  4 +++-
 src/regex.c  | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 src/regex.h  |  3 ++-
 src/search.c | 36 ++++++++++++++++++----------
 4 files changed, 102 insertions(+), 17 deletions(-)

diff --git a/src/dired.c b/src/dired.c
index dba575c..006f74c 100644
--- a/src/dired.c
+++ b/src/dired.c
@@ -259,9 +259,11 @@ directory_files_internal (Lisp_Object directory, Lisp_Object full,
       QUIT;
 
       bool wanted = (NILP (match)
-		     || re_search (bufp, SSDATA (name), len, 0, len, 0) >= 0);
+		     || (re_match_object = name,
+                         re_search (bufp, SSDATA (name), len, 0, len, 0) >= 0));
 
       immediate_quit = 0;
+      re_match_object = Qnil;   /* Stop protecting name from GC.  */
 
       if (wanted)
 	{
diff --git a/src/regex.c b/src/regex.c
index 164eb46..b710f50 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -152,6 +152,8 @@
 
 /* Converts the pointer to the char to BEG-based offset from the start.  */
 # define PTR_TO_OFFSET(d) POS_AS_IN_BUFFER (POINTER_TO_OFFSET (d))
+/* Strings are 0-indexed, buffers are 1-indexed; we pun on the boolean
+   result to get the right base index.  */
 # define POS_AS_IN_BUFFER(p) ((p) + (NILP (re_match_object) || BUFFERP (re_match_object)))
 
 # define RE_MULTIBYTE_P(bufp) ((bufp)->multibyte)
@@ -1436,11 +1438,62 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax)
 #define NEXT_FAILURE_HANDLE(h) fail_stack.stack[(h) - 3].integer
 #define TOP_FAILURE_HANDLE() fail_stack.frame
 
+#ifdef emacs
+#define STR_BASE_PTR(obj)                   \
+  (NILP (obj) ? current_buffer->text->beg : \
+   STRINGP (obj) ? SDATA (obj) :            \
+   NULL)
+#else
+#define STR_BASE_PTR(obj) NULL
+#endif
 
 #define ENSURE_FAIL_STACK(space)					\
 while (REMAINING_AVAIL_SLOTS <= space) {				\
+  re_char *orig_base = STR_BASE_PTR (re_match_object);                  \
+  ptrdiff_t string1_off, end1_off, end_match_1_off;                     \
+  ptrdiff_t string2_off, end2_off, end_match_2_off;                     \
+  ptrdiff_t d_off, dend_off, dfail_off;                                 \
+  if (orig_base)                                                        \
+    {                                                                   \
+      if (string1)                                                      \
+        {                                                               \
+          string1_off = string1 - orig_base;                            \
+          end1_off = end1 - orig_base;                                  \
+          end_match_1_off = end_match_1 - orig_base;                    \
+        }                                                               \
+      if (string2)                                                      \
+        {                                                               \
+          string2_off = string2 - orig_base;                            \
+          end2_off = end2 - orig_base;                                  \
+          end_match_2_off = end_match_2 - orig_base;                    \
+        }                                                               \
+      d_off = d - orig_base;                                            \
+      dend_off = dend - orig_base;                                      \
+      dfail_off = dfail - orig_base;                                    \
+    }                                                                   \
   if (!GROW_FAIL_STACK (fail_stack))					\
-    return -2;								\
+    return -2;                                                          \
+  /* GROW_FAIL_STACK may call malloc and relocate the string */         \
+  /* pointers.  */                                                      \
+  re_char *new_base = STR_BASE_PTR (re_match_object);                   \
+  if (new_base && new_base != orig_base)                                \
+    {                                                                   \
+      if (string1)                                                      \
+        {                                                               \
+          string1 = new_base + string1_off;                             \
+          end1 = new_base + end1_off;                                   \
+          end_match_1 = new_base + end_match_1_off;                     \
+        }                                                               \
+      if (string2)                                                      \
+        {                                                               \
+          string2 = new_base + string2_off;                             \
+          end2 = new_base + end2_off;                                   \
+          end_match_2 = new_base + end_match_2_off;                     \
+        }                                                               \
+      d = new_base + d_off;                                             \
+      dend = new_base + dend_off;                                       \
+      dfail = new_base + dfail_off;                                     \
+    }                                                                   \
   DEBUG_PRINT ("\n  Doubled stack; size now: %zd\n", (fail_stack).size);\
   DEBUG_PRINT ("	 slots available: %zd\n", REMAINING_AVAIL_SLOTS);\
 }
@@ -4443,6 +4496,16 @@ re_search_2 (struct re_pattern_buffer *bufp, const char *str1, size_t size1,
 	  && !bufp->can_be_null)
 	return -1;
 
+      /* re_match_2_internal may allocate, causing a relocation of the
+         lisp text object that we're searching.  */
+      ptrdiff_t offset1, offset2;
+      re_char *orig_base = STR_BASE_PTR (re_match_object);
+      if (orig_base)
+        {
+          if (string1) offset1 = string1 - orig_base;
+          if (string2) offset2 = string2 - orig_base;
+        }
+
       val = re_match_2_internal (bufp, string1, size1, string2, size2,
 				 startpos, regs, stop);
 
@@ -4452,6 +4515,13 @@ re_search_2 (struct re_pattern_buffer *bufp, const char *str1, size_t size1,
       if (val == -2)
 	return -2;
 
+      re_char *new_base = STR_BASE_PTR (re_match_object);
+      if (new_base && new_base != orig_base)
+        {
+          if (string1) string1 = offset1 + new_base;
+          if (string2) string2 = offset2 + new_base;
+        }
+
     advance:
       if (!range)
 	break;
@@ -4887,8 +4957,8 @@ WEAK_ALIAS (__re_match, re_match)
 #endif /* not emacs */
 
 #ifdef emacs
-/* In Emacs, this is the string or buffer in which we
-   are matching.  It is used for looking up syntax properties.  */
+/* In Emacs, this is the string or buffer in which we are matching.
+   See the declaration in regex.h for details.  */
 Lisp_Object re_match_object;
 #endif
 
diff --git a/src/regex.h b/src/regex.h
index 51f4424..d5c9690 100644
--- a/src/regex.h
+++ b/src/regex.h
@@ -169,7 +169,8 @@ extern reg_syntax_t re_syntax_options;
 #ifdef emacs
 # include "lisp.h"
 /* In Emacs, this is the string or buffer in which we are matching.
-   It is used for looking up syntax properties.
+   It is used for looking up syntax properties, and also to recompute
+   pointers in case the object is relocated by a call to malloc.
 
    If the value is a Lisp string object, we are matching text in that
    string; if it's nil, we are matching text in the current buffer; if
diff --git a/src/search.c b/src/search.c
index dc7e2d8..9a2805d 100644
--- a/src/search.c
+++ b/src/search.c
@@ -287,8 +287,10 @@ looking_at_1 (Lisp_Object string, bool posix)
   immediate_quit = 1;
   QUIT;			/* Do a pending quit right away, to avoid paradoxical behavior */
 
-  /* Get pointers and sizes of the two strings
-     that make up the visible portion of the buffer. */
+  /* Get pointers and sizes of the two strings that make up the
+     visible portion of the buffer.  Note that we can use pointers
+     here, unlike in search_buffer, because we only call re_match_2
+     once.  */
 
   p1 = BEGV_ADDR;
   s1 = GPT_BYTE - BEGV_BYTE;
@@ -407,6 +409,7 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start,
 		   (NILP (Vinhibit_changing_match_data)
 		    ? &search_regs : NULL));
   immediate_quit = 0;
+  re_match_object = Qnil;       /* Stop protecting string from GC.  */
 
   /* Set last_thing_searched only when match data is changed.  */
   if (NILP (Vinhibit_changing_match_data))
@@ -477,6 +480,7 @@ fast_string_match_internal (Lisp_Object regexp, Lisp_Object string,
 		   SBYTES (string), 0,
 		   SBYTES (string), 0);
   immediate_quit = 0;
+  re_match_object = Qnil;       /* Stop protecting string from GC.  */
   return val;
 }
 
@@ -564,6 +568,7 @@ fast_looking_at (Lisp_Object regexp, ptrdiff_t pos, ptrdiff_t pos_byte,
   len = re_match_2 (buf, (char *) p1, s1, (char *) p2, s2,
 		    pos_byte, NULL, limit_byte);
   immediate_quit = 0;
+  re_match_object = Qnil;       /* Stop protecting string from GC.  */
 
   return len;
 }
@@ -1178,8 +1183,8 @@ search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
 
   if (RE && !(trivial_regexp_p (string) && NILP (Vsearch_spaces_regexp)))
     {
-      unsigned char *p1, *p2;
-      ptrdiff_t s1, s2;
+      unsigned char *base;
+      ptrdiff_t off1, off2, s1, s2;
       struct re_pattern_buffer *bufp;
 
       bufp = compile_pattern (string,
@@ -1193,16 +1198,19 @@ search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
 				   can take too long. */
       QUIT;			/* Do a pending quit right away,
 				   to avoid paradoxical behavior */
-      /* Get pointers and sizes of the two strings
-	 that make up the visible portion of the buffer. */
+      /* Get offsets and sizes of the two strings that make up the
+         visible portion of the buffer.  We compute offsets instead of
+         pointers because re_search_2 may call malloc and therefore
+         change the buffer text address.  */
 
-      p1 = BEGV_ADDR;
+      base = current_buffer->text->beg;
+      off1 = BEGV_ADDR - base;
       s1 = GPT_BYTE - BEGV_BYTE;
-      p2 = GAP_END_ADDR;
+      off2 = GAP_END_ADDR - base;
       s2 = ZV_BYTE - GPT_BYTE;
       if (s1 < 0)
 	{
-	  p2 = p1;
+          off2 = off1;
 	  s2 = ZV_BYTE - BEGV_BYTE;
 	  s1 = 0;
 	}
@@ -1217,7 +1225,9 @@ search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
 	{
 	  ptrdiff_t val;
 
-	  val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2,
+          val = re_search_2 (bufp,
+                             (char *) (base + off1), s1,
+                             (char *) (base + off2), s2,
 			     pos_byte - BEGV_BYTE, lim_byte - pos_byte,
 			     (NILP (Vinhibit_changing_match_data)
 			      ? &search_regs : &search_regs_1),
@@ -1262,8 +1272,10 @@ search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
 	{
 	  ptrdiff_t val;
 
-	  val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2,
-			     pos_byte - BEGV_BYTE, lim_byte - pos_byte,
+          val = re_search_2 (bufp,
+                             (char *) (base + off1), s1,
+                             (char *) (base + off2), s2,
+                             pos_byte - BEGV_BYTE, lim_byte - pos_byte,
 			     (NILP (Vinhibit_changing_match_data)
 			      ? &search_regs : &search_regs_1),
 			     lim_byte - BEGV_BYTE);
-- 
2.9.3


Thread overview: 76+ messages
2016-08-26 20:17 bug#24315: 25.1.50; re-search-forward errors with "Variable binding depth exceeds max-specpdl-size" Peder O. Klingenberg
2016-08-27  3:35 ` npostavs
2016-08-30 13:09   ` Peder O. Klingenberg
2016-09-02  1:58     ` npostavs
2016-09-02 13:45       ` Peder O. Klingenberg
2016-09-03 14:21         ` npostavs
2016-09-06  8:18           ` Peder O. Klingenberg
2016-09-07 23:27             ` npostavs
2016-09-03 15:43   ` bug#24358: " npostavs
2016-10-08  0:29     ` npostavs
2016-10-08  5:55       ` Eli Zaretskii
2016-10-08 13:45         ` npostavs
2016-10-08 14:39           ` Eli Zaretskii
2016-10-08 14:47             ` Eli Zaretskii
2016-10-08 16:57             ` npostavs
2016-10-08 17:23               ` Eli Zaretskii
2016-10-08 18:52                 ` npostavs
2016-10-08 19:47                   ` Eli Zaretskii
2016-10-08 20:55                     ` npostavs
2016-10-09  6:52                       ` Eli Zaretskii
2016-10-13  1:29                     ` npostavs
2016-10-13  6:19                       ` Eli Zaretskii
2016-10-14  2:19                         ` npostavs
2016-10-14  7:02                           ` Eli Zaretskii
2016-10-19  3:11                             ` npostavs
2016-10-19  7:02                               ` Eli Zaretskii
2016-10-19 12:29                                 ` npostavs
2016-10-19 14:37                                   ` Eli Zaretskii
2016-10-20  4:31                                     ` npostavs [this message]
2016-10-20  8:39                                       ` Eli Zaretskii
2016-10-21  1:22                                         ` npostavs
2016-10-21  7:17                                           ` Eli Zaretskii
2016-10-22  2:36                                             ` npostavs
2016-10-22 21:54                                               ` Sam Halliday
2016-10-22 22:46                                                 ` npostavs
2016-10-23  6:41                                                   ` Eli Zaretskii
2016-10-23  8:57                                                     ` Sam Halliday
2016-10-23  9:19                                                       ` Eli Zaretskii
2016-10-23 13:40                                                         ` Sam Halliday
2016-10-23 14:07                                                           ` Eli Zaretskii
2016-10-23 15:42                                                             ` Sam Halliday
2016-10-23 15:48                                                               ` Eli Zaretskii
2016-10-23 15:58                                                                 ` Sam Halliday
2016-10-23 15:58                                                                   ` Sam Halliday
2016-10-23 16:44                                                                     ` Eli Zaretskii
2016-10-23 17:19                                                                   ` Eli Zaretskii
2016-10-23 18:06                                                                     ` Eli Zaretskii
2016-10-23 18:14                                                                       ` Noam Postavsky
2016-10-23 19:18                                                                         ` Eli Zaretskii
2016-10-24 13:29                                                                           ` npostavs
2016-10-24 13:39                                                                             ` Eli Zaretskii
2016-10-24 15:33                                                                               ` Noam Postavsky
2016-10-24 16:13                                                                                 ` Eli Zaretskii
2016-10-25  2:00                                                                                   ` npostavs
2016-10-25 16:03                                                                                     ` Eli Zaretskii
2016-10-26  0:16                                                                                       ` npostavs
2016-10-24 13:43                                                                             ` Eli Zaretskii
2016-10-24 14:03                                                                               ` Eli Zaretskii
2016-10-24 20:13                                                                             ` Sam Halliday
2016-10-24 23:44                                                                               ` npostavs
2016-11-07  3:39                                                                               ` Eli Zaretskii
2016-11-07  3:56                                                                                 ` Noam Postavsky
2016-11-07 15:10                                                                                   ` Eli Zaretskii
2016-10-23 18:16                                                                       ` Sam Halliday
2016-10-23 19:10                                                                         ` Eli Zaretskii
2016-10-23 19:32                                                                           ` Eli Zaretskii
2016-10-23 20:15                                                                             ` Sam Halliday
2016-10-23 20:27                                                                               ` Eli Zaretskii
2016-10-23 20:18                                                                             ` Eli Zaretskii
2016-10-23 23:18                                                                               ` Noam Postavsky
2016-10-24  7:05                                                                                 ` Eli Zaretskii
2016-10-24  8:40                                                                                   ` Eli Zaretskii
2016-10-23 18:11                                                                     ` Sam Halliday
2016-10-18  8:16 ` bug#24358: 25.1.50; Sam Halliday
2016-10-18  8:56   ` Sam Halliday
2016-10-18  9:28   ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
