From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: npostavs@users.sourceforge.net Newsgroups: gmane.emacs.bugs Subject: bug#24358: 25.1.50; re-search-forward errors with "Variable binding depth exceeds max-specpdl-size" Date: Thu, 20 Oct 2016 21:22:25 -0400 Message-ID: <87wph2ts1a.fsf@users.sourceforge.net> References: <87twe6sx2g.fsf@users.sourceforge.net> <87eg51ng4r.fsf_-_@users.sourceforge.net> <87k2djwumn.fsf@users.sourceforge.net> <83h98nidvd.fsf@gnu.org> <87eg3rvtsf.fsf@users.sourceforge.net> <83k2dihpm9.fsf@gnu.org> <8760p2wzgj.fsf@users.sourceforge.net> <838ttyhhzu.fsf@gnu.org> <871szqwu51.fsf@users.sourceforge.net> <831szqhbc2.fsf@gnu.org> <87h98hujcx.fsf@users.sourceforge.net> <831szkahyz.fsf@gnu.org> <87eg3jvfj6.fsf@users.sourceforge.net> <8360ov8lbu.fsf@gnu.org> <877f95uj66.fsf@users.sourceforge.net> <83zim0vn1t.fsf@gnu.org> <874m48v7wj.fsf@users.sourceforge.net> <83insov1zr.fsf@gnu.org> <87zilztzd5.fsf@users.sourceforge.net> <83oa2ftnvp.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: blaine.gmane.org 1477013582 17226 195.159.176.226 (21 Oct 2016 01:33:02 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Fri, 21 Oct 2016 01:33:02 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) Cc: sam.halliday@gmail.com, 24358@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Fri Oct 21 03:32:57 2016 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bxOhZ-0002wz-0L for geb-bug-gnu-emacs@m.gmane.org; Fri, 21 Oct 2016 03:32:53 +0200 Original-Received: from localhost ([::1]:57864 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bxOhZ-0006Fj-Q1 for geb-bug-gnu-emacs@m.gmane.org; Thu, 20 Oct 2016 21:32:53 -0400 
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:58201) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bxOXA-0005vz-JC for bug-gnu-emacs@gnu.org; Thu, 20 Oct 2016 21:22:11 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bxOX4-0001zw-KG for bug-gnu-emacs@gnu.org; Thu, 20 Oct 2016 21:22:08 -0400 Original-Received: from debbugs.gnu.org ([208.118.235.43]:55437) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1bxOX4-0001z0-GN for bug-gnu-emacs@gnu.org; Thu, 20 Oct 2016 21:22:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1bxOX4-0000zR-8a for bug-gnu-emacs@gnu.org; Thu, 20 Oct 2016 21:22:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: npostavs@users.sourceforge.net Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 21 Oct 2016 01:22:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24358 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 24358-submit@debbugs.gnu.org id=B24358.14770129183794 (code B ref 24358); Fri, 21 Oct 2016 01:22:02 +0000 Original-Received: (at 24358) by debbugs.gnu.org; 21 Oct 2016 01:21:58 +0000 Original-Received: from localhost ([127.0.0.1]:42603 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bxOWz-0000z8-Rd for submit@debbugs.gnu.org; Thu, 20 Oct 2016 21:21:58 -0400 Original-Received: from mail-it0-f50.google.com ([209.85.214.50]:38376) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bxOWy-0000yr-5J for 24358@debbugs.gnu.org; Thu, 20 Oct 2016 21:21:56 -0400 Original-Received: by mail-it0-f50.google.com with SMTP id 66so125750749itl.1 for <24358@debbugs.gnu.org>; Thu, 20 Oct 2016 18:21:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; 
h=sender:from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=nl81S8QPLBX+ZGdW/I0VC9ST0lP8VBGyi5FQTCQZTW4=; b=nU14FP7vRxZ7HkBFNNMmqkstEQXyJ9wt+rapHGgmhFTgGv/K/KTPkLSfrZ+NbzTm7N KFnrB8wLkXh/OAK6di8julWMzzSW0Cl/BgzugUb8msBv4bhOXs7aoDY9tZC0n6INrT9N pDp/Jj8FYI+YXJ3ra/VtcF6XxFUmGrrPP/garTq0P0wOSTFmt9aLxqJKWqjd6w0PeI3j RTq4fORvwhutp71chyzJ9AmGTW4i59K3lLofaAEWxuScJhBJG17g5Tbhiwtr94KUjGhH aYAn3JSUmkjSAIywgFHGrKcqPpCBH5fYZNQwy/7kMs0j+HE5dJ2SZnGrNBEcrrki9KhY nHTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:references:date :in-reply-to:message-id:user-agent:mime-version; bh=nl81S8QPLBX+ZGdW/I0VC9ST0lP8VBGyi5FQTCQZTW4=; b=acLMpAnwMaYcw2cnJ6AejXevexQgxPR4JcYK+lOuDfR85P1COxOJ+nuKvq1io8x+04 lOWlXxV/vwp1VWRQL/O+/drqOpwumbNZ1Z0dHeWXOmbDPNoleVif/1oj/Fnkar7QjkUe YPDGHFLUWJdcb444f3jN5/YGK5ItfnuMqWoKbSxcWsZbqdncTL73bqo2OzYOjqY2s70D px5a2qsiSXWcAtuho8xqOJChJZt8VxaL3iD5hiICb1YGM2tKxLyvVZHTonxTezmloTKi I4th78HvlXm1GBnb4DZHERnv2pYeJIBp0jYgkeAvT3Xpsw5gUYQPAjsiigiZxjQ33zSP 3Jdw== X-Gm-Message-State: ABUngveUBXmHS64ZKT27sjgKDHRC1ow+72OgO35qjMkVroTNbq86dfoAaKYKBlQbny9kyg== X-Received: by 10.107.151.141 with SMTP id z135mr4449138iod.28.1477012909278; Thu, 20 Oct 2016 18:21:49 -0700 (PDT) Original-Received: from zony ([45.2.7.130]) by smtp.googlemail.com with ESMTPSA id t141sm6523937ita.0.2016.10.20.18.21.47 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 20 Oct 2016 18:21:48 -0700 (PDT) In-Reply-To: <83oa2ftnvp.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 20 Oct 2016 11:39:54 +0300") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:124756 Archived-At: --=-=-= Content-Type: text/plain Eli Zaretskii writes: > >> +#ifdef emacs >> +#define STR_BASE_PTR(obj) \ >> + (NILP(obj)? current_buffer->text->beg : \ > ^ > Please leave a blank before the left parenthesis where indicated. > Also, another blank between the right parenthesis and the following > question mark. > >> + STRINGP (obj)? SDATA (obj) : \ > > Likewise here. Damn, I can't believe I'm still making these formatting errors; there's got to be a way to catch these automatically. >> + It is used for looking up syntax properties, and also to recompute >> + pointers in case the object is relocated by GC. > > Not "by GC", but "as a side effect of calling malloc". Maybe it's a > good idea to also mention ralloc.c here. > >> + /* Get pointers and sizes of the two strings that make up the >> + visible portion of the buffer. Note that we can use pointers >> + here, unlike in search_buffer, because we only call re_match_2 >> + once. */ > > I'm not sure the reader will understand the significance of calling > re_match_2 only once. It would be good to clarify the comment. Okay, I've added some more explanations to these comments. > > Otherwise, I think this should go in. This is for emacs-25, right? Technically, the bug seems to be present in 24.5 and earlier, though I can only trigger it in 25.1. --=-=-= Content-Type: text/plain Content-Disposition: attachment; filename=v3-0001-Fix-handling-of-allocation-in-regex-matching.patch Content-Description: patch v3 >From 97b69a66148c0b28c6d865619b6c1bcee78902a5 Mon Sep 17 00:00:00 2001 From: Noam Postavsky Date: Wed, 19 Oct 2016 20:23:50 -0400 Subject: [PATCH v3] Fix handling of allocation in regex matching `re_match_2_internal' uses pointers to the lisp objects that it searches. 
Since it may call malloc when growing the "fail stack", these pointers may be invalidated while searching, resulting in memory corruption (Bug #24358). To fix this, we check the pointer that the lisp object (as specified by re_match_object) points to before and after growing the stack, and update existing pointers accordingly. * src/regex.c (STR_BASE_PTR): New macro. (ENSURE_FAIL_STACK, re_search_2): Use it to convert pointers into offsets before possible malloc call, and back into pointers again afterwards. (POS_AS_IN_BUFFER): Add explanatory comment about punning trick. * src/search.c (search_buffer): Instead of storing search location as pointers, store them as offsets and recompute the corresponding address for each call to `re_search_2'. (string_match_1, fast_string_match_internal, fast_looking_at): * src/dired.c (directory_files_internal): Set `re_match_object' to Qnil after calling `re_search' or `re_match_2'. * src/regex.h (re_match_object): Mention new usage in commentary. --- src/dired.c | 4 +++- src/regex.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- src/regex.h | 4 +++- src/search.c | 36 ++++++++++++++++++---------- 4 files changed, 103 insertions(+), 17 deletions(-) diff --git a/src/dired.c b/src/dired.c index dba575c..006f74c 100644 --- a/src/dired.c +++ b/src/dired.c @@ -259,9 +259,11 @@ directory_files_internal (Lisp_Object directory, Lisp_Object full, QUIT; bool wanted = (NILP (match) - || re_search (bufp, SSDATA (name), len, 0, len, 0) >= 0); + || (re_match_object = name, + re_search (bufp, SSDATA (name), len, 0, len, 0) >= 0)); immediate_quit = 0; + re_match_object = Qnil; /* Stop protecting name from GC. */ if (wanted) { diff --git a/src/regex.c b/src/regex.c index 164eb46..1346ef4 100644 --- a/src/regex.c +++ b/src/regex.c @@ -152,6 +152,8 @@ /* Converts the pointer to the char to BEG-based offset from the start. 
*/ # define PTR_TO_OFFSET(d) POS_AS_IN_BUFFER (POINTER_TO_OFFSET (d)) +/* Strings are 0-indexed, buffers are 1-indexed; we pun on the boolean + result to get the right base index. */ # define POS_AS_IN_BUFFER(p) ((p) + (NILP (re_match_object) || BUFFERP (re_match_object))) # define RE_MULTIBYTE_P(bufp) ((bufp)->multibyte) @@ -1436,11 +1438,62 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax) #define NEXT_FAILURE_HANDLE(h) fail_stack.stack[(h) - 3].integer #define TOP_FAILURE_HANDLE() fail_stack.frame +#ifdef emacs +#define STR_BASE_PTR(obj) \ + (NILP (obj) ? current_buffer->text->beg : \ + STRINGP (obj) ? SDATA (obj) : \ + NULL) +#else +#define STR_BASE_PTR(obj) NULL +#endif #define ENSURE_FAIL_STACK(space) \ while (REMAINING_AVAIL_SLOTS <= space) { \ + re_char* orig_base = STR_BASE_PTR (re_match_object); \ + ptrdiff_t string1_off, end1_off, end_match_1_off; \ + ptrdiff_t string2_off, end2_off, end_match_2_off; \ + ptrdiff_t d_off, dend_off, dfail_off; \ + if (orig_base) \ + { \ + if (string1) \ + { \ + string1_off = string1 - orig_base; \ + end1_off = end1 - orig_base; \ + end_match_1_off = end_match_1 - orig_base; \ + } \ + if (string2) \ + { \ + string2_off = string2 - orig_base; \ + end2_off = end2 - orig_base; \ + end_match_2_off = end_match_2 - orig_base; \ + } \ + d_off = d - orig_base; \ + dend_off = dend - orig_base; \ + dfail_off = dfail - orig_base; \ + } \ if (!GROW_FAIL_STACK (fail_stack)) \ - return -2; \ + return -2; \ + /* GROW_FAIL_STACK may call malloc and relocate the string */ \ + /* pointers. 
*/ \ + re_char* new_base = STR_BASE_PTR (re_match_object); \ + if (new_base && new_base != orig_base) \ + { \ + if (string1) \ + { \ + string1 = new_base + string1_off; \ + end1 = new_base + end1_off; \ + end_match_1 = new_base + end_match_1_off; \ + } \ + if (string2) \ + { \ + string2 = new_base + string2_off; \ + end2 = new_base + end2_off; \ + end_match_2 = new_base + end_match_2_off; \ + } \ + d = new_base + d_off; \ + dend = new_base + dend_off; \ + dfail = new_base + dfail_off; \ + } \ DEBUG_PRINT ("\n Doubled stack; size now: %zd\n", (fail_stack).size);\ DEBUG_PRINT (" slots available: %zd\n", REMAINING_AVAIL_SLOTS);\ } @@ -4443,6 +4496,16 @@ re_search_2 (struct re_pattern_buffer *bufp, const char *str1, size_t size1, && !bufp->can_be_null) return -1; + /* re_match_2_internal may allocate, causing a relocation of the + lisp text object that we're searching. */ + ptrdiff_t offset1, offset2; + re_char *orig_base = STR_BASE_PTR (re_match_object); + if (orig_base) + { + if (string1) offset1 = string1 - orig_base; + if (string2) offset2 = string2 - orig_base; + } + val = re_match_2_internal (bufp, string1, size1, string2, size2, startpos, regs, stop); @@ -4452,6 +4515,13 @@ re_search_2 (struct re_pattern_buffer *bufp, const char *str1, size_t size1, if (val == -2) return -2; + re_char *new_base = STR_BASE_PTR (re_match_object); + if (new_base && new_base != orig_base) + { + if (string1) string1 = offset1 + new_base; + if (string2) string2 = offset2 + new_base; + } + advance: if (!range) break; @@ -4887,8 +4957,8 @@ WEAK_ALIAS (__re_match, re_match) #endif /* not emacs */ #ifdef emacs -/* In Emacs, this is the string or buffer in which we - are matching. It is used for looking up syntax properties. */ +/* In Emacs, this is the string or buffer in which we are matching. + See the declaration in regex.h for details. 
*/ Lisp_Object re_match_object; #endif diff --git a/src/regex.h b/src/regex.h index 51f4424..61c771c 100644 --- a/src/regex.h +++ b/src/regex.h @@ -169,7 +169,9 @@ extern reg_syntax_t re_syntax_options; #ifdef emacs # include "lisp.h" /* In Emacs, this is the string or buffer in which we are matching. - It is used for looking up syntax properties. + It is used for looking up syntax properties, and also to recompute + pointers in case the object is relocated as a side effect of + calling malloc (if it calls r_alloc_sbrk in ralloc.c). If the value is a Lisp string object, we are matching text in that string; if it's nil, we are matching text in the current buffer; if diff --git a/src/search.c b/src/search.c index dc7e2d8..ec5a1d7 100644 --- a/src/search.c +++ b/src/search.c @@ -287,8 +287,10 @@ looking_at_1 (Lisp_Object string, bool posix) immediate_quit = 1; QUIT; /* Do a pending quit right away, to avoid paradoxical behavior */ - /* Get pointers and sizes of the two strings - that make up the visible portion of the buffer. */ + /* Get pointers and sizes of the two strings that make up the + visible portion of the buffer. Note that we can use pointers + here, unlike in search_buffer, because we only call re_match_2 + once, after which we never use the pointers again. */ p1 = BEGV_ADDR; s1 = GPT_BYTE - BEGV_BYTE; @@ -407,6 +409,7 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start, (NILP (Vinhibit_changing_match_data) ? &search_regs : NULL)); immediate_quit = 0; + re_match_object = Qnil; /* Stop protecting string from GC. */ /* Set last_thing_searched only when match data is changed. */ if (NILP (Vinhibit_changing_match_data)) @@ -477,6 +480,7 @@ fast_string_match_internal (Lisp_Object regexp, Lisp_Object string, SBYTES (string), 0, SBYTES (string), 0); immediate_quit = 0; + re_match_object = Qnil; /* Stop protecting string from GC. 
*/ return val; } @@ -564,6 +568,7 @@ fast_looking_at (Lisp_Object regexp, ptrdiff_t pos, ptrdiff_t pos_byte, len = re_match_2 (buf, (char *) p1, s1, (char *) p2, s2, pos_byte, NULL, limit_byte); immediate_quit = 0; + re_match_object = Qnil; /* Stop protecting string from GC. */ return len; } @@ -1178,8 +1183,8 @@ search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte, if (RE && !(trivial_regexp_p (string) && NILP (Vsearch_spaces_regexp))) { - unsigned char *p1, *p2; - ptrdiff_t s1, s2; + unsigned char *base; + ptrdiff_t off1, off2, s1, s2; struct re_pattern_buffer *bufp; bufp = compile_pattern (string, @@ -1193,16 +1198,19 @@ search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte, can take too long. */ QUIT; /* Do a pending quit right away, to avoid paradoxical behavior */ - /* Get pointers and sizes of the two strings - that make up the visible portion of the buffer. */ + /* Get offsets and sizes of the two strings that make up the + visible portion of the buffer. We compute offsets instead of + pointers because re_search_2 may call malloc and therefore + change the buffer text address. */ - p1 = BEGV_ADDR; + base = current_buffer->text->beg; + off1 = BEGV_ADDR - base; s1 = GPT_BYTE - BEGV_BYTE; - p2 = GAP_END_ADDR; + off2 = GAP_END_ADDR - base; s2 = ZV_BYTE - GPT_BYTE; if (s1 < 0) { - p2 = p1; + off2 = off1; s2 = ZV_BYTE - BEGV_BYTE; s1 = 0; } @@ -1217,7 +1225,9 @@ search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte, { ptrdiff_t val; - val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2, + val = re_search_2 (bufp, + (char*) (base + off1), s1, + (char*) (base + off2), s2, pos_byte - BEGV_BYTE, lim_byte - pos_byte, (NILP (Vinhibit_changing_match_data) ? 
&search_regs : &search_regs_1), @@ -1262,8 +1272,10 @@ search_buffer (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte, { ptrdiff_t val; - val = re_search_2 (bufp, (char *) p1, s1, (char *) p2, s2, - pos_byte - BEGV_BYTE, lim_byte - pos_byte, + val = re_search_2 (bufp, + (char*) (base + off1), s1, + (char*) (base + off2), s2, + pos_byte - BEGV_BYTE, lim_byte - pos_byte, (NILP (Vinhibit_changing_match_data) ? &search_regs : &search_regs_1), lim_byte - BEGV_BYTE); -- 2.9.3 --=-=-=--
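For readers following the thread, the core idea of the patch (turn pointers into offsets before a call that may allocate, then turn them back into pointers afterwards) can be sketched in isolation. This is only a minimal illustration, not the code from regex.c: realloc stands in for the relocation that the patch attributes to malloc via ralloc.c, and the names struct cursor, grow_and_rebase and demo_rebase are hypothetical, made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a position inside relocatable text:
   BASE is the start of the block, POS points somewhere inside it.  */
struct cursor
{
  char *base;
  char *pos;
};

/* Grow the block to NEW_SIZE bytes.  The position is saved as an
   offset before the allocation and rebuilt from the (possibly new)
   base afterwards, so it stays valid even if the block moved.
   Return 0 on success, -1 if the allocation failed (in which case
   the cursor is left untouched).  */
static int
grow_and_rebase (struct cursor *c, size_t new_size)
{
  assert (c->base <= c->pos);
  ptrdiff_t off = c->pos - c->base;	/* Pointer -> offset.  */
  char *new_base = realloc (c->base, new_size);
  if (!new_base)
    return -1;
  c->base = new_base;
  c->pos = new_base + off;		/* Offset -> pointer.  */
  return 0;
}

/* Small driver: place the cursor on the 'd' of "abcdefg", grow the
   block, and return the character under the cursor afterwards.
   Return -1 if an allocation failed.  */
static int
demo_rebase (void)
{
  struct cursor c;
  c.base = malloc (8);
  if (!c.base)
    return -1;
  memcpy (c.base, "abcdefg", 8);
  c.pos = c.base + 3;
  if (grow_and_rebase (&c, (size_t) 1 << 20) != 0)
    {
      free (c.base);
      return -1;
    }
  int ch = *c.pos;
  free (c.base);
  return ch;
}
```

Storing an offset rather than comparing old and new base addresses keeps the sketch short; the patch additionally skips the update when the base did not change, which is a small optimization on top of the same idea.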