From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: npostavs@users.sourceforge.net Newsgroups: gmane.emacs.bugs Subject: bug#24751: 26.0.50; Regex stack overflow not detected properly (gets "Variable binding depth exceeds max-specpdl-size") Date: Sun, 01 Jan 2017 23:49:46 -0500 Message-ID: <877f6e6p79.fsf@users.sourceforge.net> References: <87twc6tl0i.fsf@users.sourceforge.net> <83h97nlknj.fsf@gnu.org> <87mvhdoh4q.fsf@users.sourceforge.net> <83zilcipcr.fsf@gnu.org> <87a8d4lyzo.fsf@users.sourceforge.net> <83a8d3cq9s.fsf@gnu.org> <87wpg5l9st.fsf@users.sourceforge.net> <83d1hwhgdi.fsf@gnu.org> <87r36ckzca.fsf@users.sourceforge.net> <83polvfl3h.fsf@gnu.org> <87oa1fknx9.fsf@users.sourceforge.net> <83y40idqm3.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: blaine.gmane.org 1483332562 13544 195.159.176.226 (2 Jan 2017 04:49:22 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Mon, 2 Jan 2017 04:49:22 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) Cc: 24751@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Mon Jan 02 05:49:17 2017 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cNuYb-0001k5-Ge for geb-bug-gnu-emacs@m.gmane.org; Mon, 02 Jan 2017 05:49:13 +0100 Original-Received: from localhost ([::1]:55455 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cNuYa-0006FG-K5 for geb-bug-gnu-emacs@m.gmane.org; Sun, 01 Jan 2017 23:49:12 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:45146) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cNuYT-0006Ey-Lg for bug-gnu-emacs@gnu.org; Sun, 01 Jan 2017 23:49:07 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cNuYQ-0000Bf-Hx for bug-gnu-emacs@gnu.org; Sun, 01 Jan 2017 23:49:05 -0500 Original-Received: from debbugs.gnu.org ([208.118.235.43]:51801) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1cNuYQ-0000BR-Ek for bug-gnu-emacs@gnu.org; Sun, 01 Jan 2017 23:49:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1cNuYQ-0006Vm-2t for bug-gnu-emacs@gnu.org; Sun, 01 Jan 2017 23:49:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: npostavs@users.sourceforge.net Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 02 Jan 2017 04:49:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24751 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 24751-submit@debbugs.gnu.org id=B24751.148333253125013 (code B ref 24751); Mon, 02 Jan 2017 04:49:02 +0000 Original-Received: (at 24751) by debbugs.gnu.org; 2 Jan 2017 04:48:51 +0000 Original-Received: from localhost ([127.0.0.1]:38967 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cNuYF-0006VN-F6 for submit@debbugs.gnu.org; Sun, 01 Jan 2017 23:48:51 -0500 Original-Received: from mail-io0-f172.google.com ([209.85.223.172]:36264) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cNuYD-0006V7-39 for 24751@debbugs.gnu.org; Sun, 01 Jan 2017 23:48:49 -0500 Original-Received: by mail-io0-f172.google.com with SMTP id h133so169775335ioe.3 for <24751@debbugs.gnu.org>; Sun, 01 Jan 2017 20:48:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=SHhogUpWDJe+Kixr62Bwf4oONgmG0vWO9jhA0m6ZKYQ=; b=VpPS9Ebe9rBiMTs9+Z/0PoT1t7QykbDDarUVJ3vBq7Hjv0N6PPS0HuccfOq1cFrqz7 4WOWgCACnMWInphFLiwWY09PX57we6Qx0k0NiOjK4ohsORuFx/sr4uKKDX09enIEp5FL xtW02vVUyD5mRA4JoeOdLjwlrZYYjwFJpqZzDstJY0GQnfvHfYktUMxd2M2aDUN1Gcd7 PUMiMjhCiUusgejitr4+UBG6QV26C8ASNjDDGC0LuM78mwI2jW+CFZ6QvgW+ziiWVM0N hXvgDmLI2oK6MRDDUJ509Yzs9q9GjbutH2nn/hs0/G6LMvhzct97/lWYFEpJmQukgCtm YTZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:references:date :in-reply-to:message-id:user-agent:mime-version; bh=SHhogUpWDJe+Kixr62Bwf4oONgmG0vWO9jhA0m6ZKYQ=; b=J9+4FnP9jUIi+d48OsxMePypEHtG9IwmK/DA13E7uqLseHplsnaSry94P6Ztgw4CNX v4SvMWSHGLhFncy5nYPj0GzhFPkhgWBliSwihehAzFP4LjkM0aHUdk+cqz7Msg2NGLjX pfV+82WjbgwbGOLHxmX0Us6O55LSTQz6PadTA/zSfYrkDmYBvV8gKJNm/E+4KfCD7+qO jidYG40jvhJ29qjD01qs53CaZ7cuDky5ohYGaWqWEN9hUrBh/UkghmMM2UnXvVYxD3w7 tFbio6GVEhXv2MBJIyxE+BxCxhb2MB+53xxxxN2I9bdIbeoWaQmvbFHVRn+TtiD6IiMZ QXyQ== X-Gm-Message-State: AIkVDXIUxISfGNtk6H/xy1qagENth82sUBimFYmcsd5nuufxEvx92kLq2qr8kITb13POMg== X-Received: by 10.107.24.196 with SMTP id 187mr43124503ioy.81.1483332523513; Sun, 01 Jan 2017 20:48:43 -0800 (PST) Original-Received: from zony ([45.2.7.65]) by smtp.googlemail.com with ESMTPSA id n3sm30037052itb.2.2017.01.01.20.48.42 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 01 Jan 2017 20:48:42 -0800 (PST) In-Reply-To: <83y40idqm3.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 17 Nov 2016 18:21:24 +0200") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:127667 Archived-At: --=-=-= Content-Type: text/plain Eli Zaretskii writes: >> >> /* Define MATCH_MAY_ALLOCATE unless we need to make sure that the >> searching and matching functions should not call alloca. On some >> systems, alloca is implemented in terms of malloc, and if we're >> using the relocating allocator routines, then malloc could cause a >> relocation, which might (if the strings being searched are in the >> ralloc heap) shift the data out from underneath the regexp >> routines. > > > The first part is not obsolete, but its reasoning is backwards: > SAFE_ALLOCA indeed can call malloc, but it could only cause relocation > if REGEX_MALLOC is defined (and ralloc.c is compiled in). And when > you define REGEX_MALLOC, MATCH_MAY_ALLOCATE is undefined. So the text > there should be revised. Everything you've said makes sense after your last message, but I'm still unable to put it together and come up with a revised comment. Could you make a suggestion? Here are the rest of the changes. --=-=-= Content-Type: text/plain Content-Disposition: attachment; filename=v1-0001-Use-expanded-stack-during-regex-matches.patch Content-Description: patch >From 60e43d62735b212738584da0631c8efc768084c5 Mon Sep 17 00:00:00 2001 From: Noam Postavsky Date: Sun, 1 Jan 2017 14:09:13 -0500 Subject: [PATCH v1] Use expanded stack during regex matches While the stack is increased in main(), to allow the regex stack allocation to use alloca we also need to modify regex.c to actually take advantage of the increased stack, and not limit stack allocations to SAFE_ALLOCA bytes. * src/regex.c (MATCH_MAY_ALLOCATE): Remove obsolete comment about allocations in signal handlers which no longer happens. (emacs_re_safe_alloca): New variable. (REGEX_USE_SAFE_ALLOCA): Use it as the limit of stack allocation instead of MAX_ALLOCA. (emacs_re_max_failures): Rename from `re_max_failures' to avoid confusion the glibc's `re_max_failures'. * src/emacs.c (main): Increase the amount of fixed 'extra' bytes we add to the stack. Instead of changing emacs_re_max_failures based on the new stack size, just change emacs_re_safe_alloca; emacs_re_max_failures remains constant regardless, since if we run out stack space SAFE_ALLOCA will fall back to heap allocation. --- src/emacs.c | 21 ++++++++++++--------- src/regex.c | 44 ++++++++++++++++++++++---------------------- src/regex.h | 7 ++++++- 3 files changed, 40 insertions(+), 32 deletions(-) diff --git a/src/emacs.c b/src/emacs.c index ae29e9a..cb6b503 100644 --- a/src/emacs.c +++ b/src/emacs.c @@ -831,14 +831,16 @@ main (int argc, char **argv) rlim_t lim = rlim.rlim_cur; /* Approximate the amount regex.c needs per unit of - re_max_failures, then add 33% to cover the size of the + emacs_re_max_failures, then add 33% to cover the size of the smaller stacks that regex.c successively allocates and discards on its way to the maximum. */ - int ratio = 20 * sizeof (char *); - ratio += ratio / 3; + int min_ratio = 20 * sizeof (char *); + int ratio = min_ratio + min_ratio / 3; - /* Extra space to cover what we're likely to use for other reasons. */ - int extra = 200000; + /* Extra space to cover what we're likely to use for other + reasons. For example, a typical GC might take 30K stack + frames. */ + int extra = (30 * 1000) * 50; bool try_to_grow_stack = true; #ifndef CANNOT_DUMP @@ -847,7 +849,7 @@ main (int argc, char **argv) if (try_to_grow_stack) { - rlim_t newlim = re_max_failures * ratio + extra; + rlim_t newlim = emacs_re_max_failures * ratio + extra; /* Round the new limit to a page boundary; this is needed for Darwin kernel 15.4.0 (see Bug#23622) and perhaps @@ -869,9 +871,10 @@ main (int argc, char **argv) lim = newlim; } } - - /* Don't let regex.c overflow the stack. */ - re_max_failures = lim < extra ? 0 : min (lim - extra, SIZE_MAX) / ratio; + /* Let regex.c use more stack, if available. */ + emacs_re_safe_alloca = max + (min (lim - extra, SIZE_MAX) * (min_ratio / ratio), + MAX_ALLOCA); } #endif /* HAVE_SETRLIMIT and RLIMIT_STACK and not CYGWIN */ diff --git a/src/regex.c b/src/regex.c index a2d2c52..adf2fa1 100644 --- a/src/regex.c +++ b/src/regex.c @@ -430,9 +430,12 @@ init_syntax_once (void) /* Should we use malloc or alloca? If REGEX_MALLOC is not defined, we use `alloca' instead of `malloc'. This is because using malloc in - re_search* or re_match* could cause memory leaks when C-g is used in - Emacs; also, malloc is slower and causes storage fragmentation. On - the other hand, malloc is more portable, and easier to debug. + re_search* or re_match* could cause memory leaks when C-g is used + in Emacs (note that SAFE_ALLOCA could also call malloc, but does so + via `record_xmalloc' which uses `unwind_protect' to ensure the + memory is freed even in case of non-local exits); also, malloc is + slower and causes storage fragmentation. On the other hand, malloc + is more portable, and easier to debug. Because we sometimes use alloca, some routines have to be macros, not functions -- `alloca'-allocated space disappears at the end of the @@ -447,7 +450,13 @@ init_syntax_once (void) #else /* not REGEX_MALLOC */ # ifdef emacs -# define REGEX_USE_SAFE_ALLOCA USE_SAFE_ALLOCA +/* This may be adjusted in main(), if the stack is successfully grown. */ +ptrdiff_t emacs_re_safe_alloca = MAX_ALLOCA; +/* Like USE_SAFE_ALLOCA, but use emacs_re_safe_alloca. */ +# define REGEX_USE_SAFE_ALLOCA \ + ptrdiff_t sa_avail = emacs_re_safe_alloca; \ + ptrdiff_t sa_count = SPECPDL_INDEX (); bool sa_must_free = false + # define REGEX_SAFE_FREE() SAFE_FREE () # define REGEX_ALLOCATE SAFE_ALLOCA # else @@ -1203,16 +1212,7 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax) using the relocating allocator routines, then malloc could cause a relocation, which might (if the strings being searched are in the ralloc heap) shift the data out from underneath the regexp - routines. - - Here's another reason to avoid allocation: Emacs - processes input from X in a signal handler; processing X input may - call malloc; if input arrives while a matching routine is calling - malloc, then we're scrod. But Emacs can't just block input while - calling matching routines; then we don't notice interrupts when - they come in. So, Emacs blocks input around all regexp calls - except the matching calls, which it leaves unprotected, in the - faith that they will not malloc. */ + routines. */ /* Normally, this is fine. */ #define MATCH_MAY_ALLOCATE @@ -1249,9 +1249,9 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax) whose default stack limit is 2mb. In order for a larger value to work reliably, you have to try to make it accord with the process stack limit. */ -size_t re_max_failures = 40000; +size_t emacs_re_max_failures = 40000; # else -size_t re_max_failures = 4000; +size_t emacs_re_max_failures = 4000; # endif union fail_stack_elt @@ -1304,7 +1304,7 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax) /* Double the size of FAIL_STACK, up to a limit - which allows approximately `re_max_failures' items. + which allows approximately `emacs_re_max_failures' items. Return 1 if succeeds, and 0 if either ran out of memory allocating space for it or it was already too large. @@ -1319,19 +1319,19 @@ WEAK_ALIAS (__re_set_syntax, re_set_syntax) #define FAIL_STACK_GROWTH_FACTOR 4 #define GROW_FAIL_STACK(fail_stack) \ - (((fail_stack).size >= re_max_failures * TYPICAL_FAILURE_SIZE) \ + (((fail_stack).size >= emacs_re_max_failures * TYPICAL_FAILURE_SIZE) \ ? 0 \ : ((fail_stack).stack \ = REGEX_REALLOCATE_STACK ((fail_stack).stack, \ (fail_stack).size * sizeof (fail_stack_elt_t), \ - min (re_max_failures * TYPICAL_FAILURE_SIZE, \ + min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE, \ ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR)) \ * sizeof (fail_stack_elt_t)), \ \ (fail_stack).stack == NULL \ ? 0 \ : ((fail_stack).size \ - = (min (re_max_failures * TYPICAL_FAILURE_SIZE, \ + = (min (emacs_re_max_failures * TYPICAL_FAILURE_SIZE, \ ((fail_stack).size * FAIL_STACK_GROWTH_FACTOR))), \ 1))) @@ -3638,9 +3638,9 @@ regex_compile (const_re_char *pattern, size_t size, { int num_regs = bufp->re_nsub + 1; - if (fail_stack.size < re_max_failures * TYPICAL_FAILURE_SIZE) + if (fail_stack.size < emacs_re_max_failures * TYPICAL_FAILURE_SIZE) { - fail_stack.size = re_max_failures * TYPICAL_FAILURE_SIZE; + fail_stack.size = emacs_re_max_failures * TYPICAL_FAILURE_SIZE; falk_stack.stack = realloc (fail_stack.stack, fail_stack.size * sizeof *falk_stack.stack); } diff --git a/src/regex.h b/src/regex.h index 34c9929..1d439de 100644 --- a/src/regex.h +++ b/src/regex.h @@ -186,7 +186,12 @@ #endif /* Roughly the maximum number of failure points on the stack. */ -extern size_t re_max_failures; +extern size_t emacs_re_max_failures; + +#ifdef emacs +/* Amount of memory that we can safely stack allocate. */ +extern ptrdiff_t emacs_re_safe_alloca; +#endif /* Define combinations of the above bits for the standard possibilities. -- 2.9.3 --=-=-=--