From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Paul Eggert Newsgroups: gmane.emacs.bugs Subject: bug#19208: replace-match unhelpful error message Date: Sat, 3 Aug 2019 13:22:26 -0700 Organization: UCLA Computer Science Department Message-ID: References: <87r3wn5kgw.fsf@newcastle.ac.uk> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------8CEEC21914ECC17B7ED14FD6" Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="61442"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 Cc: 19208@debbugs.gnu.org, Phillip Lord To: Lars Ingebrigtsen Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Sat Aug 03 22:23:10 2019 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1hu0YY-000Fpy-5Y for geb-bug-gnu-emacs@m.gmane.org; Sat, 03 Aug 2019 22:23:10 +0200 Original-Received: from localhost ([::1]:41846 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hu0YW-0006GQ-Qz for geb-bug-gnu-emacs@m.gmane.org; Sat, 03 Aug 2019 16:23:08 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:45559) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hu0YS-0006GK-G4 for bug-gnu-emacs@gnu.org; Sat, 03 Aug 2019 16:23:06 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hu0YQ-0005Ss-SV for bug-gnu-emacs@gnu.org; Sat, 03 Aug 2019 16:23:04 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:51195) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1hu0YQ-0005Se-NJ for bug-gnu-emacs@gnu.org; Sat, 03 Aug 2019 16:23:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1hu0YQ-0008Vs-Hm for bug-gnu-emacs@gnu.org; Sat, 03 Aug 2019 16:23:02 -0400 X-Loop: help-debbugs@gnu.org In-Reply-To: <87r3wn5kgw.fsf@newcastle.ac.uk> Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 03 Aug 2019 20:23:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 19208 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: fixed Original-Received: via spool by 19208-submit@debbugs.gnu.org id=B19208.156486376032678 (code B ref 19208); Sat, 03 Aug 2019 20:23:02 +0000 Original-Received: (at 19208) by debbugs.gnu.org; 3 Aug 2019 20:22:40 +0000 Original-Received: from localhost ([127.0.0.1]:60015 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hu0Y4-0008V0-AA for submit@debbugs.gnu.org; Sat, 03 Aug 2019 16:22:40 -0400 Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:36744) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hu0Y1-0008Ul-Je for 19208@debbugs.gnu.org; Sat, 03 Aug 2019 16:22:38 -0400 Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 7D8791618B1; Sat, 3 Aug 2019 13:22:31 -0700 (PDT) Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id DQiCEvab3kK3; Sat, 3 Aug 2019 13:22:30 -0700 (PDT) Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 48CFE162639; Sat, 3 Aug 2019 13:22:30 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id datY1iTFUlwV; Sat, 3 Aug 2019 13:22:30 -0700 (PDT) Original-Received: from [192.168.1.9] (cpe-23-242-74-103.socal.res.rr.com [23.242.74.103]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 0F62A1618B1; Sat, 3 Aug 2019 13:22:30 -0700 (PDT) Content-Language: en-US X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.51.188.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:164486 Archived-At: This is a multi-part message in MIME format. --------------8CEEC21914ECC17B7ED14FD6 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit > No, that's just another sanity check -- num_regs is the max allowed > number of sub-matches. (I've now added some comments to clarify.) Unfortunately that patch messes up the sanity check, as the patched code allows 'sub' to be negative, or to be equal to search_regs.num_regs, and in either case this results in a bad pointer. > Yoda conditionals and a !... Actually those were Leibniz conditionals, which are comparisons involving "<" or "<=". The idea is that the conditionals' textual order reflects numeric order. This is a common style in math when doing range checking, e.g., "0 <= i < n". Yoda conditionals are expressions like "0 != x" which I agree are confusing. I learned Leibniz conditionals from the late Val Schorre, who was *really* good at programming and programming style: Knuth credits Schorre with the invention of goto-less programming in the early 1960s. When I redid your patch as a Leibniz conditional, I instantly spotted the bug that it introduced. You might give Leibniz conditionals a try, as they help make code more readable and reliable when checking for range errors or overflow. While in the neighborhood I spotted some other rare possibilities for bad behavior due to out-of-range indexes. I fixed the bugs I found by installing the attached. --------------8CEEC21914ECC17B7ED14FD6 Content-Type: text/x-patch; name="0001-Fix-rare-undefined-behaviors-in-replace-match.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-Fix-rare-undefined-behaviors-in-replace-match.patch" >From 13fe8a27042b1539d43727e6df97c386c61c3053 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sat, 3 Aug 2019 12:45:19 -0700 Subject: [PATCH] Fix rare undefined behaviors in replace-match * src/search.c (Freplace_match): Simplify by caching search_regs components. Fix sanity check for out-of-range subscripts; it incorrectly allowed negative subscripts, subscripts equal to search_regs.num_regs, and it had undefined behavior for subscripts outside ptrdiff_t range. Improve wording of newly-introduced replace-match diagnostic. Rework use of opoint, to avoid setting point to an out-of-range value in rare cases involving modification hooks. --- src/search.c | 107 ++++++++++++++++++--------------------------------- 1 file changed, 38 insertions(+), 69 deletions(-) diff --git a/src/search.c b/src/search.c index 0e2ae059e8..9b674a5810 100644 --- a/src/search.c +++ b/src/search.c @@ -2389,44 +2389,32 @@ since only regular expressions have distinguished subexpressions. */) case_action = nochange; /* We tried an initialization */ /* but some C compilers blew it */ - if (search_regs.num_regs <= 0) + ptrdiff_t num_regs = search_regs.num_regs; + if (num_regs <= 0) error ("`replace-match' called before any match found"); if (NILP (subexp)) sub = 0; else { - CHECK_FIXNUM (subexp); + CHECK_RANGED_INTEGER (subexp, 0, num_regs - 1); sub = XFIXNUM (subexp); - /* Sanity check to see whether the subexp is larger than the - allowed number of sub-regexps. */ - if (sub >= 0 && sub > search_regs.num_regs) - args_out_of_range (subexp, make_fixnum (search_regs.num_regs)); } - /* Check whether the subexpression to replace is greater than the - number of subexpressions in the regexp. */ - if (sub > 0 && search_regs.start[sub] == -1) - args_out_of_range (build_string ("Attempt to replace regexp subexpression that doesn't exist"), - subexp); + ptrdiff_t sub_start = search_regs.start[sub]; + ptrdiff_t sub_end = search_regs.end[sub]; + eassert (sub_start <= sub_end); - /* Sanity check to see whether the text to replace is present in the - buffer/string. */ - if (NILP (string)) + /* Check whether the text to replace is present in the buffer/string. */ + if (! (NILP (string) + ? BEGV <= sub_start && sub_end <= ZV + : 0 <= sub_start && sub_end <= SCHARS (string))) { - if (search_regs.start[sub] < BEGV - || search_regs.start[sub] > search_regs.end[sub] - || search_regs.end[sub] > ZV) - args_out_of_range (make_fixnum (search_regs.start[sub]), - make_fixnum (search_regs.end[sub])); - } - else - { - if (search_regs.start[sub] < 0 - || search_regs.start[sub] > search_regs.end[sub] - || search_regs.end[sub] > SCHARS (string)) - args_out_of_range (make_fixnum (search_regs.start[sub]), - make_fixnum (search_regs.end[sub])); + if (sub_start < 0) + xsignal2 (Qerror, + build_string ("replace-match subexpression does not exist"), + subexp); + args_out_of_range (make_fixnum (sub_start), make_fixnum (sub_end)); } if (NILP (fixedcase)) @@ -2434,8 +2422,8 @@ since only regular expressions have distinguished subexpressions. */) /* Decide how to casify by examining the matched text. */ ptrdiff_t last; - pos = search_regs.start[sub]; - last = search_regs.end[sub]; + pos = sub_start; + last = sub_end; if (NILP (string)) pos_byte = CHAR_TO_BYTE (pos); @@ -2511,9 +2499,8 @@ since only regular expressions have distinguished subexpressions. */) { Lisp_Object before, after; - before = Fsubstring (string, make_fixnum (0), - make_fixnum (search_regs.start[sub])); - after = Fsubstring (string, make_fixnum (search_regs.end[sub]), Qnil); + before = Fsubstring (string, make_fixnum (0), make_fixnum (sub_start)); + after = Fsubstring (string, make_fixnum (sub_end), Qnil); /* Substitute parts of the match into NEWTEXT if desired. */ @@ -2542,12 +2529,12 @@ since only regular expressions have distinguished subexpressions. */) if (c == '&') { - substart = search_regs.start[sub]; - subend = search_regs.end[sub]; + substart = sub_start; + subend = sub_end; } else if (c >= '1' && c <= '9') { - if (c - '0' < search_regs.num_regs + if (c - '0' < num_regs && search_regs.start[c - '0'] >= 0) { substart = search_regs.start[c - '0']; @@ -2612,13 +2599,8 @@ since only regular expressions have distinguished subexpressions. */) return concat3 (before, newtext, after); } - /* Record point, then move (quietly) to the start of the match. */ - if (PT >= search_regs.end[sub]) - opoint = PT - ZV; - else if (PT > search_regs.start[sub]) - opoint = search_regs.end[sub] - ZV; - else - opoint = PT; + /* Record point. A nonpositive OPOINT is actually an offset from ZV. */ + opoint = PT <= sub_start ? PT : max (PT, sub_end) - ZV; /* If we want non-literal replacement, perform substitution on the replacement string. */ @@ -2687,7 +2669,7 @@ since only regular expressions have distinguished subexpressions. */) if (c == '&') idx = sub; - else if (c >= '1' && c <= '9' && c - '0' < search_regs.num_regs) + else if ('1' <= c && c <= '9' && c - '0' < num_regs) { if (search_regs.start[c - '0'] >= 1) idx = c - '0'; @@ -2745,25 +2727,11 @@ since only regular expressions have distinguished subexpressions. */) xfree (substed); } - /* The functions below modify the buffer, so they could trigger - various modification hooks (see signal_before_change and - signal_after_change). If these hooks clobber the match data we - error out since otherwise this will result in confusing bugs. */ - ptrdiff_t sub_start = search_regs.start[sub]; - ptrdiff_t sub_end = search_regs.end[sub]; - ptrdiff_t num_regs = search_regs.num_regs; - newpoint = search_regs.start[sub] + SCHARS (newtext); + newpoint = sub_start + SCHARS (newtext); + ptrdiff_t newstart = sub_start == sub_end ? newpoint : sub_start; /* Replace the old text with the new in the cleanest possible way. */ - replace_range (search_regs.start[sub], search_regs.end[sub], - newtext, 1, 0, 1, 1); - /* Update saved data to match adjustment made by replace_range. */ - { - ptrdiff_t change = newpoint - sub_end; - if (sub_start >= sub_end) - sub_start += change; - sub_end += change; - } + replace_range (sub_start, sub_end, newtext, 1, 0, 1, true); if (case_action == all_caps) Fupcase_region (make_fixnum (search_regs.start[sub]), @@ -2773,17 +2741,18 @@ since only regular expressions have distinguished subexpressions. */) Fupcase_initials_region (make_fixnum (search_regs.start[sub]), make_fixnum (newpoint)); - if (search_regs.start[sub] != sub_start - || search_regs.end[sub] != sub_end - || search_regs.num_regs != num_regs) + /* The replace_range etc. functions can trigger modification hooks + (see signal_before_change and signal_after_change). Try to error + out if these hooks clobber the match data since clobbering can + result in confusing bugs. Although this sanity check does not + catch all possible clobberings, it should catch many of them. */ + if (! (search_regs.num_regs == num_regs + && search_regs.start[sub] == newstart + && search_regs.end[sub] == newpoint)) error ("Match data clobbered by buffer modification hooks"); - /* Put point back where it was in the text. */ - if (opoint <= 0) - TEMP_SET_PT (opoint + ZV); - else - TEMP_SET_PT (opoint); - + /* Put point back where it was in the text, if possible. */ + TEMP_SET_PT (clip_to_bounds (BEGV, opoint + (opoint <= 0 ? ZV : 0), ZV)); /* Now move point "officially" to the start of the inserted replacement. */ move_if_not_intangible (newpoint); -- 2.17.1 --------------8CEEC21914ECC17B7ED14FD6--