From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Paul Eggert Newsgroups: gmane.emacs.bugs Subject: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault Date: Wed, 17 Aug 2016 10:41:52 -0700 Organization: UCLA Computer Science Department Message-ID: <11031c1e-c784-0ba2-4b6c-4fab0cb92354@cs.ucla.edu> References: <8337m7h1dp.fsf@gnu.org> <83zioffew5.fsf@gnu.org> <83popaf1yz.fsf@gnu.org> <87bn0u3rqc.fsf@linux-m68k.org> <83mvkdg91i.fsf@gnu.org> <8b78f23f-4a4f-e568-b760-3350ca7bb8d3@cs.ucla.edu> <83d1l8g3zs.fsf@gnu.org> <4822bfeb-c507-a9ff-93bc-1d27ba93b9d7@cs.ucla.edu> <8360qzfmz3.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------B82A6C0CA66137FDC9048D00" X-Trace: blaine.gmane.org 1471455811 16489 195.159.176.226 (17 Aug 2016 17:43:31 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 17 Aug 2016 17:43:31 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 Cc: johnw@gnu.org, 24206@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Wed Aug 17 19:43:27 2016 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba4s8-0003tY-4B for geb-bug-gnu-emacs@m.gmane.org; Wed, 17 Aug 2016 19:43:24 +0200 Original-Received: from localhost ([::1]:48922 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba4rx-0003aT-B3 for geb-bug-gnu-emacs@m.gmane.org; Wed, 17 Aug 2016 13:43:13 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:52472) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba4rq-0003aL-Fg for bug-gnu-emacs@gnu.org; Wed, 17 Aug 2016 13:43:08 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ba4rl-0004MB-Ub for bug-gnu-emacs@gnu.org; Wed, 17 Aug 2016 13:43:06 -0400 Original-Received: from debbugs.gnu.org ([208.118.235.43]:34545) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba4rl-0004M6-RK for bug-gnu-emacs@gnu.org; Wed, 17 Aug 2016 13:43:01 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1ba4rl-0006rV-M2 for bug-gnu-emacs@gnu.org; Wed, 17 Aug 2016 13:43:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 17 Aug 2016 17:43:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24206 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 24206-submit@debbugs.gnu.org id=B24206.147145572926312 (code B ref 24206); Wed, 17 Aug 2016 17:43:01 +0000 Original-Received: (at 24206) by debbugs.gnu.org; 17 Aug 2016 17:42:09 +0000 Original-Received: from localhost ([127.0.0.1]:60490 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba4qu-0006qK-H5 for submit@debbugs.gnu.org; Wed, 17 Aug 2016 13:42:08 -0400 Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:58811) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba4qs-0006ph-5P for 24206@debbugs.gnu.org; Wed, 17 Aug 2016 13:42:06 -0400 Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 50165161125; Wed, 17 Aug 2016 10:42:00 -0700 (PDT) Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id iRojNpDdD23m; Wed, 17 Aug 2016 10:41:59 -0700 (PDT) Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 34BD4161199; Wed, 17 Aug 2016 10:41:59 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id DqeukgDDrOAb; Wed, 17 Aug 2016 10:41:59 -0700 (PDT) Original-Received: from [192.168.1.9] (unknown [100.32.155.148]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1055C161125; Wed, 17 Aug 2016 10:41:59 -0700 (PDT) In-Reply-To: <8360qzfmz3.fsf@gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:122326 Archived-At: This is a multi-part message in MIME format. --------------B82A6C0CA66137FDC9048D00 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Eli Zaretskii wrote: >> The changes were motivated by bug fixes, not style. > > That's not what I see. E.g., this hunk simply replaces valid code by > an equivalently valid code: > > - if (multibyte) > - { > - int len; > - > - STRING_CHAR_AND_LENGTH (strp, len); > - if (len =3D=3D 1) > - *bufp =3D *strp; > - else > - memcpy (bufp, strp, len); > - strp +=3D len; > - bufp +=3D len; > - nchars++; > - } > - else > - *bufp++ =3D *strp++, nchars++; > + /* Fall through to copy one char. */ Some change in this area was needed because the 'multibyte' flag went awa= y.=20 While doing that, I noticed that discarding all the code made this=20 somewhat-tricky area easier to follow. It's not merely that the old multi= byte=20 code is unnecessarily long and hard to follow; it's that the old code doe= s=20 something fairly-typical (copy a multibyte character) in an unusual way, = which=20 is too likely to lead the reader into incorrectly thinking that there is=20 something actually unusual about the action. Misleading code like this re= ally=20 cries out to be rewritten, particularly if the rewriting simply ionvolves= =20 deleting it. In short, the main motivation here was clarity, not merely style. (I hope I don't have to go into such details to defend every code change = I=20 install! I'm finding it difficult-enough now to find time to improve Emac= s.) > Same here: > > - else if (strp[0] =3D=3D '\\' && strp[1] =3D=3D '[') > + else if (strp[0] =3D=3D '\\' && strp[1] =3D=3D '[' > + && (close_bracket > + =3D memchr (strp + 2, ']', > + SDATA (str) + strbytes - (strp + 2)))) > { > - ptrdiff_t start_idx; > bool follow_remap =3D 1; > > - strp +=3D 2; /* skip \[ */ > - start =3D strp; > - start_idx =3D start - SDATA (string); > - > - while ((strp - SDATA (string) > - < SBYTES (string)) > - && *strp !=3D ']') > - strp++; > - length_byte =3D strp - start; > - > - strp++; /* skip ] */ This one is not merely a style change. The old code matched \[ even if no= t=20 followed by ], the new code does not. This is an intended improvement. I = plead=20 guilty to the charge that the new code is also shorter and clearer. > and here (which, for some reason, loses part of a comment, and IMO > makes it half a riddle for the uninitiated): > > - /* Note the Fwhere_is_internal can GC, so we have to take > - relocation of string contents into account. */ > - strp =3D SDATA (string) + idx; > - start =3D SDATA (string) + start_idx; > + /* Take relocation of string contents into account. */ > + strp =3D SDATA (str) + idx; > + start =3D strp - length_byte - 1; The new comment came because I copied it from somewhere else in the inter= est of=20 consistency. You're right, I omitted some commentary in the process. I th= ought=20 the omitted info obvious, but evidently you think otherwise. It's obvious= ly no=20 big deal, so I brought it back by applying the attached patch to master. > What code generated bogus null bytes? For example, (substitute-command-keys "\\=3D") generated "\0". > I'm not saying it isn't fine to make such changes, I'm urging you and > the others to resist the temptation of doing so unless really > necessary. We are operating in the area of diminishing returns, and > too many times introduce regressions into code that was working > properly for decades. This particular code has been buggy for decades in unusual areas. There i= s no=20 harm in simplifying it when fixing the bugs. On the contrary, we should=20 encourage bug fixes that simplify code. > Where's the O(N**2) performance When the buffer grew slightly, it was reallocated to be slightly bigger a= nd the=20 old data was copied to the new; this is an O(N**2) algorithm, where N is = the=20 final buffer size. The new approach doubles the buffer size instead (actu= ally,=20 multiplies it by 1.5, but that's good enough to bring worst-case behavior= down=20 to O(N)). This sort of thing is standard programming practice when growin= g a=20 buffer whose eventual size is not yet known. > and why does performance matter in this function anyway? It usually doesn't, but it might in the worst case, so I figured I might = as well=20 fix the O(N**2) problem while I was fixing related bugs. This is a good t= hing to=20 do in master. > Unlike at that time, I now think > this was a bad move, because Emacs 25.1 will have the disabled > conversion in it, so by the time we release the code in master, it > would be an incompatible change. If that's the main objection, then let's change Emacs 25 to behave simila= rly.=20 This would be a simple and conservative change to Emacs 25. But even if y= ou=20 don't want to change Emacs 25 (and thus you want to Emacs 25 to continue = to be=20 less-compatible with Emacs 24), it's OK to change this minor detail back = to the=20 way Emacs 24 does things. > (I also don't see how it is related to the > original bug report, which AFAIU was about (message "`foo'") that > still behaves as in the bug report.) Alan wanted something that he could put into his .emacs that would cause=20 (message PERCENTLESS) to output the string PERCENTLESS as-is, assuming=20 PERCENTLESS lacks %. This was the point of his original bug report; his o= riginal=20 example involved ` and ' but he wanted the same behavior for =E2=80=98 an= d =E2=80=99, a point=20 that became clear during the discussion of Bug#23425. In Message #95 of t= hat bug=20 report I proposed the change in question, and in Message #104 you said it= =20 sounded good to you. This is a contentious area, and unless there's good reason I'd rather let= =20 sleeping dogs lie and stick with master's current behavior here. > (Mumbles something about Emacs maintenance being a lonely business...) But we have all these nice conversations! :-) --------------B82A6C0CA66137FDC9048D00 Content-Type: text/x-diff; name="0001-src-doc.c-Fsubstitute_command_keys-Clarify-GC-commen.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename*0="0001-src-doc.c-Fsubstitute_command_keys-Clarify-GC-commen.pa"; filename*1="tch" =46rom 70a5e67e9a072b6f22343fc0c7eed91dfdaf8025 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 17 Aug 2016 10:15:53 -0700 Subject: [PATCH] * src/doc.c (Fsubstitute_command_keys): Clarify GC comme= nts. --- src/doc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/doc.c b/src/doc.c index 4b91831..6376398 100644 --- a/src/doc.c +++ b/src/doc.c @@ -821,7 +821,8 @@ Otherwise, return a new string. */) goto do_remap; } =20 - /* Take relocation of string contents into account. */ + /* Fwhere_is_internal can GC, so take relocation of string + contents into account. */ strp =3D SDATA (str) + idx; start =3D strp - length_byte - 1; =20 @@ -936,7 +937,8 @@ Otherwise, return a new string. */) bufp +=3D length_byte; nchars +=3D length; =20 - /* Take relocation of string contents into account. */ + /* Some of the previous code can GC, so take relocation of + string contents into account. */ strp =3D SDATA (str) + idx; =20 continue; --=20 2.5.5 --------------B82A6C0CA66137FDC9048D00--