From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault Date: Wed, 17 Aug 2016 18:12:48 +0300 Message-ID: <8360qzfmz3.fsf@gnu.org> References: <8337m7h1dp.fsf@gnu.org> <83zioffew5.fsf@gnu.org> <83popaf1yz.fsf@gnu.org> <87bn0u3rqc.fsf@linux-m68k.org> <83mvkdg91i.fsf@gnu.org> <8b78f23f-4a4f-e568-b760-3350ca7bb8d3@cs.ucla.edu> <83d1l8g3zs.fsf@gnu.org> <4822bfeb-c507-a9ff-93bc-1d27ba93b9d7@cs.ucla.edu> Reply-To: Eli Zaretskii NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Trace: blaine.gmane.org 1471446848 29213 195.159.176.226 (17 Aug 2016 15:14:08 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 17 Aug 2016 15:14:08 +0000 (UTC) Cc: johnw@gnu.org, 24206@debbugs.gnu.org To: Paul Eggert Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Wed Aug 17 17:14:04 2016 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba2Xb-0007OH-CF for geb-bug-gnu-emacs@m.gmane.org; Wed, 17 Aug 2016 17:14:03 +0200 Original-Received: from localhost ([::1]:48006 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba2XY-0002Vz-I5 for geb-bug-gnu-emacs@m.gmane.org; Wed, 17 Aug 2016 11:14:00 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46182) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba2We-0001xQ-NF for bug-gnu-emacs@gnu.org; Wed, 17 Aug 2016 11:13:06 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ba2Wc-0006oh-BR for bug-gnu-emacs@gnu.org; Wed, 17 Aug 2016 11:13:03 -0400 Original-Received: from debbugs.gnu.org ([208.118.235.43]:34448) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba2Wc-0006od-7i for bug-gnu-emacs@gnu.org; Wed, 17 Aug 2016 11:13:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1ba2Wc-0003Gy-47 for bug-gnu-emacs@gnu.org; Wed, 17 Aug 2016 11:13:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 17 Aug 2016 15:13:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24206 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 24206-submit@debbugs.gnu.org id=B24206.147144677912572 (code B ref 24206); Wed, 17 Aug 2016 15:13:02 +0000 Original-Received: (at 24206) by debbugs.gnu.org; 17 Aug 2016 15:12:59 +0000 Original-Received: from localhost ([127.0.0.1]:60393 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba2WZ-0003Gh-F9 for submit@debbugs.gnu.org; Wed, 17 Aug 2016 11:12:59 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:55126) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ba2WY-0003GV-0W for 24206@debbugs.gnu.org; Wed, 17 Aug 2016 11:12:58 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ba2WR-0006lo-Jq for 24206@debbugs.gnu.org; Wed, 17 Aug 2016 11:12:52 -0400 Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:41337) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ba2WN-0006kg-Qu; Wed, 17 Aug 2016 11:12:47 -0400 Original-Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:1462 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1ba2WL-0004Ut-Uh; Wed, 17 Aug 2016 11:12:46 -0400 In-reply-to: <4822bfeb-c507-a9ff-93bc-1d27ba93b9d7@cs.ucla.edu> (message from Paul Eggert on Tue, 16 Aug 2016 14:07:05 -0700) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:122315 Archived-At: > Cc: schwab@linux-m68k.org, p.stephani2@gmail.com, johnw@gnu.org, > 24206@debbugs.gnu.org > From: Paul Eggert > Date: Tue, 16 Aug 2016 14:07:05 -0700 > > The changes were motivated by bug fixes, not style. That's not what I see. E.g., this hunk simply replaces valid code by an equivalently valid code: - if (multibyte) - { - int len; - - STRING_CHAR_AND_LENGTH (strp, len); - if (len == 1) - *bufp = *strp; - else - memcpy (bufp, strp, len); - strp += len; - bufp += len; - nchars++; - } - else - *bufp++ = *strp++, nchars++; + /* Fall through to copy one char. */ Same here: - else if (strp[0] == '\\' && strp[1] == '[') + else if (strp[0] == '\\' && strp[1] == '[' + && (close_bracket + = memchr (strp + 2, ']', + SDATA (str) + strbytes - (strp + 2)))) { - ptrdiff_t start_idx; bool follow_remap = 1; - strp += 2; /* skip \[ */ - start = strp; - start_idx = start - SDATA (string); - - while ((strp - SDATA (string) - < SBYTES (string)) - && *strp != ']') - strp++; - length_byte = strp - start; - - strp++; /* skip ] */ and here (which, for some reason, loses part of a comment, and IMO makes it half a riddle for the uninitiated): - /* Note the Fwhere_is_internal can GC, so we have to take - relocation of string contents into account. */ - strp = SDATA (string) + idx; - start = SDATA (string) + start_idx; + /* Take relocation of string contents into account. */ + strp = SDATA (str) + idx; + start = strp - length_byte - 1; etc. etc. -- I see a lot of changes that have nothing to do with the real bugs in this function, they just rearrange valid code, change the way intermediate variables are used, etc. > Although the bugs were mostly minor (e.g., generating bogus NUL bytes due to miscounting) it's fine to fix minor bugs. I did change nearby style (indenting as per GNU style, switching some locals to C99-style decl-after-statement, etc.) but none of the changes were pervasive or were intended for the emacs-25 branch, and it's fine to make such changes in master. What code generated bogus null bytes? I'm not saying it isn't fine to make such changes, I'm urging you and the others to resist the temptation of doing so unless really necessary. We are operating in the area of diminishing returns, and too many times introduce regressions into code that was working properly for decades. We should try to minimize that. Emacs is not supposed to become less stable in core code, unless its gets significant improvements or new features. > One of the bugs was O(N**2) performance when reallocating a temporary buffer. While I was at it, I changed the code to allocate a small temp buffer on the stack to avoid a malloc/free in the usual case, which should be a small win. This accounts for many changes that a quick glance might give the mistaken impression of being stylistic. Where's the O(N**2) performance, and why does performance matter in this function anyway? I don't think we ever had complaints about this being slow. The new code is more complex, because it sometimes uses the stack and sometimes the heap, so more opportunities for bugs in it. I don't see any net gains, sorry. > without any comments as to why we handle the input string as > multibyte for the rest of the function, I think this will confuse > someone down the road. > > OK, I added some comments along those lines (see attached patch). Thanks. > I think the refactoring already introduced a > regression. > > This comment appears to be about changes made in May for Bug#23425, not about code changes I made recently. The May changes were not a regression; they were intended and are documented in etc/NEWS. Alan Mackenzie felt strongly that some changes were needed in this area. See commit 433d366dc7b053048abf710d790ff62421dd1570. Right, sorry I forgot about that. Unlike at that time, I now think this was a bad move, because Emacs 25.1 will have the disabled conversion in it, so by the time we release the code in master, it would be an incompatible change. Also, ‘..’ is left unchanged, but '..' is not, which is inconsistent. So I think that change should be reverted on master. (I also don't see how it is related to the original bug report, which AFAIU was about (message "`foo'") that still behaves as in the bug report.) > maybe we should just go to the original code, after fixing the > immediate bugs, as I proposed in a patch yesterday? > > No, the code in master should be uniformly better than what's in emacs-25. I can't say I see that, sorry. > It would be nice to have good tests for substitute-command-keys, of course. (We can all add this to our lists of things to do. :-) Which seems to be just another way of saying NO, sigh. (Mumbles something about Emacs maintenance being a lonely business...)