From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Date: Thu, 18 Aug 2016 17:30:44 +0300
Message-ID: <83lgzudu97.fsf@gnu.org>
References: <8337m7h1dp.fsf@gnu.org> <83zioffew5.fsf@gnu.org> <83popaf1yz.fsf@gnu.org> <87bn0u3rqc.fsf@linux-m68k.org> <83mvkdg91i.fsf@gnu.org> <8b78f23f-4a4f-e568-b760-3350ca7bb8d3@cs.ucla.edu> <83d1l8g3zs.fsf@gnu.org> <4822bfeb-c507-a9ff-93bc-1d27ba93b9d7@cs.ucla.edu> <8360qzfmz3.fsf@gnu.org> <11031c1e-c784-0ba2-4b6c-4fab0cb92354@cs.ucla.edu> <83tweje0ct.fsf@gnu.org>
Reply-To: Eli Zaretskii
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: blaine.gmane.org 1471530681 4534 195.159.176.226 (18 Aug 2016 14:31:21 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Thu, 18 Aug 2016 14:31:21 +0000 (UTC)
Cc: johnw@gnu.org, 24206@debbugs.gnu.org
To: Paul Eggert
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Thu Aug 18 16:31:17 2016
Return-path:
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1baOLk-0000sO-TZ for geb-bug-gnu-emacs@m.gmane.org; Thu, 18 Aug 2016 16:31:17 +0200
Original-Received: from localhost ([::1]:52998 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1baOLj-0001Bb-4b for geb-bug-gnu-emacs@m.gmane.org; Thu, 18 Aug 2016 10:31:15 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:34055) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1baOLa-00019S-EY for bug-gnu-emacs@gnu.org; Thu, 18 Aug 2016 10:31:07 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1baOLW-0007Kj-5O for bug-gnu-emacs@gnu.org; Thu, 18 Aug 2016 10:31:05 -0400
Original-Received: from debbugs.gnu.org ([208.118.235.43]:35791) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1baOLW-0007Ke-1e for bug-gnu-emacs@gnu.org; Thu, 18 Aug 2016 10:31:02 -0400
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1baOLV-0003wd-TE for bug-gnu-emacs@gnu.org; Thu, 18 Aug 2016 10:31:01 -0400
X-Loop: help-debbugs@gnu.org
Resent-From: Eli Zaretskii
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Thu, 18 Aug 2016 14:31:01 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 24206
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
Original-Received: via spool by 24206-submit@debbugs.gnu.org id=B24206.147153065315145 (code B ref 24206); Thu, 18 Aug 2016 14:31:01 +0000
Original-Received: (at 24206) by debbugs.gnu.org; 18 Aug 2016 14:30:53 +0000
Original-Received: from localhost ([127.0.0.1]:33503 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1baOLN-0003wC-23 for submit@debbugs.gnu.org; Thu, 18 Aug 2016 10:30:53 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:42944) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1baOLL-0003w0-47 for 24206@debbugs.gnu.org; Thu, 18 Aug 2016 10:30:51 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1baOLE-0007H5-TC for 24206@debbugs.gnu.org; Thu, 18 Aug 2016 10:30:45 -0400
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:60550) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1baOLA-0007GP-GK; Thu, 18 Aug 2016 10:30:40 -0400
Original-Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:2377 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1baOL9-0002We-Kq; Thu, 18 Aug 2016 10:30:40 -0400
In-reply-to: (message from Paul Eggert on Wed, 17 Aug 2016
	13:52:42 -0700)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-Received-From: 208.118.235.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: "bug-gnu-emacs"
Xref: news.gmane.org gmane.emacs.bugs:122365

> Cc: johnw@gnu.org, 24206@debbugs.gnu.org
> From: Paul Eggert
> Date: Wed, 17 Aug 2016 13:52:42 -0700
>
> Some change in this area was needed because the 'multibyte' flag
> went away.
>
> Only because you removed it. You could have left it alone, it would
> have worked
>
> Sure, but it was no longer necessary, as the code no longer needs to
> record whether the original string was multibyte. Keeping an
> unnecessary variable around would make the code harder to read.

The code that got removed was the easy and intuitive part: it dealt
with processing single-byte strings one byte at a time. The
hard-to-read part of the code is still with us. We have fewer 'if'
conditionals, but that's hardly the main complication in the original
code.

> even after the call to Fstring_make_multibyte, for the
> reasons I explained earlier: the result is not necessarily a multibyte
> string.
>
> That doesn't affect the fact that the 'multibyte' variable is no
> longer necessary. In emacs-25, 'multibyte' does not mean that the
> result is a multibyte string

You are missing my point: the code on master now processes a string
that could be either unibyte or multibyte using only multibyte
methods. With the flag in place, each kind of string would have used
the method that's natural with it.
The way things are now, one has to think hard about what the code
does to convince oneself it's valid.

> I don't see why it is tricky, we do that in Emacs in other places.
>
> Really? A call to STRING_CHAR_AND_LENGTH followed by a length test
> followed by a call to memcpy for length > 1 and a special case inline
> copy for length == 1? When copying multibyte data? Where else does
> Emacs do that?

What exactly confuses you in that snippet? The call to
STRING_CHAR_AND_LENGTH itself? We have that in umpteen other places.
The single-byte optimization of not calling memcpy? That's standard
practice in C. If you need an example of using STRING_CHAR_AND_LENGTH
while copying text, you can find it in copy_text, for example. I
really don't understand what your problem with that code is.

> it's more clear for you
>
> Replacing 14 unusually and unnecessarily tricky lines with zero lines
> should help clarify things for most readers.

They are not unusually tricky at all. And you replaced them with a
fall-through, which is harder to follow and makes it easier to
introduce subtle bugs.

> I could simply revert your commit, it would have saved us both quite
> some time. Would you prefer that?
>
> It'd be even simpler to leave things alone, as the master code works
> better than emacs-25 does.

Sorry, leaving alone changes that I find questionable or gratuitous
is not in the job description.

> Alan wanted something that he could put into his .emacs that would cause
> (message PERCENTLESS) to output the string PERCENTLESS as-is, assuming
> PERCENTLESS lacks %. This was the point of his original bug report; his original
> example involved ` and ' but he wanted the same behavior for ‘ and ’, a point
> that became clear during the discussion of Bug#23425.
>
> Then why not for '..' as well? How is that different from ‘..’?
>
> It's not different. Alan wanted the same behavior for '..', and he
> got that too.

But the behavior is not the same:

  (let ((text-quoting-style 'curve))
    (substitute-command-keys "'foo'")) => ’foo’

but

  (let ((text-quoting-style 'grave))
    (substitute-command-keys "‘foo’")) => ‘foo’

I would have expected the first example to yield 'foo', i.e. leave
the apostrophes alone, as we do with curved quotes in the second
example. What we have now is inconsistent, and its rationale evades
me.
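
For readers who have not seen the copy idiom argued over earlier in
this message (determine the character's length, assign directly when
it is one byte, call memcpy otherwise), here is a rough standalone
sketch. It is not the Emacs code: utf8_seq_len and copy_chars are
hypothetical names made up for illustration, the helper stands in for
what STRING_CHAR_AND_LENGTH reports as the length, and the input is
assumed to be well-formed UTF-8 rather than Emacs's internal
representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Emacs function: byte length of a UTF-8
   sequence, judged from its lead byte.  Assumes well-formed input.  */
static size_t
utf8_seq_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Copy NBYTES bytes of well-formed UTF-8 from SRC to DST, one
   character at a time.  Return the number of characters copied.  */
static size_t
copy_chars (unsigned char *dst, const unsigned char *src, size_t nbytes)
{
  const unsigned char *end = src + nbytes;
  size_t nchars = 0;

  while (src < end)
    {
      size_t len = utf8_seq_len (*src);
      /* A sequence must not run past the end of the input.  */
      assert (len <= (size_t) (end - src));

      if (len == 1)
        *dst++ = *src++;        /* Single byte: plain assignment.  */
      else
        {
          memcpy (dst, src, len);   /* Multibyte sequence: block copy.  */
          dst += len;
          src += len;
        }
      nchars++;
    }
  return nchars;
}
```

Calling copy_chars on a 5-byte buffer holding "a", a curved quote and
"b" copies the bytes unchanged and counts three characters. Whether
the single-byte branch is worth its extra conditional is exactly the
judgment call being disputed above; the sketch only shows the shape of
the idiom.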