From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Paul Eggert <eggert@cs.ucla.edu>
Newsgroups: gmane.emacs.bugs
Subject: bug#24206: 25.1;
	Curly quotes generate invalid strings, leading to a segfault
Date: Thu, 18 Aug 2016 11:33:36 -0700
Organization: UCLA Computer Science Department
Message-ID: <7296d150-48aa-0c39-9421-63fffa499d2e@cs.ucla.edu>
References: <f1b9c0f6-39e3-2237-8f52-cc50805efbfb@cs.ucla.edu>
	<8337m7h1dp.fsf@gnu.org>
	<f04b0d34-c13b-bbfe-9542-5c052676271c@cs.ucla.edu>
	<83zioffew5.fsf@gnu.org>
	<b98070aa-10db-dfb9-a698-2786caf478cd@cs.ucla.edu>
	<83popaf1yz.fsf@gnu.org> <87bn0u3rqc.fsf@linux-m68k.org>
	<83mvkdg91i.fsf@gnu.org>
	<8b78f23f-4a4f-e568-b760-3350ca7bb8d3@cs.ucla.edu>
	<83d1l8g3zs.fsf@gnu.org>
	<4822bfeb-c507-a9ff-93bc-1d27ba93b9d7@cs.ucla.edu>
	<8360qzfmz3.fsf@gnu.org>
	<11031c1e-c784-0ba2-4b6c-4fab0cb92354@cs.ucla.edu>
	<83tweje0ct.fsf@gnu.org>
	<b09bc143-5ae2-d5bc-c7a3-82aa5eca0628@cs.ucla.edu>
	<83lgzudu97.fsf@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-Trace: blaine.gmane.org 1471545323 11888 195.159.176.226 (18 Aug 2016 18:35:23 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Thu, 18 Aug 2016 18:35:23 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
	Thunderbird/45.2.0
Cc: johnw@gnu.org, 24206@debbugs.gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Thu Aug 18 20:35:19 2016
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1baS9u-0002pn-C6
	for geb-bug-gnu-emacs@m.gmane.org; Thu, 18 Aug 2016 20:35:18 +0200
Original-Received: from localhost ([::1]:53923 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1baS9r-0002hJ-Fv
	for geb-bug-gnu-emacs@m.gmane.org; Thu, 18 Aug 2016 14:35:15 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:58323)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1baS8l-000250-Jw
	for bug-gnu-emacs@gnu.org; Thu, 18 Aug 2016 14:34:08 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1baS8g-0000FB-R5
	for bug-gnu-emacs@gnu.org; Thu, 18 Aug 2016 14:34:06 -0400
Original-Received: from debbugs.gnu.org ([208.118.235.43]:35902)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1baS8g-0000F7-NV
	for bug-gnu-emacs@gnu.org; Thu, 18 Aug 2016 14:34:02 -0400
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1baS8g-0002xl-GZ
	for bug-gnu-emacs@gnu.org; Thu, 18 Aug 2016 14:34:02 -0400
X-Loop: help-debbugs@gnu.org
Resent-From: Paul Eggert <eggert@cs.ucla.edu>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Thu, 18 Aug 2016 18:34:02 +0000
Resent-Message-ID: <handler.24206.B24206.147154523111373@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 24206
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
Original-Received: via spool by 24206-submit@debbugs.gnu.org id=B24206.147154523111373
	(code B ref 24206); Thu, 18 Aug 2016 18:34:02 +0000
Original-Received: (at 24206) by debbugs.gnu.org; 18 Aug 2016 18:33:51 +0000
Original-Received: from localhost ([127.0.0.1]:33614 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1baS8U-0002xM-VV
	for submit@debbugs.gnu.org; Thu, 18 Aug 2016 14:33:51 -0400
Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:57967)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <eggert@cs.ucla.edu>) id 1baS8T-0002x9-0e
	for 24206@debbugs.gnu.org; Thu, 18 Aug 2016 14:33:49 -0400
Original-Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id 3249C160E3C;
	Thu, 18 Aug 2016 11:33:42 -0700 (PDT)
Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
	with ESMTP id 3ZQ6k1wtxh84; Thu, 18 Aug 2016 11:33:41 -0700 (PDT)
Original-Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id 31E6E160E3D;
	Thu, 18 Aug 2016 11:33:41 -0700 (PDT)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id jQkRogaiGbrw; Thu, 18 Aug 2016 11:33:41 -0700 (PDT)
Original-Received: from [192.168.1.9] (unknown [100.32.155.148])
	by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 11CC0160E3C;
	Thu, 18 Aug 2016 11:33:41 -0700 (PDT)
In-Reply-To: <83lgzudu97.fsf@gnu.org>
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 208.118.235.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/bug-gnu-emacs/>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: "bug-gnu-emacs"
	<bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.bugs:122374
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/122374>

Eli Zaretskii wrote:

> The code that got removed was the easy and intuitive part: it dealt
> with processing single-byte strings one byte at a time.  The
> hard-to-read part of the code is still with us.  We have less 'if'
> conditionals, but that's hardly the main complication in the original
> code.

Sure, but removing unnecessary easy stuff lets the reader see the hard st=
uff=20
more clearly.

> You are missing my point: the code on master now processes a string,
> that could be either unibyte or multibyte, using only multibyte
> methods.  With the flag in place, each kind of string would have used
> the method that's natural with it.  The way things are now, one has to
> think hard about what the code does to convince oneself it's valid.

The way things were before it was even harder, because one had to worry n=
ot only=20
about processing Emacs-encoded text, one also had to worry about processi=
ng=20
unibyte text containing non-ASCII bytes. The code is simpler now, because=
 it=20
needs only to process Emacs-encoded text.

The old code might have flown despite its problems, if all the input data=
 were=20
consistent (i.e., either all unibyte, or all with Emacs-encoded text). Bu=
t=20
inputs need not be consistent, so the old approach simply did not work.

As I take it, your principal objection to the new code is not to its inte=
rnals:=20
it's that substitute-command-keys can now return a multibyte string even =
when=20
all the input data is unibyte. I don't think that's a big deal, but if th=
is is=20
the primary reason for our lengthy conversation, I can move things forwar=
d by=20
changing the code so that it instead returns a unibyte string when all th=
e input=20
data are unibyte. Would that suffice?

>>     I don't see why it is tricky, we do that in Emacs in other places.
>>
>> Really? A call to STRING_CHAR_AND_LENGTH followed by a length test fol=
lowed by a call to memcpy for length > 1 and a special case inline copy f=
or length =3D=3D 1? When copying multibyte data? Where else does Emacs do=
 that?
>
> What exactly confuses you in that snippet?

Nothing confuses me in that snippet. I know what the snippet does, now th=
at I've=20
read and understood it. It is a longwinded and unnecessarily tricky way o=
f doing=20
something simple.

> The call to
> STRING_CHAR_AND_LENGTH itself? we have that in umpteen other places.
> The single-byte optimization of not calling memcpy?  That's standard
> practice in C.

I'm not talking about each individual line of code in that snippet. I am =
talking=20
about the entire construction. A reader must look at all 14 lines to dedu=
ce what=20
it does, deduce that it's unnecessarily complicated, and deduce that the=20
unnecessary complication is not a sign that something unusual is going on=
. No=20
place else in Emacs has this construction.

> If you need an example for using
> STRING_CHAR_AND_LENGTH while copying text, you can find it in
> copy_text, for example.

copy_text does something quite different. When copying multibyte text, it=
 does a=20
single memcpy for the entire string. copy_text does not call memcpy for e=
ach=20
multibyte character, and nothing in copy_text is particularly close to th=
e=20
snippet in question.

> you replaced it with a
> fall-through, which is harder to follow and easier to introduce subtle
> bugs.

True, but the fall-through is clearly marked in comments. The new way is =
a bit=20
more efficent (smaller code space, more likely to fit in cache); the old =
way=20
duplicated character-copying code, a practice that also introduces subtle=
 bugs=20
-- in addition to temporarily mystifying the reader because the old code=20
duplicates were not identical (itself a sign of cruft). With all that in =
mind,=20
it was reasonable to switch from the old way to the new, despite the=20
disadvantage you mention.

>>         Alan wanted something that he could put into his .emacs that w=
ould cause
>>         (message PERCENTLESS) to output the string PERCENTLESS as-is, =
assuming
>>         PERCENTLESS lacks %. This was the point of his original bug re=
port; his original
>>         example involved ` and ' but he wanted the same behavior for =E2=
=80=98 and =E2=80=99, a point
>>         that became clear during the discussion of Bug#23425.
>>
>>     Then why not for '..' as well?  How is that different from =E2=80=98=
..=E2=80=99?
>>
>> It's not different. Alan wanted the same behavior for '..', and he got=
 that too.
>
> But the behavior is not the same:

I was referring to Alan's desire to treat all quotes the same (i.e., to n=
ot=20
substitute for any of them), which is now supported by setting=20
text-quoting-style to grave.

>   (let ((text-quoting-style 'curve))
>     (substitute-command-keys "'foo'"))
>       =3D> =E2=80=99foo=E2=80=99
>
> but
>
>   (let ((text-quoting-style 'grave))
>     (substitute-command-keys "=E2=80=98foo=E2=80=99"))
>       =3D> =E2=80=98foo=E2=80=99
>
> I would have expected the first example to yield 'foo'

No, substitute-command-keys works on each grave accent and apostrophe=20
separately, without looking at the others. As I recall it's worked that w=
ay and=20
has been documented that way, in both master and emacs-25, ever since the=
=20
feature was installed. One could posit a "smarter" form of substitution, =
which=20
leaves 'foo' alone but which translates `foo'. Although we considered tha=
t=20
possibility during design, we rejected it because it is more complicated =
and has=20
more problems and quirks that are a pain to document and would surprise u=
sers in=20
other ways.