From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Mark H Weaver <mhw@netris.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 12:27:35 -0400
Message-ID: <871tql17uw.fsf@yeeloong.lan>
References: <54193A70.9020901@member.fsf.org>
	<87k34qo4c1.fsf@fencepost.gnu.org> <54257C22.2000806@yandex.ru>
	<83iokato6x.fsf@gnu.org> <87wq8pwjen.fsf@uwakimon.sk.tsukuba.ac.jp>
	<837g0ptnlj.fsf@gnu.org> <87r3yxwdr6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87tx3tmi3t.fsf@fencepost.gnu.org> <834mvttgsf.fsf@gnu.org>
	<jwvoau19n3n.fsf-monnier+emacs@gnu.org>
	<87lhp5m99w.fsf@fencepost.gnu.org>
	<jwviok99jki.fsf-monnier+emacs@gnu.org>
	<87h9ztm5oa.fsf@fencepost.gnu.org>
	<jwvd2ah9hve.fsf-monnier+emacs@gnu.org>
	<87d2ahm3nw.fsf@fencepost.gnu.org>
	<jwv1tqx9ea3.fsf-monnier+emacs@gnu.org>
	<E1XYNnY-0005Zo-Kz@fencepost.gnu.org> <871tqneyvl.fsf@netris.org>
	<E1XatgY-00062K-7y@fencepost.gnu.org> <87d2a54t1m.fsf@yeeloong.lan>
	<83lhotme1e.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: ger.gmane.org 1412612916 17124 80.91.229.3 (6 Oct 2014 16:28:36 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 6 Oct 2014 16:28:36 +0000 (UTC)
Cc: dak@gnu.org, rms@gnu.org, dmantipov@yandex.ru, emacs-devel@gnu.org,
	handa@gnu.org, monnier@iro.umontreal.ca, stephen@xemacs.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Oct 06 18:28:28 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1XbB9A-0000AJ-6I
	for ged-emacs-devel@m.gmane.org; Mon, 06 Oct 2014 18:28:28 +0200
Original-Received: from localhost ([::1]:52972 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1XbB99-0006M1-Qi
	for ged-emacs-devel@m.gmane.org; Mon, 06 Oct 2014 12:28:27 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:48632)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mhw@netris.org>) id 1XbB92-0006LU-Ld
	for emacs-devel@gnu.org; Mon, 06 Oct 2014 12:28:25 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mhw@netris.org>) id 1XbB8x-0003dk-Kp
	for emacs-devel@gnu.org; Mon, 06 Oct 2014 12:28:20 -0400
Original-Received: from world.peace.net ([96.39.62.75]:56419)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mhw@netris.org>)
	id 1XbB8p-0003KT-V4; Mon, 06 Oct 2014 12:28:08 -0400
Original-Received: from c-24-62-95-23.hsd1.ma.comcast.net ([24.62.95.23]
	helo=yeeloong.lan)
	by world.peace.net with esmtpsa (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.72) (envelope-from <mhw@netris.org>)
	id 1XbB8g-0005Ju-94; Mon, 06 Oct 2014 12:27:58 -0400
In-Reply-To: <83lhotme1e.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 06 Oct
	2014 18:08:29 +0300")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 96.39.62.75
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:175021
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/175021>

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Mark H Weaver <mhw@netris.org>
>> Cc: monnier@iro.umontreal.ca, dak@gnu.org, dmantipov@yandex.ru,
>> emacs-devel@gnu.org, handa@gnu.org, eliz@gnu.org, stephen@xemacs.org
>> Date: Mon, 06 Oct 2014 02:21:41 -0400
>> 
>> A related problem has to do with the fact that naively implemented UTF-8
>> allows code points to be represented with more bytes than are actually
>> needed, essentially by padding the code point with leading zeroes and
>> then encoding with UTF-8 as if the high bits were non-zero.  For
>> example, the ASCII quote (") can be represented as the single byte 0x22,
>> the two byte sequence 0xC0 0xA2, etc.
>> 
>> UTF-8 decoders are supposed to detect and reject these "overlong"
>> encodings, but it is likely that many programs fail to do this.  Such
>> programs are usually vulnerable to these overlong encodings when trying
>> to detect special characters (e.g. for quoting/escaping) or when
>> validating inputs.
>> 
>> To cope with this, the Unicode standards require that UTF-8 codecs
>> reject overlong encodings and other invalid byte sequences.  This is in
>> direct conflict with the idea of "raw byte" code points, whose purpose
>> is to be tolerant of arbitrary byte sequences and to propagate them
>> unchanged.
>
> The obvious solution is to encode the raw bytes internally in a UTF-8
> compatible way.  Which is what Emacs does in its buffers and strings,
> as I'm sure you know.  Can't Guile do something similar?

I'm afraid you've misunderstood, or perhaps I've failed to explain it
clearly.

It doesn't matter how these raw bytes are encoded internally.  No matter
what mechanism we use to accomplish it, propagating invalid byte
sequences by default is bad security policy.  It has the effect of
exposing all internal subsystems to malformed UTF-8 such as overlong
encodings unless users take explicit steps to check for them and remove
them.  This is a recipe for security holes.

The Unicode standard requires that all UTF-8 codecs refuse to accept,
produce, or propagate invalid byte sequences, including the troublesome
overlong encodings.  I'm not one for blindly following standards, but in
my opinion this is the default policy we should adopt.
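
To make the failure mode concrete, here is a minimal sketch in Python.
It is not code from Emacs or Guile, and the two helper functions are
hypothetical names invented for this illustration.  It shows a decoder
that skips the overlong check turning the bytes 0xC0 0xA2 into an ASCII
quote that a byte-level filter never saw:

```python
def naive_decode_2byte(data: bytes) -> str:
    """Decode one 2-byte UTF-8 sequence WITHOUT checking for overlong forms."""
    lead, cont = data
    if lead & 0xE0 != 0xC0 or cont & 0xC0 != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))


def strict_decode_2byte(data: bytes) -> str:
    """Same decoding, but reject code points that fit in a single byte."""
    ch = naive_decode_2byte(data)
    if ord(ch) < 0x80:
        raise ValueError("overlong encoding")
    return ch


overlong_quote = bytes([0xC0, 0xA2])

# A filter scanning the raw bytes for the ASCII quote finds nothing...
filter_saw_quote = b'"' in overlong_quote

# ...yet the naive decoder hands a quote to whatever runs next.
naive_result = naive_decode_2byte(overlong_quote)

# Python's built-in UTF-8 codec is strict and refuses the same input.
try:
    overlong_quote.decode("utf-8")
    builtin_rejected = False
except UnicodeDecodeError:
    builtin_rejected = True
```

Any subsystem that validates before decoding, or that decodes with a
different codec than its neighbour, is exposed to this mismatch.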

Editing files is an unusual case.  Of course, we want users to be able
to edit a file with coding errors, and to leave any part of the file
untouched by the user exactly as it was.  Anything else would be a
mistake.

However, I would argue that even in Emacs, string<->bytevector
conversions should be strict by default, so that their other uses
(e.g. communication over sockets and pipes, and encoding of
command-line arguments to subprocesses) are strict unless the caller
explicitly asks otherwise.  Even if you disagree, I'd like the strict
mode to remain the default in Guile.
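
As a rough analogy for "strict by default, lenient on request", Python's
UTF-8 codec behaves this way: the default error handler rejects invalid
bytes, while the opt-in "surrogateescape" handler carries them through
unchanged, much like raw-byte code points.  The sketch below is only an
illustration of that policy; the wrapper names to_string and to_bytes
are hypothetical and do not correspond to any Emacs or Guile API.

```python
def to_string(data: bytes, *, lenient: bool = False) -> str:
    """Decode UTF-8; invalid bytes are an error unless lenient=True."""
    return data.decode("utf-8", errors="surrogateescape" if lenient else "strict")


def to_bytes(text: str, *, lenient: bool = False) -> bytes:
    """Encode UTF-8; escaped raw bytes are an error unless lenient=True."""
    return text.encode("utf-8", errors="surrogateescape" if lenient else "strict")


# File contents containing the overlong quote 0xC0 0xA2.
raw = b"ok \xc0\xa2 tail"

# Editor-style use: opt in to leniency and the bytes round-trip exactly.
round_trip = to_bytes(to_string(raw, lenient=True), lenient=True)

# Socket/pipe/argv-style use: the default refuses the malformed input.
try:
    to_string(raw)
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# A leniently decoded string cannot leak out through a strict encoder.
try:
    to_bytes(to_string(raw, lenient=True))
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

With this split, only code that has deliberately asked for leniency ever
sees or emits the malformed sequence.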

      Mark