From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 18:08:29 +0300
Message-ID: <83lhotme1e.fsf@gnu.org>
References: <54193A70.9020901@member.fsf.org>
	<87lhp6h4zb.fsf@panthera.terpri.org>
	<87k34qo4c1.fsf@fencepost.gnu.org> <54257C22.2000806@yandex.ru>
	<83iokato6x.fsf@gnu.org> <87wq8pwjen.fsf@uwakimon.sk.tsukuba.ac.jp>
	<837g0ptnlj.fsf@gnu.org> <87r3yxwdr6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87tx3tmi3t.fsf@fencepost.gnu.org> <834mvttgsf.fsf@gnu.org>
	<jwvoau19n3n.fsf-monnier+emacs@gnu.org>
	<87lhp5m99w.fsf@fencepost.gnu.org>
	<jwviok99jki.fsf-monnier+emacs@gnu.org>
	<87h9ztm5oa.fsf@fencepost.gnu.org>
	<jwvd2ah9hve.fsf-monnier+emacs@gnu.org>
	<87d2ahm3nw.fsf@fencepost.gnu.org>
	<jwv1tqx9ea3.fsf-monnier+emacs@gnu.org>
	<E1XYNnY-0005Zo-Kz@fencepost.gnu.org> <871tqneyvl.fsf@netris.org>
	<E1XatgY-00062K-7y@fencepost.gnu.org> <87d2a54t1m.fsf@yeeloong.lan>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
X-Trace: ger.gmane.org 1412608147 16570 80.91.229.3 (6 Oct 2014 15:09:07 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 6 Oct 2014 15:09:07 +0000 (UTC)
Cc: dak@gnu.org, rms@gnu.org, dmantipov@yandex.ru, emacs-devel@gnu.org,
	handa@gnu.org, monnier@iro.umontreal.ca, stephen@xemacs.org
To: Mark H Weaver <mhw@netris.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Oct 06 17:08:59 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1Xb9uF-0004Rr-Gf
	for ged-emacs-devel@m.gmane.org; Mon, 06 Oct 2014 17:08:59 +0200
Original-Received: from localhost ([::1]:52400 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1Xb9uF-0007c3-22
	for ged-emacs-devel@m.gmane.org; Mon, 06 Oct 2014 11:08:59 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:58822)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1Xb9tq-0007W7-Km
	for emacs-devel@gnu.org; Mon, 06 Oct 2014 11:08:39 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1Xb9tj-000791-Eu
	for emacs-devel@gnu.org; Mon, 06 Oct 2014 11:08:34 -0400
Original-Received: from mtaout25.012.net.il ([80.179.55.181]:56518)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1Xb9te-00075t-Ej; Mon, 06 Oct 2014 11:08:22 -0400
Original-Received: from conversion-daemon.mtaout25.012.net.il by mtaout25.012.net.il
	(HyperSendmail v2007.08) id <0ND1003002YBF100@mtaout25.012.net.il>;
	Mon, 06 Oct 2014 18:03:32 +0300 (IDT)
Original-Received: from HOME-C4E4A596F7 ([87.69.4.28]) by mtaout25.012.net.il
	(HyperSendmail v2007.08) with ESMTPA id
	<0ND1002A135V5C20@mtaout25.012.net.il>;
	Mon, 06 Oct 2014 18:03:32 +0300 (IDT)
In-reply-to: <87d2a54t1m.fsf@yeeloong.lan>
X-012-Sender: halo1@inter.net.il
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 80.179.55.181
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:175016
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/175016>

> From: Mark H Weaver <mhw@netris.org>
> Cc: monnier@iro.umontreal.ca,  dak@gnu.org,  dmantipov@yandex.ru,  emacs-devel@gnu.org,  handa@gnu.org,  eliz@gnu.org,  stephen@xemacs.org
> Date: Mon, 06 Oct 2014 02:21:41 -0400
> 
> A related problem has to do with the fact that naively implemented UTF-8
> allows code points to be represented with more bytes than are actually
> needed, essentially by padding the code point with leading zeroes and
> then encoding with UTF-8 as if the high bits were non-zero.  For
> example, the ASCII quote (") can be represented as the single byte 0x22,
> the two-byte sequence 0xC0 0xA2, etc.
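The padding described above can be made concrete. This is a minimal Python sketch, not code from either poster. The helper name `naive_encode` is hypothetical. It forces a code point into a chosen number of bytes using the UTF-8 bit layout, and then checks each result against Python's built-in strict UTF-8 codec.

```python
def naive_encode(cp: int, nbytes: int) -> bytes:
    """Hypothetical helper: force code point `cp` into exactly `nbytes` bytes
    using the UTF-8 bit layout, padding with leading zero bits.  The result
    is overlong whenever `nbytes` is larger than necessary."""
    if nbytes == 1:
        if cp > 0x7F:
            raise ValueError("does not fit in one byte")
        return bytes([cp])
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[nbytes]
    payload_bits = 5 * nbytes + 1          # 11, 16, 21 payload bits for 2, 3, 4 bytes
    if cp >= 1 << payload_bits:
        raise ValueError("does not fit in %d bytes" % nbytes)
    tail = []
    for _ in range(nbytes - 1):            # continuation bytes have the form 10xxxxxx
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    return bytes([lead_marker | cp] + tail[::-1])


# Encode the ASCII quote (0x22) in 1..4 bytes and see what a strict decoder says.
for n in (1, 2, 3, 4):
    seq = naive_encode(0x22, n)
    try:
        seq.decode("utf-8")
        verdict = "accepted"
    except UnicodeDecodeError:
        verdict = "rejected"
    print(n, seq.hex(" "), verdict)
```

Only the one-byte form survives Python's strict codec. The two-, three- and four-byte forms are the overlong encodings of the same character.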
> 
> UTF-8 decoders are supposed to detect and reject these "overlong"
> encodings, but it is likely that many programs fail to do this.  Such
> programs are usually vulnerable to these overlong encodings when trying
> to detect special characters (e.g. for quoting/escaping) or when
> validating inputs.
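The attack pattern described above can be shown with a small Python sketch. Both functions are hypothetical stand-ins written for this illustration. One is a byte-level filter that looks for a raw 0x22. The other is a lenient decoder that, like the programs described, never checks for overlong forms.

```python
def naive_decode(data: bytes) -> str:
    """Hypothetical lenient decoder: handles 1- and 2-byte sequences but
    never checks whether a 2-byte sequence is overlong."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif (b & 0xE0) == 0xC0 and i + 1 < len(data):
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled by this sketch")
    return "".join(out)


def passes_quote_filter(data: bytes) -> bool:
    """Hypothetical byte-level check: reject input containing a raw 0x22 byte."""
    return 0x22 not in data


payload = b"x=\xc0\xa2y"               # overlong encoding of '"' smuggled in
print(passes_quote_filter(payload))    # the byte filter sees no 0x22 byte...
print(naive_decode(payload))           # ...but the lenient decoder yields a quote
```

The filter and the decoder disagree about what the input contains. That disagreement is exactly what overlong encodings exploit.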
> 
> To cope with this, the Unicode standards require that UTF-8 codecs
> reject overlong encodings and other invalid byte sequences.  This is in
> direct conflict with the idea of "raw byte" code points, whose purpose
> is to be tolerant of arbitrary byte sequences and to propagate them
> unchanged.
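The rejection the standard requires comes down to one rule. After assembling a multi-byte sequence, reject it if the code point would have fit in fewer bytes. The following is a minimal Python sketch of a strict decoder built around that rule. The function name is hypothetical, and in practice Python's own `bytes.decode("utf-8")` already performs these checks.

```python
MIN_CODE_POINT = {2: 0x80, 3: 0x800, 4: 0x10000}  # smallest value each length may encode


def decode_utf8_strict(data: bytes) -> str:
    """Hypothetical strict decoder: rejects overlong forms, bad continuation
    bytes, truncated sequences, surrogates and values above U+10FFFF."""
    chars = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            n, cp = 1, b0
        elif 0xC0 <= b0 < 0xE0:
            n, cp = 2, b0 & 0x1F
        elif 0xE0 <= b0 < 0xF0:
            n, cp = 3, b0 & 0x0F
        elif 0xF0 <= b0 < 0xF8:
            n, cp = 4, b0 & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + n]:
            if (b & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        # The overlong check: this value should have used a shorter form.
        if n > 1 and cp < MIN_CODE_POINT[n]:
            raise ValueError(f"overlong {n}-byte form of U+{cp:04X} at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point 0x{cp:X} at offset {i}")
        chars.append(chr(cp))
        i += n
    return "".join(chars)


for sample in (b'"', b"\xc0\xa2"):
    try:
        print(sample, "->", repr(decode_utf8_strict(sample)))
    except ValueError as err:
        print(sample, "->", err)
```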

The obvious solution is to encode the raw bytes internally in a
UTF-8-compatible way, which is what Emacs does in its buffers and
strings, as I'm sure you know.  Can't Guile do something similar?
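For comparison, according to the Emacs Lisp Reference Manual, Emacs gives the raw bytes 0x80..0xFF their own character codes (#x3FFF80..#x3FFFFF), which lie outside the Unicode range. Python strings cannot hold such code points, so this sketch substitutes an analogous, real mechanism. It uses Python's `surrogateescape` error handler (PEP 383), which maps each undecodable byte to a lone surrogate U+DC80..U+DCFF and restores it on encode. This is a swapped-in technique for illustration, not what Emacs or Guile actually does.

```python
# Valid UTF-8 ("caf\u00e9") mixed with a stray byte (0xFF) and an overlong quote (C0 A2).
raw = b"caf\xc3\xa9 \xff x=\xc0\xa2y"

# Undecodable bytes become lone surrogates U+DC80..U+DCFF instead of raising errors.
text = raw.decode("utf-8", errors="surrogateescape")

# The overlong C0 A2 is kept as two escaped raw bytes; it is NOT decoded as '"'.
assert '"' not in text

# Encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", errors="surrogateescape") == raw

# List which raw bytes were carried through as escapes.
escaped = [hex(ord(c) - 0xDC00) for c in text if 0xDC80 <= ord(c) <= 0xDCFF]
print(text.startswith("caf\u00e9"), escaped)
```

Valid characters decode normally. Invalid bytes, including overlong forms, are neither rejected nor reinterpreted, and the round trip is lossless.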

> FWIW, I agree that the Emacs behavior is desirable when editing a file
> that may contain coding errors, but in most other cases (e.g. when
> communicating with processes or network sockets) I think that it's more
> appropriate to refuse to accept, produce, or propagate invalid UTF-8
> such as overlong encodings.

Emacs indeed rejects such invalid sequences, but that doesn't mean it
disallows raw bytes as part of otherwise valid UTF-8 content.  It's a
fact of life that such stray bytes sometimes happen, and users would
generally be unhappy if Emacs rejected a file because it contained
such bytes.
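The two policies discussed in this exchange can be combined in one place. Visiting a file stays tolerant of stray bytes, while process or socket input is decoded strictly. A minimal Python sketch follows. `decode_input` and `stray_bytes` are hypothetical helpers, and `surrogateescape` again stands in for Emacs's own raw-byte characters.

```python
def decode_input(data: bytes, *, editing_file: bool) -> str:
    """Hypothetical policy helper: tolerate stray bytes when visiting a file,
    but refuse invalid UTF-8 (including overlong forms) from processes/sockets."""
    if editing_file:
        # Keep the file usable: stray bytes survive as escapes and round-trip.
        return data.decode("utf-8", errors="surrogateescape")
    # Strict decoding raises UnicodeDecodeError on any invalid sequence.
    return data.decode("utf-8", errors="strict")


def stray_bytes(text: str) -> list:
    """Return the raw byte values that were escaped during tolerant decoding."""
    return [ord(c) - 0xDC00 for c in text if 0xDC80 <= ord(c) <= 0xDCFF]


# Mostly valid UTF-8 text containing one stray byte (0x92).
contents = "na\u00efve r\u00e9sum\u00e9 ".encode("utf-8") + b"\x92" + b" tail"

text = decode_input(contents, editing_file=True)
print(len(stray_bytes(text)), "stray byte(s) kept")

try:
    decode_input(contents, editing_file=False)
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

The same bytes are accepted for editing, with the stray byte preserved and reportable, but refused on the strict path.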