From: Mark H Weaver
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 12:27:35 -0400
Message-ID: <871tql17uw.fsf@yeeloong.lan>
In-Reply-To: <83lhotme1e.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 06 Oct 2014 18:08:29 +0300")
To: Eli Zaretskii
Cc: dak@gnu.org, rms@gnu.org, dmantipov@yandex.ru, emacs-devel@gnu.org, handa@gnu.org, monnier@iro.umontreal.ca, stephen@xemacs.org

Eli Zaretskii writes:

>> From: Mark H Weaver
>> Cc: monnier@iro.umontreal.ca, dak@gnu.org, dmantipov@yandex.ru,
>>     emacs-devel@gnu.org, handa@gnu.org, eliz@gnu.org, stephen@xemacs.org
>> Date: Mon, 06 Oct 2014 02:21:41 -0400
>>
>> A related problem has to do with the fact that naively implemented UTF-8
>> allows code points to be represented with more bytes than are actually
>> needed, essentially by padding the code point with leading zeroes and
>> then encoding with UTF-8 as if the high bits were non-zero. For
>> example, the ASCII quote (") can be represented as the single byte 0x22,
>> the two byte sequence 0xC0 0xA2, etc.
>>
>> UTF-8 decoders are supposed to detect and reject these "overlong"
>> encodings, but it is likely that many programs fail to do this.
>> Such programs are usually vulnerable to these overlong encodings when
>> trying to detect special characters (e.g. for quoting/escaping) or when
>> validating inputs.
>>
>> To cope with this, the Unicode standards require that UTF-8 codecs
>> reject overlong encodings and other invalid byte sequences. This is in
>> direct conflict with the idea of "raw byte" code points, whose purpose
>> is to be tolerant of arbitrary byte sequences and to propagate them
>> unchanged.
>
> The obvious solution is to encode the raw bytes internally in a UTF-8
> compatible way. Which is what Emacs does in its buffers and strings,
> as I'm sure you know. Can't Guile do something similar?

I'm afraid you've misunderstood, or perhaps I've failed to explain it
clearly.

It doesn't matter how these raw bytes are encoded internally. No matter
what mechanism we use to accomplish it, propagating invalid byte
sequences by default is bad security policy. It has the effect of
exposing all internal subsystems to malformed UTF-8 such as overlong
encodings unless users take explicit steps to check for them and remove
them. This is a recipe for security holes.

The Unicode standard requires that all UTF-8 codecs refuse to accept,
produce, or propagate invalid byte sequences, including the troublesome
overlong encodings. I'm not one for blindly following standards, but in
my opinion this is the default policy we should adopt.

Editing files is an unusual case. Of course, we want users to be able
to edit a file with coding errors, and to leave any part of the file
they have not touched exactly as it was. Anything else would be a
mistake.

However, I would argue that even in Emacs, string<->bytevector
conversions should be strict by default, so that their other uses
(e.g. communication over sockets and pipes, and encoding of command-line
arguments to subprocesses) are covered by the strict default as well.
Even if you disagree, I'd like the strict mode to remain the default in
Guile.

     Mark
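A minimal Python sketch of the overlong-encoding problem described in this message, under stated assumptions: `naive_decode_utf8_2byte` is a hypothetical, deliberately flawed decoder (not code from Emacs or Guile) that handles only one- and two-byte sequences, while Python's strict `utf-8` codec and its `surrogateescape` error handler merely stand in for the "strict" and "raw byte" policies being debated. It shows a `"` that a byte-level scan misses, the strict codec rejecting the sequence, and the tolerant handler carrying the bytes through unchanged.

```python
# Illustrative sketch only: naive_decode_utf8_2byte is a hypothetical,
# deliberately buggy decoder, not code from Emacs or Guile.

def naive_decode_utf8_2byte(data: bytes) -> str:
    """Decode 1- and 2-byte UTF-8 sequences without rejecting overlong forms."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            # 0xxxxxxx: a plain ASCII byte.
            out.append(chr(b0))
            i += 1
        elif (b0 & 0xE0) == 0xC0 and i + 1 < len(data):
            # 110xxxxx 10xxxxxx: a two-byte sequence.
            b1 = data[i + 1]
            if (b1 & 0xC0) != 0x80:
                raise ValueError("invalid continuation byte")
            # The flaw: no check that the result is >= 0x80, so the
            # overlong sequence 0xC0 0xA2 decodes to U+0022 ('"').
            out.append(chr(((b0 & 0x1F) << 6) | (b1 & 0x3F)))
            i += 2
        else:
            raise ValueError("unsupported or truncated sequence")
    return "".join(out)


overlong_quote = b"\xc0\xa2"  # overlong two-byte encoding of '"' (0x22)

# A byte-level scan for the quote character finds nothing...
scan_finds_quote = 0x22 in overlong_quote
# ...but the naive decoder still produces a '"'.
naive_result = naive_decode_utf8_2byte(overlong_quote)

# Python's built-in UTF-8 codec is strict and rejects the sequence.
try:
    overlong_quote.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

# The "surrogateescape" handler (PEP 383) instead passes each bad byte
# through as a lone surrogate and restores it on encoding, loosely like
# the "raw byte" code points discussed in the thread.
tolerant = overlong_quote.decode("utf-8", errors="surrogateescape")
round_trip = tolerant.encode("utf-8", errors="surrogateescape")

print(scan_finds_quote, repr(naive_result), strict_rejects)
print(repr(tolerant), round_trip == overlong_quote)
```

After the tolerant round trip, the overlong bytes reach any downstream consumer exactly as they arrived, so a naive decoder further along the pipeline would still turn them into a `"`; this is the exposure the strict default is meant to prevent.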