From: Mark H Weaver
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 02:21:41 -0400
Message-ID: <87d2a54t1m.fsf@yeeloong.lan>
To: rms@gnu.org
Cc: dak@gnu.org, dmantipov@yandex.ru, emacs-devel@gnu.org, handa@gnu.org,
    monnier@iro.umontreal.ca, eliz@gnu.org, stephen@xemacs.org
In-Reply-To: (Richard Stallman's message of "Sun, 05 Oct 2014 17:49:46 -0400")

Richard Stallman writes:

>     * I'm concerned that there are security implications to supporting the
>       "raw byte" code points.  I can expand on this more if you'd like.
>
> I'd like to know how it is that "raw bytes" have security implications.

To give an example, consider a procedure that needs to pass a string
from an untrusted source to an SQL query.  To do this safely, it needs
to quote the string.  I haven't researched how to properly quote SQL
string literals, but in general, quoting is typically done by
recognizing some set of special characters that must be escaped, and
allowing all other characters through unmodified.  However, "raw byte"
code points can be used to bypass such a quoting mechanism, and thus
send an unescaped closing quote to the SQL database followed by
arbitrary SQL commands.
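
As a rough illustration of the quoting concern (a sketch, not something
from the message itself): the Python below uses the "surrogateescape"
error handler as a stand-in for Emacs "raw byte" code points, since both
carry undecodable bytes through a string and write them back out
unchanged.  The function names and the backslash-based quoting rule are
hypothetical and do not describe any real database's syntax.

```python
def quote_literal(text: str) -> str:
    """Hypothetical quoter: escape backslash and double quote, pass the rest."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def quote_untrusted(raw: bytes) -> bytes:
    # Tolerant decoding: each byte that is not valid UTF-8 becomes a
    # placeholder code point (U+DC80..U+DCFF) instead of raising an error.
    text = raw.decode("utf-8", errors="surrogateescape")
    # The quoter sees no '"' among the placeholders, so nothing is escaped.
    # Encoding then turns the placeholders back into the original bytes.
    return quote_literal(text).encode("utf-8", errors="surrogateescape")


def quote_untrusted_strict(raw: bytes) -> bytes:
    # Strict decoding raises UnicodeDecodeError on invalid UTF-8,
    # so malformed input never reaches the quoter.
    return quote_literal(raw.decode("utf-8")).encode("utf-8")


# An ordinary quote character is recognized and escaped.
ordinary = quote_untrusted(b'say "hi"')

# 0xC0 0xA2 is not valid UTF-8; it passes through the quoter untouched.
smuggled = quote_untrusted(b"x\xc0\xa2; DROP")
```

Whether the bytes that survive in `smuggled` become an injection depends
on how the receiving program interprets them, which is an assumption
outside this sketch.  `quote_untrusted_strict` refuses the same input.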

A related problem has to do with the fact that naively implemented UTF-8
allows code points to be represented with more bytes than are actually
needed, essentially by padding the code point with leading zeroes and
then encoding with UTF-8 as if the high bits were non-zero.

For example, the ASCII quote (") can be represented as the single byte
0x22, the two byte sequence 0xC0 0xA2, etc.  UTF-8 decoders are supposed
to detect and reject these "overlong" encodings, but it is likely that
many programs fail to do this.  Such programs are usually vulnerable to
these overlong encodings when trying to detect special characters
(e.g. for quoting/escaping) or when validating inputs.

To cope with this, the Unicode standards require that UTF-8 codecs
reject overlong encodings and other invalid byte sequences.  This is in
direct conflict with the idea of "raw byte" code points, whose purpose
is to be tolerant of arbitrary byte sequences and to propagate them
unchanged.

FWIW, I agree that the Emacs behavior is desirable when editing a file
that may contain coding errors, but in most other cases (e.g. when
communicating with processes or network sockets) I think that it's more
appropriate to refuse to accept, produce, or propagate invalid UTF-8
such as overlong encodings.

    Mark
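
The overlong-encoding arithmetic described above can be sketched in
Python as follows.  This is only an illustration under stated
assumptions: `encode_two_byte`, `naive_decode_two_byte` and
`is_overlong_two_byte` are hypothetical helpers written for this note,
and the naive decoder stands in for the kind of lenient program the
message warns about.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Pack a code point below 0x800 into the pattern 110xxxxx 10xxxxxx,
    without checking whether two bytes are actually needed."""
    if not 0 <= code_point < 0x800:
        raise ValueError("code point does not fit in two bytes")
    return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])


def naive_decode_two_byte(data: bytes) -> int:
    """Hypothetical lenient decoder: extract the payload bits and
    never check that the shortest possible form was used."""
    lead, cont = data
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def is_overlong_two_byte(data: bytes) -> bool:
    """A two-byte sequence is overlong if its value fits in one byte."""
    return naive_decode_two_byte(data) < 0x80


# The ASCII quote 0x22 padded with leading zero bits: bytes 0xC0 0xA2.
overlong_quote = encode_two_byte(0x22)

# The lenient decoder turns those two bytes back into the quote character.
leaked = chr(naive_decode_two_byte(overlong_quote))

# A code point that really needs two bytes (U+00E9) is not overlong.
legit = encode_two_byte(0xE9)
```

Python's built-in strict decoder rejects the overlong form:
`overlong_quote.decode("utf-8")` raises UnicodeDecodeError, while
`legit.decode("utf-8")` succeeds.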