From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 17:33:21 +0200
Message-ID: <87fvf1dxha.fsf@fencepost.gnu.org>
References: <54193A70.9020901@member.fsf.org> <87k34qo4c1.fsf@fencepost.gnu.org>
  <54257C22.2000806@yandex.ru> <83iokato6x.fsf@gnu.org>
  <87wq8pwjen.fsf@uwakimon.sk.tsukuba.ac.jp> <837g0ptnlj.fsf@gnu.org>
  <87r3yxwdr6.fsf@uwakimon.sk.tsukuba.ac.jp> <87tx3tmi3t.fsf@fencepost.gnu.org>
  <834mvttgsf.fsf@gnu.org> <87lhp5m99w.fsf@fencepost.gnu.org>
  <87h9ztm5oa.fsf@fencepost.gnu.org> <87d2ahm3nw.fsf@fencepost.gnu.org>
  <871tqneyvl.fsf@netris.org> <87d2a54t1m.fsf@yeeloong.lan> <83lhotme1e.fsf@gnu.org>
Cc: rms@gnu.org, Mark H Weaver, dmantipov@yandex.ru, emacs-devel@gnu.org,
  handa@gnu.org, monnier@iro.umontreal.ca, stephen@xemacs.org
To: Eli Zaretskii
In-Reply-To: <83lhotme1e.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 06 Oct 2014 18:08:29 +0300")

Eli Zaretskii writes:

>> From: Mark H Weaver
>> Cc: monnier@iro.umontreal.ca, dak@gnu.org, dmantipov@yandex.ru,
>>  emacs-devel@gnu.org, handa@gnu.org, eliz@gnu.org, stephen@xemacs.org
>> Date: Mon, 06 Oct 2014 02:21:41 -0400
>>
>> A related problem has to do with the fact that naively implemented UTF-8
>> allows code points to be represented with more bytes than are actually
>> needed, essentially by padding the code point with leading zeroes and
>> then encoding with UTF-8 as if the high bits were non-zero. For
>> example, the ASCII quote (") can be represented as the single byte 0x22,
>> the two byte sequence 0xC0 0xA2, etc.
>>
>> UTF-8 decoders are supposed to detect and reject these "overlong"
>> encodings, but it is likely that many programs fail to do this. Such
>> programs are usually vulnerable to these overlong encodings when trying
>> to detect special characters (e.g. for quoting/escaping) or when
>> validating inputs.
>>
>> To cope with this, the Unicode standards require that UTF-8 codecs
>> reject overlong encodings and other invalid byte sequences. This is in
>> direct conflict with the idea of "raw byte" code points, whose purpose
>> is to be tolerant of arbitrary byte sequences and to propagate them
>> unchanged.
>
> The obvious solution is to encode the raw bytes internally in a UTF-8
> compatible way. Which is what Emacs does in its buffers and strings,
> as I'm sure you know. Can't Guile do something similar?

If an overlong UTF-8 byte sequence representing '"' is processed
transparently by Emacs, it will be re-encoded into the original byte
sequence afterwards and, depending on the next processing stage, might
trip up software further down the line. Of course, it would have done
so equally without Emacs (or GUILE) in the middle.

The solution obviously is to use a coding scheme for recoding that does
_not_ reproduce unencodable bytes. Now if the intermediate processing
added escape characters for the unencodable bytes, you could arrive at
something like this (using % for an unencodable byte):

[Input]                   Robert%");DROP TABLE Students;--
[quotified]               "Robert\%\");DROP TABLE Students;--"
[cleanencoded]            "Robert\\");DROP TABLE Students;--"
[Pasted into SQL command] Uh oh.

-- 
David Kastrup
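To make the failure modes in this exchange concrete, here is a minimal Python sketch. It assumes Python's `surrogateescape` error handler as a stand-in for a raw-byte-preserving internal encoding (an analogous mechanism, not how Emacs or Guile actually store raw bytes), uses the byte 0xFF in place of the `%` above, and introduces hypothetical helpers `strict_decode`, `transparent_roundtrip`, `quotify`, `clean_encode` and `read_quoted` that do not come from the thread. It shows a strict decoder rejecting the overlong quote 0xC0 0xA2, a transparent round trip reproducing it byte for byte, and the escape-then-drop sequence leaving `\\"`, where the escaped backslash is followed by a bare quote that closes the literal early.

```python
"""Sketch of the overlong-quote and escape-then-drop problems.

Assumption: Python's "surrogateescape" error handler stands in for a
raw-byte-preserving internal encoding; it is NOT how Emacs stores raw
bytes, only an analogous mechanism. All helper names are hypothetical.
"""
from __future__ import annotations

OVERLONG_QUOTE = b"\xc0\xa2"  # two-byte overlong form of '"' (0x22)


def strict_decode(data: bytes) -> str | None:
    """Standards-conforming decode: overlong sequences are rejected."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def transparent_roundtrip(data: bytes) -> bytes:
    """Tolerant decode/encode: undecodable bytes survive unchanged."""
    text = data.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


def is_raw_byte(ch: str) -> bool:
    # surrogateescape maps an undecodable byte b (>= 0x80) to U+DC00 + b
    return 0xDC80 <= ord(ch) <= 0xDCFF


def quotify(text: str) -> str:
    """Backslash-escape backslash, '"' and raw bytes, then wrap in quotes."""
    out = []
    for ch in text:
        if ch in '\\"' or is_raw_byte(ch):
            out.append("\\")
        out.append(ch)
    return '"' + "".join(out) + '"'


def clean_encode(text: str) -> bytes:
    """Encode to UTF-8, silently dropping raw bytes instead of reproducing them."""
    return text.encode("utf-8", "ignore")


def read_quoted(s: str) -> tuple[str, str]:
    """Parse a backslash-escaped "..." literal at the start of s.

    Returns (literal contents, text left over after the closing quote).
    """
    if not s.startswith('"'):
        raise ValueError("expected opening quote")
    out, i = [], 1
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s):
            out.append(s[i + 1])  # escaped character, taken literally
            i += 2
        elif ch == '"':
            return "".join(out), s[i + 1:]
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated literal")


if __name__ == "__main__":
    print(strict_decode(OVERLONG_QUOTE))          # None: rejected
    print(transparent_roundtrip(OVERLONG_QUOTE))  # b'\xc0\xa2': reproduced as-is

    # 0xFF is never valid in UTF-8; it plays the role of '%' above.
    raw = b'Robert\xff");DROP TABLE Students;--'
    quoted = quotify(raw.decode("utf-8", "surrogateescape"))
    print(read_quoted(quoted)[1] == "")           # True: literal is intact

    cleaned = clean_encode(quoted).decode("utf-8")
    print(cleaned)                                # "Robert\\");DROP TABLE Students;--"
    print(read_quoted(cleaned))                   # ('Robert\\', ');DROP TABLE Students;--"')
```

After cleaning, the literal ends right after `Robert\`, leaving `);DROP TABLE Students;--"` outside the quotes. In this sketch the escaping only protects the string as long as the later encoding stage reproduces exactly what was escaped; silently dropping the escaped byte changes what the preceding backslash applies to.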