From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Mon, 06 Oct 2014 19:47:11 +0300
Message-ID: <838uktm9gw.fsf@gnu.org>
In-reply-to: <871tql17uw.fsf@yeeloong.lan>
To: Mark H Weaver
Cc: dak@gnu.org, rms@gnu.org, dmantipov@yandex.ru, emacs-devel@gnu.org, handa@gnu.org, monnier@iro.umontreal.ca, stephen@xemacs.org

> From: Mark H Weaver
> Cc: dak@gnu.org, rms@gnu.org, dmantipov@yandex.ru, emacs-devel@gnu.org, handa@gnu.org, monnier@iro.umontreal.ca, stephen@xemacs.org
> Date: Mon, 06 Oct 2014 12:27:35 -0400
>
> > The obvious solution is to encode the raw bytes internally in a UTF-8
> > compatible way. Which is what Emacs does in its buffers and strings,
> > as I'm sure you know. Can't Guile do something similar?
>
> I'm afraid you've misunderstood, or perhaps I've failed to explain it
> clearly.

I think I did understand your perfectly clear explanation.

> It doesn't matter how these raw bytes are encoded internally. No matter
> what mechanism we use to accomplish it, propagating invalid byte
> sequences by default is bad security policy.
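The "refuse invalid sequences" policy quoted above can be illustrated with Python's built-in UTF-8 codec, used here purely as a stand-in (this is not Guile's or Emacs's code). A strict decoder rejects 0xC0 0xAF, the overlong two-byte encoding of "/", and the lossy alternative replaces the bad bytes with U+FFFD, after which the original bytes cannot be recovered. The helper names `strict_decode` and `replacing_decode` are hypothetical.

```python
# Overlong two-byte encoding of "/" (U+002F). A conforming UTF-8
# decoder must reject it instead of decoding it as "/".
OVERLONG_SLASH = b"\xc0\xaf"


def strict_decode(data: bytes):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return None


def replacing_decode(data: bytes) -> str:
    """Decode, substituting U+FFFD for invalid bytes (the input is lost)."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(strict_decode(OVERLONG_SLASH))            # None: rejected
    print(ascii(replacing_decode(OVERLONG_SLASH)))  # '\ufffd\ufffd'
```

Either way, a program that merely passes such data through cannot hand the original bytes back unchanged.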
How can we be responsible for byte streams that originated outside? That's the responsibility of the source. And if there is a consumer, then it is their responsibility not to trip over such bytes. But how can you refuse to copy such bytes when you are just a pipe that is expected not to change anything it wasn't told to?

Btw, Emacs doesn't expose the internal representation of these bytes easily to Lisp programs. That is, whenever any program tries to access the character at that position, it gets the original raw byte that was there before the string was read from outside. A Lisp program needs some very tricky and deliberate techniques to access the internal representation of such bytes. (It isn't "overlong", btw; we just represent the 128 bytes as codepoints in the 0x3fffXX range, and encode them in UTF-8 with 5 bytes.)

> The Unicode standard requires that all UTF-8 codecs refuse to accept,
> produce, or propagate invalid byte sequences, including the troublesome
> overlong encodings.

What Emacs does is interpret each byte of such invalid byte sequences as a separate raw byte, and represent each one of them internally as described above. Emacs cannot "refuse to propagate" the original sequence, because users of an editor expect it not to alter any part of the input that wasn't explicitly modified by the user or by commands she invoked.

> I'm not one for blindly following standards, but in my opinion this
> is the default policy we should adopt.

So just passing a string unaltered through a Guile program would change that string? That sounds like an unpleasant surprise for the users, at least for Emacs users. Emacs has been there, around v20.x, and we still carry the scars. It would be unwise, IMO, if Guile repeated those same mistakes.
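The raw-byte scheme described above, where each byte of an invalid sequence becomes its own character in the 0x3fffXX range, can be sketched at the code-point level. This is a minimal illustration under stated assumptions, not Emacs's implementation: it assumes raw byte B maps to character 0x3FFF00 + B (so 0x80..0xFF land in 0x3FFF80..0x3FFFFF), it represents text as a list of integers because Python strings cannot hold code points above 0x10FFFF, it borrows Python's `surrogateescape` error handler (PEP 383) only to locate the undecodable bytes, and it does not model how Emacs lays those characters out in memory. The names `decode_lossless`, `encode_lossless` and `RAW_BYTE_BASE` are hypothetical.

```python
# Assumption: raw byte B (0x80..0xFF) is represented as character
# RAW_BYTE_BASE + B, i.e. the range 0x3FFF80..0x3FFFFF.
RAW_BYTE_BASE = 0x3FFF00


def decode_lossless(data: bytes) -> list[int]:
    """Decode UTF-8 into code points; each invalid byte becomes a raw-byte char."""
    # surrogateescape turns every undecodable byte B into U+DC00 + B.
    text = data.decode("utf-8", errors="surrogateescape")
    chars = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            chars.append(RAW_BYTE_BASE + (cp - 0xDC00))
        else:
            chars.append(cp)
    return chars


def encode_lossless(chars: list[int]) -> bytes:
    """Inverse of decode_lossless: raw-byte chars are written back verbatim."""
    out = bytearray()
    for cp in chars:
        if RAW_BYTE_BASE + 0x80 <= cp <= RAW_BYTE_BASE + 0xFF:
            out.append(cp - RAW_BYTE_BASE)  # the original byte, unchanged
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out)
```

Round-tripping `b"caf\xc3\xa9 \xc0\xaf \xff"` through both functions yields the identical byte string, which is the "pipe" property argued for above; the overlong pair 0xC0 0xAF comes out as two separate raw-byte characters (0x3FFFC0, 0x3FFFAF) and is never interpreted as "/".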