From: "Stephen J. Turnbull"
To: Mark H Weaver
Cc: dak@gnu.org, rms@gnu.org, dmantipov@yandex.ru, emacs-devel@gnu.org,
 handa@gnu.org, monnier@iro.umontreal.ca, Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Sun, 12 Oct 2014 10:35:36 +0900
Message-ID: <874mvaoys7.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <54193A70.9020901@member.fsf.org> <87d2ahm3nw.fsf@fencepost.gnu.org>
 <871tqneyvl.fsf@netris.org> <87d2a54t1m.fsf@yeeloong.lan> <83lhotme1e.fsf@gnu.org>
 <871tql17uw.fsf@yeeloong.lan> <838uktm9gw.fsf@gnu.org> <87h9zgarvp.fsf@fencepost.gnu.org>
 <83y4srjaot.fsf@gnu.org> <83r3yhiu8c.fsf@gnu.org> <83siiw9c6t.fsf@gnu.org>
 <83zjd3846e.fsf@gnu.org> <8738auyxke.fsf@netris.org>
In-Reply-To: <8738auyxke.fsf@netris.org>

Mark H Weaver writes:
> Eli Zaretskii writes:

> > Specify, and then drag it all the way down the encoding/decoding
> > machinery.
>
> The strictness flag should conceptually be part of the encoding, and
> thus associated with the I/O port.

This is the way Emacs works already.  However, I think the Python
system, where strictness is part of the I/O port, not the encoding,
and the encodings are designed to error and then hand the invalid raw
bytes to the error handler if desired, is a better API.  I don't know
how easy it would be to provide this in Emacs (XEmacs streams are
quite different from Emacs'), but it's probably not too hard since the
rawbytes facility is already present.  It would be nice to extend that
to EOL handling as well IMO, but that's not as big an issue.

> This would obviate the need to propagate it down through layers of
> code.

It's not so easy, because the layers of code referred to are not the
encoding/decoding machinery in the sense of the coding system (ISTR
you use "codec", Emacs calls them "coding systems" to be more like ISO
2022 "coding extensions" IIRC).  It's the mechanism for determining
exactly which coding system is to be used, and the difficulties are
really in the area of UI more so than in API.

In Emacs Lisp there's a tradition of embedding parameters which are
normally specified as constants in the name.  (This issue has already
been referred to in different terms.)  So instead of

    ;; these IO functions are all imaginary
    (let ((s (open-file "foo")))
      (set-stream-coding-system s 'utf-8)
      (set-stream-eol s 'unix)          ; EOL is LF
      (set-stream-invalid-coding-handler s 'strict)
      ;; now we can do I/O, signaling errors on invalid coding
      (read-stream-into-buffer s))
    ;; and now we're ready to edit, assuming valid coding!

Emacs does

    (find-file "foo" 'utf-8-unix-strict) ; or is it utf-8-strict-unix?  arghh!

Things are further complicated by the fact that Emacs has an extremely
complex system for specifying the encoding and the newline convention
used, and either or both might be automatically detected.  All of the
parameters can be tweaked at any stage in the specification routines,
and there are about 5 levels of configurability for files
(configuration is done by setting or binding dynamic variables) and
more than one for network and process streams (which are different).
Adding specification of the error handling convention will make the
*user interface* yet more complicated -- and it has to be possible for
all this to be done separately for every stream (you might trust files
on your host but not the network).

And then there's the "auto" coding system, which guesses the
appropriate coding system by analyzing the input.
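
For concreteness, the per-stream style sketched with the imaginary
functions above is roughly what Python's built-in open() offers: the
encoding, the error handler, and the newline convention are three
separate arguments attached to the stream.  What follows is only a
sketch under stated assumptions: read_text is a hypothetical helper
and the file is a throwaway temporary one, both made up for
illustration, and nothing here describes an existing Emacs API.

```python
import os
import tempfile


def read_text(path, errors="strict"):
    """Read PATH as UTF-8 with LF line endings (hypothetical helper).

    ERRORS names the error handler for this one stream only.
    """
    # encoding, errors and newline are independent properties of the
    # stream, not parts of one composite name like 'utf-8-unix-strict'.
    with open(path, encoding="utf-8", errors=errors, newline="\n") as stream:
        return stream.read()


# Create a throwaway file containing one byte (0xFF) that can never
# appear in valid UTF-8.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"foo\xffbar\n")

# Strict handling: invalid coding is signaled as an error.
try:
    read_text(path)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape: the raw byte is passed through as a lone surrogate,
# so the text can be written back out byte-for-byte.
lenient = read_text(path, errors="surrogateescape")
round_trip = lenient.encode("utf-8", errors="surrogateescape")

os.remove(path)
```

Note that the caller chooses the handler per stream, so a trusted
local file and an untrusted network stream can use the same encoding
with different strictness.  The surrogateescape handler is loosely
comparable to a raw-bytes facility, though the analogy is mine and
should not be pushed too far.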

I have always thought that the Emacs developers' emphasis on having
Emacs "DWIM" so much in this area is somewhat misplaced[1], but that
is the way things are and have been since the late 1980s (Emacs
actually installed these features in 1998 or so, but there were
patches that were universally used for Asian languages from the late
1980s), and there will be a lot of resistance from users and
developers to any changes that require them to do things differently.

Footnotes:
[1]  Historically, these features were developed by Japanese
developers, who have to deal with an insane environment where even
today you will encounter at least 5 major encodings on a daily basis
(cheating a little, since UTF-16 is usually visible only inside MSFT
file formats and in Java programming), and most of those have
innumerable private variants (most large corporations in Japan have
private sets of Chinese characters that are in Unicode but were
historically not in the Japanese national standards).  It's easy to
see why Japanese would want a good guessing facility!  Most of the
rest of us either don't have to deal with it (95% of what we see is in
one particular encoding), or have an extremely difficult problem in
distinguishing the ones common in our environment (is this Latin-1 or
Latin-9? vs. the Japanese case where the bit patterns of the major
encodings are very distinctive).  This is not to say that guessing is
a bad idea where it can be done accurately, just that the Emacs
facilities are way too complex for the benefit they provide over a
much simpler system.