From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Inadequate documentation of silly characters on screen.
Date: Sat, 21 Nov 2009 22:55:48 +0900
Message-ID: <876394dlsr.fsf@uwakimon.sk.tsukuba.ac.jp>
To: David Kastrup
Cc: emacs-devel@gnu.org

David Kastrup writes:

> > However, I think a well-behaved platform should by default error
> > (something derived from invalid-state, in XEmacs's error hierarchy) in
> > such a case; normally this means corruption in the file.
>
> We take care that it does not mean corruption.

I meant pre-existing corruption, like your pre-existing disposition to
bash XEmacs. Please take it elsewhere; it doesn't belong on Emacs
channels. (Of course I'd prefer not to see it on XEmacs channels
either, but at least it wouldn't be entirely off-topic there.)

> And more often it means that you might have been loading with the
> wrong encoding (people do that all the time). If you edit some
> innocent ASCII part

You can't do that if the file is not in a buffer because the encoding
error aborted the conversion.
Aborting the conversion is what the Unicode Consortium requires, too,
IIRC: errors in UTF-8 (or any other UTF for that matter) are
considered *fatal* by the standard. Exactly what that means is up to
the application to decide. One plausible approach would be to do what
you do now, but make the buffer read-only.

> Sometimes there is no "right encoding".

So what? The point is that there certainly are *wrong* encodings,
namely ones that will result in corruption if you try to save the file
in that encoding. There are usually many "usable" encodings (binary is
always available, for example). Some will be preferred by users, and
that will be reflected in coding system precedence. But when faced
with ambiguity, it is best to refuse to guess.

> We currently _have_ [a scheme for encoding invalid sequences of
> code units] in place. We just use different Unicode-invalid code
> points [from Python].

Conceded. I realized that later; the important difference is that
Python only uses that scheme when explicitly requested.
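For readers unfamiliar with the Python side of this comparison, here is a minimal sketch, using only the standard codecs, of the policy described above: decoding is strict by default, the lossless escaping scheme (the "surrogateescape" error handler from PEP 383) is applied only on explicit request, and an encoding counts as "wrong" for a file if decoding fails or saving the decoded text would not reproduce the original bytes. The helper names (decode_strict, decode_escaped, is_safe_encoding) are hypothetical, and this does not model how Emacs or XEmacs represent raw bytes internally.

```python
"""Sketch: strict decoding by default, lossless escaping only on request.

Uses only Python's built-in codecs; the helper names are hypothetical.
"""


def decode_strict(data: bytes, encoding: str) -> str:
    # Default behavior: any invalid sequence raises UnicodeDecodeError,
    # so the caller never receives silently damaged text.
    return data.decode(encoding)  # errors="strict" is the default


def decode_escaped(data: bytes, encoding: str) -> str:
    # Opt-in behavior (PEP 383): each undecodable byte 0xXY becomes the
    # lone surrogate U+DCXY, which encodes back to the same byte.
    return data.decode(encoding, errors="surrogateescape")


def is_safe_encoding(data: bytes, encoding: str) -> bool:
    # An encoding is "wrong" for this data if decoding fails, or if
    # saving the decoded text would not reproduce the original bytes.
    try:
        text = data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    try:
        return text.encode(encoding) == data
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    raw = b"innocent ASCII \xff\xfe tail"  # not valid UTF-8

    try:
        decode_strict(raw, "utf-8")
    except UnicodeDecodeError as err:
        print("strict decode refused:", err.reason)

    text = decode_escaped(raw, "utf-8")
    print(repr(text))
    # The escaped text round-trips to the exact original bytes.
    assert text.encode("utf-8", errors="surrogateescape") == raw

    # latin-1 maps every byte to a character, so like "binary" it is
    # always usable; utf-8 is a wrong encoding for this data.
    for enc in ("utf-8", "latin-1"):
        print(enc, is_safe_encoding(raw, enc))
```

The design choice mirrors the point in the thread: the escaping scheme is available in the library, but nothing uses it unless the caller names the error handler, so an encoding mistake surfaces as an error rather than as a buffer that quietly contains escaped bytes.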