From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Inadequate documentation of silly characters on screen.
Date: Sat, 21 Nov 2009 15:36:49 +0100
Message-ID: <877htkaqri.fsf@lola.goethe.zz>
To: emacs-devel@gnu.org

"Stephen J. Turnbull" writes:

> David Kastrup writes:
>
> > > However, I think a well-behaved platform should by default error
> > > (something derived from invalid-state, in XEmacs's error
> > > hierarchy) in such a case; normally this means corruption in the
> > > file.
> >
> > We take care that it does not mean corruption.
>
> I meant pre-existing corruption [...]

That interpretation is not the business of the editor. It may decide to
give a warning, but refusing to work at all does not increase its
usefulness.

> > And more often it means that you might have been loading with the
> > wrong encoding (people do that all the time). If you edit some
> > innocent ASCII part
>
> You can't do that if the file is not in a buffer because the encoding
> error aborted the conversion.

Not being able to do what I want is not a particularly enticing feature.

> Aborting the conversion is what the Unicode Consortium requires, too,
> IIRC:

An editor is not the same as a validator. It's not its business to
decide what files I should be allowed to work with.

> errors in UTF-8 (or any other UTF for that matter) are considered
> *fatal* by the standard. Exactly what that means is up to the
> application to decide. One plausible approach would be to do what you
> do now, but make the buffer read-only.

Making the buffer read-only is a reasonable thing to do if it can't
possibly be written back unchanged, for example if I load a file in
latin-1 and insert a few non-latin-1 characters. In this case Emacs
should not just silently write the file in utf-8 because that changes
the encoding of some preexisting characters. The situation is
different if I load a pure ASCII file: in that case, the utf-8 decision
is feasible when compatible with the environment.

> > Sometimes there is no "right encoding".
>
> So what? The point is that there certainly are *wrong* encodings,
> namely ones that will result in corruption if you try to save the file
> in that encoding.

But we have a fair number of encodings (those without escape characters
IIRC) which don't imply corruption when saving. And that is a good
feature for an editor. For example, when working with version control
systems, you want minimal diffs.
Encoding systems with escape characters are not good for that. I would
strongly advise against Emacs automatically picking any
escape-character-based (or otherwise non-byte-stream-preserving)
encoding. Less breakage is always a good thing.

> But when faced with ambiguity, it is best to refuse to guess.

You don't need to guess if you just preserve the byte sequence. That
makes it somebody else's problem. The GNU utilities have always made
it a point to work with arbitrary input without insisting on it being
"sensible". Historically, most Unix utilities just crashed when you fed
them arbitrary garbage. They have taken a lesson from GNU nowadays.
And I consider it a good lesson.

> > We currently _have_ [a scheme for encoding invalid sequences of
> > code units] in place. We just use different Unicode-invalid code
> > points [from Python].
>
> Conceded. I realized that later; the important difference is that
> Python only uses that scheme when explicitly requested.

All in all, it is nobody else's business what encoding Emacs uses for
internal purposes. Making Emacs preserve byte streams means that the
user has to worry less, not more, about what Emacs might be able to work
with.

The Emacs 23 internal encoding does a better job of not getting into the
hair of users with encoding issues than Emacs 22 did, because of a
better correspondence with external encodings. But ideally, the user
should not have to worry about the difference.

-- 
David Kastrup
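
To make the "just preserve the byte sequence" argument concrete, here
is a minimal sketch in Python. It uses Python's `surrogateescape`
error handler, the explicitly requested scheme mentioned in the
exchange above; it is not how Emacs represents raw bytes internally
(Emacs uses different code points). The helper names
`load_preserving` and `save_preserving` and the sample file contents
are hypothetical, chosen only for illustration.

```python
def load_preserving(raw: bytes) -> str:
    """Decode as UTF-8, mapping each undecodable byte to a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def save_preserving(text: str) -> bytes:
    """Encode as UTF-8, turning escaped surrogates back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical file contents: mostly ASCII, plus one stray Latin-1 byte
# (0xE9) that is not valid UTF-8 in this position.
original = b"name = caf\xe9\ncomment = innocent ASCII part\n"

# A strict decoder refuses the whole file, so nothing can be edited.
try:
    original.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The preserving decoder loads it; we edit only the ASCII part.
text = load_preserving(original)
edited = text.replace("innocent", "harmless")
result = save_preserving(edited)

# The untouched line, invalid byte included, comes back byte for byte.
assert not strict_ok
assert result == b"name = caf\xe9\ncomment = harmless ASCII part\n"
```

The edit touches only the second line, so a version control diff of
the saved file would show that line alone; the invalid byte on the
first line is neither interpreted nor rewritten.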