From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Kastrup
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Tue, 20 May 2008 09:13:10 +0200
Message-ID: <85r6bx78jd.fsf@lola.goethe.zz>
References: <87od75kt78.fsf@pdrechsler.de>
 <87mymofip6.fsf@uwakimon.sk.tsukuba.ac.jp>
 <878wy8ny36.fsf@catnip.gol.com>
 <87k5hsfdvd.fsf@uwakimon.sk.tsukuba.ac.jp>
 <85y768ug6x.fsf@lola.goethe.zz>
 <87fxsff0xc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <854p8vrxk5.fsf@lola.goethe.zz>
 <874p8uf2xm.fsf@uwakimon.sk.tsukuba.ac.jp>
 <85ej7yqafj.fsf@lola.goethe.zz>
 <87wslpeuj8.fsf@uwakimon.sk.tsukuba.ac.jp>
In-Reply-To: <87wslpeuj8.fsf@uwakimon.sk.tsukuba.ac.jp> (Stephen J. Turnbull's message of "Tue, 20 May 2008 08:36:11 +0900")
To: "Stephen J. Turnbull"
Cc: emacs-pretest-bug@gnu.org, Patrick Drechsler, Miles Bader
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:97437 gmane.emacs.pretest.bugs:22396

"Stephen J. Turnbull" writes:

> David Kastrup writes:
>
> > I am not interested in the "goal of Unicode" but in that of Emacs.
> > Unicode is about text files.  But Emacs communicates via byte streams
> > and those are not necessarily text, or necessarily all text.
>
> Some Emacs files *are* text, and getting them to behave correctly will
> require understanding "the goals of Unicode".  Since Unicode is now
> the underlying representation of multibyte buffers, you don't have a
> choice about this.  Cf. Thomas Morgan's recent post on "disappearing
> cursor".

Sigh.  Bugs are there to be fixed, not to be used as an excuse for more
bugs.  The interpretation of Unicode is a matter of the display engine,
not of the byte stream encoders/decoders.

> > > Sure, and Emacs must provide coding systems that preserve them,
> > > and generally use those coding systems by default.  Did anybody
> > > say otherwise?
> >
> > So what was your point supposed to be?
>
> That Miles could use a BOM-swallowing encoding on input and a
> non-BOM-producing encoding on output to enforce his view of Microsoft
> conventions on others.

I suppose you underestimate Miles here.

> > So forward-char and replace-string should be made to work as
> > expected on non-normalized texts.
>
> Good luck.  I don't know how to do that, and doubt that it is
> possible.  We have similar issues with case-folding replacements.

Anyway: one problem is not an excuse to introduce unrelated bugs
elsewhere.  Moving character unification to a place where it does more
damage does not magically make the problem different.

> I do not think that "as expected" can be well defined, because for
> purposes like computing storage requirements composing characters
> should be considered characters, while for others like computing the
> number of columns occupied by a line they should not.

Again, you are being destructive.  Problems don't present an excuse for
being sloppy.  If one can see a problem that can't be fixed by
principle, then one should try confining it to those operations where it
is inherent instead of spreading its effects all around and making
everything unpredictable.
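
The split in the quoted paragraph between counting for storage and
counting for display can be made concrete.  What follows is a minimal
sketch in Python rather than Emacs Lisp, and it is not taken from Emacs.
`display_columns` is a hypothetical helper that assumes every
non-combining character occupies exactly one column, which ignores wide
and zero-width characters.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Rough column count: combining marks get no column of their own.

    Assumption for illustration only: every other character is treated
    as exactly one column wide (no East Asian wide or control characters).
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT (a non-normalized form).
decomposed = "e\u0301"

code_points = len(decomposed)                  # units a scalar-based command steps over
utf8_bytes = len(decomposed.encode("utf-8"))   # what matters for storage
columns = display_columns(decomposed)          # what matters for line width

# NFC happens to fold this pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# To my knowledge 'q' plus acute has no precomposed form, so normalization
# cannot reduce it to one scalar value.
uncomposable = unicodedata.normalize("NFC", "q\u0301")

print(code_points, utf8_bytes, columns, len(composed), len(uncomposable))
```

The same string gives a different answer depending on whether one counts
code points, encoded bytes, or columns, and normalization only hides the
difference for sequences that have a precomposed equivalent.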

Yes, there are questions in the presence of composing characters of what
one wants to have forward-char and replace-string and overwrite-mode do.
One reasonable approach is to consider Unicode glyphs as an inseparable
entity with regard to user commands.  It is basically Emacs 20.2 all
over.  But composed Unicode glyphs have no single code points.  They are
vectors.  As long as a character representation as scalar integers
remains valid, Unicode code points is all that we can do.

> > > Binary faithfulness may be incompatible with other user demands,
> > > for example if a user introduces Latin-2 characters into a
> > > Latin-9 text.
> >
> > Why do you think we switched to utf-8 internally and got rid of
> > latin unification?
>
> David, don't you realize that is not a response to what I wrote?
>
> I think it's time to stop this thread until you address the issues
> instead of me.

Whatever.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum