From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Tue, 20 May 2008 05:34:45 +0900
Message-ID: <874p8uf2xm.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87od75kt78.fsf@pdrechsler.de>
 <87mymofip6.fsf@uwakimon.sk.tsukuba.ac.jp>
 <878wy8ny36.fsf@catnip.gol.com>
 <87k5hsfdvd.fsf@uwakimon.sk.tsukuba.ac.jp>
 <85y768ug6x.fsf@lola.goethe.zz>
 <87fxsff0xc.fsf@uwakimon.sk.tsukuba.ac.jp>
 <854p8vrxk5.fsf@lola.goethe.zz>
In-Reply-To: <854p8vrxk5.fsf@lola.goethe.zz>
To: David Kastrup
Cc: emacs-pretest-bug@gnu.org, Patrick Drechsler, Miles Bader

David Kastrup writes:
> "Stephen J. Turnbull" writes:

> > In any case, maintaining faithfulness of representation is simply not
> > possible, as you point out
>
> With some coding systems.
> But the latin-* and utf-* can maintain the
> binary stream since their coding is required to be canonical in the
> standard.

latin-* will do so because of their extremely limited range.  It's
unfortunate that programmer intuitions about text have been
Americanized (== drastically limited) by these encodings.

utf-* can maintain representation in the very limited sense you have
in mind, and I know that is very useful to you in dealing with
non-conforming applications like TeX.  However, you still run into the
problem that faithfulness of representation is not a goal of Unicode.

> > It's also not at all obvious that that is a very
> > useful requirement when dealing with a character-oriented standard
> > like Unicode or XML, since you can expect many applications to
> > canonicalize the text "behind your back".
>
> That's not an issue.

What do you mean by "that's not an issue"?  How can you know when I
haven't named the application?

> Also you can load, edit and save a text file in collaborative
> environments, and the diffs/patches will be just in the edited areas
> (this will supposedly work better with Emacs-23 than Emacs-22).  Those
> are quite important features.

Sure, and Emacs must provide coding systems that preserve them, and
generally use those coding systems by default.  Did anybody say
otherwise?

> > Users should get used to it, and we should document how to force Emacs
> > to error rather than do anything behind your back for those who need
> > binary faithfulness rather than text faithfulness.
>
> Since binary faithfulness implies text faithfulness, there is no reason
> not to do the right thing instead of erroring out.

"There is no reason"?  How arrogant of you!  Rather, "David Kastrup
lacks the knowledge of the reasons."  Here are three examples:

Binary faithfulness may imply breaking text programs.
For example, `forward-char' and `replace-string' will give surprising
results in a buffer that uses Unicode internally and contains text in
NFD normalization (and these anomalies will be noticeable in all
Western European languages excluding English).

Binary faithfulness may imply inefficiency.  For example, files need
not be normalized, so preserving the original bytes would imply
keeping a copy of the whole file and doing a Unicode diff to determine
which parts of the file need to be saved from the buffer and which
parts from the saved copy.

Binary faithfulness may be incompatible with other user demands, for
example if a user introduces Latin-2 characters into a Latin-9 text.
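
The first and third examples can be illustrated with a minimal sketch.
It uses Python's standard `unicodedata' module rather than Emacs Lisp,
purely for illustration; the sample strings and the helper
`fits_latin9' are hypothetical, and nothing here reflects how Emacs
itself implements `forward-char' or `replace-string'.  It only shows
why code-point-oriented operations behave differently on NFD text, and
why a Latin-2 character cannot be saved back into a Latin-9 file.

```python
import unicodedata

# Hypothetical sample text; the accented letter is U+00E9 in NFC.
nfc = unicodedata.normalize("NFC", "caf\u00e9 au lait")
# In NFD the same letter becomes "e" followed by U+0301 COMBINING ACUTE ACCENT.
nfd = unicodedata.normalize("NFD", nfc)

# The two strings display identically but differ in code-point count,
# so moving "one character" forward can stop between base and accent.
nfc_len = len(nfc)
nfd_len = len(nfd)

# A code-point-wise replacement of the precomposed letter succeeds on the
# NFC text but silently finds nothing in the NFD text.
replaced_nfc = nfc.replace("\u00e9", "e")
replaced_nfd = nfd.replace("\u00e9", "e")


def fits_latin9(text: str) -> bool:
    """Return True if every character of text is encodable in ISO-8859-15."""
    try:
        text.encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False


# U+0151 (o with double acute) exists in Latin-2 but not in Latin-9, so a
# buffer containing it cannot be written back in the file's original coding.
latin2_only = "\u0151"
can_save_original = fits_latin9("caf\u00e9")
can_save_edited = fits_latin9("caf\u00e9 " + latin2_only)
```

In this sketch `replaced_nfd' is left equal to `nfd', and
`can_save_edited' is false while `can_save_original' is true, which is
the kind of conflict between byte preservation and user intent
described above.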