unofficial mirror of emacs-devel@gnu.org 
From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: David Kastrup <dak@gnu.org>
Cc: emacs-pretest-bug@gnu.org,
	Patrick Drechsler <patrick@pdrechsler.de>,
	Miles Bader <miles@gnu.org>
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Tue, 20 May 2008 05:34:45 +0900
Message-ID: <874p8uf2xm.fsf@uwakimon.sk.tsukuba.ac.jp>
In-Reply-To: <854p8vrxk5.fsf@lola.goethe.zz>

David Kastrup writes:
 > "Stephen J. Turnbull" <stephen@xemacs.org> writes:

 > > In any case, maintaining faithfulness of representation is simply not
 > > possible, as you point out
 > 
 > With some coding systems.  But the latin-* and utf-* can maintain the
 > binary stream since their coding is required to be canonical in the
 > standard.

latin-* will do so because of their extremely limited range.  It's
unfortunate that programmer intuitions about text have been
Americanized (== drastically limited) by these encodings.

utf-* can maintain representation in the very limited sense you have
in mind, and I know that is very useful to you in dealing with non-
conforming applications like TeX.  However, you still run into the
problem that faithfulness of representation is not a goal of Unicode.
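
To make the distinction concrete, here is a minimal sketch in Python (not
Emacs Lisp, and not a description of how any Emacs coding system is
implemented).  It assumes only the standard `unicodedata` module; the
sample word is an illustration.  A plain UTF-8 decode/encode round trip
returns the same bytes, but as soon as some layer canonicalizes the text,
the bytes change even though the text is canonically equivalent.

```python
import unicodedata

# "cafe" followed by COMBINING ACUTE ACCENT: a decomposed (NFD) spelling.
nfd_bytes = "cafe\u0301".encode("utf-8")

# Decoding and re-encoding without normalization preserves the byte stream.
text = nfd_bytes.decode("utf-8")
round_trip_bytes = text.encode("utf-8")

# A layer that normalizes to NFC yields canonically equivalent text
# with a different byte representation.
nfc_text = unicodedata.normalize("NFC", text)
nfc_bytes = nfc_text.encode("utf-8")

print(round_trip_bytes == nfd_bytes)   # byte-faithful round trip
print(nfc_bytes == nfd_bytes)          # normalization changed the bytes
print(len(nfd_bytes), len(nfc_bytes))  # the lengths differ as well
```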

 > > It's also not at all obvious that that is a very
 > > useful requirement when dealing with a character-oriented standard
 > > like Unicode or XML, since you can expect many applications to
 > > canonicalize the text "behind your back".
 > 
 > That's not an issue.

What do you mean by "that's not an issue?"  How can you know when I
haven't named the application?

 > Also you can load, edit and save a text file in collaborative
 > environments, and the diffs/patches will be just in the edited areas
 > (this will supposedly work better with Emacs-23 than Emacs-22).  Those
 > are quite important features.

Sure, and Emacs must provide coding systems that preserve them, and
generally use those coding systems by default.  Did anybody say
otherwise?

 > > Users should get used to it, and we should document how to force Emacs
 > > to error rather than do anything behind your back for those who need
 > > binary faithfulness rather than text faithfulness.
 > 
 > Since binary faithfulness implies text faithfulness, there is no reason
 > not to do the right thing instead of erroring out.

"There is no reason"?  How arrogant of you!  Rather, "David Kastrup
lacks the knowledge of the reasons."  Here are three examples:

Binary faithfulness may imply breaking text programs.  For example,
`forward-char' and `replace-string' will give surprising results in a
buffer using Unicode internally that contains Unicode in NFD
normalization (and these anomalies will be noticeable in all Western
European languages excluding English).

Binary faithfulness may imply inefficiency.  For example, files need
not be normalized, so preserving the original bytes would imply
keeping a copy of the whole file and doing a Unicode diff to determine
which parts of the file need to be saved from the buffer and which
parts from the saved copy.

Binary faithfulness may be incompatible with other user demands, for
example if a user introduces Latin-2 characters into a Latin-9 text.
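
The first and third examples can be sketched in Python, under stated
assumptions: Python strings stand in for a buffer that stores Unicode
code points, stepping by one index stands in for a naive
`forward-char', and the `in' operator stands in for a naive search.
The helper `can_encode` and the sample strings are hypothetical, and
nothing here is taken from Emacs itself.

```python
import unicodedata

# "naive" with a diaeresis, precomposed (NFC) and decomposed (NFD).
word_nfc = "na\u00efve"
word_nfd = unicodedata.normalize("NFD", word_nfc)

# The user sees five characters either way, but the NFD text holds six
# code points, so stepping one code point at a time can stop between a
# base letter and its combining mark.
base, mark = word_nfd[2], word_nfd[3]
stops_inside_character = unicodedata.combining(mark) != 0

# A search for the precomposed letter finds nothing in the NFD text.
found_in_nfd = "\u00ef" in word_nfd


def can_encode(text: str, codec: str) -> bool:
    """Return True if TEXT is representable in CODEC (hypothetical helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# U+0151 (o with double acute) exists in Latin-2 but not in Latin-9, so
# a Latin-9 file cannot be saved unchanged in its original encoding once
# such a character has been typed into it.
in_latin2 = can_encode("\u0151", "iso8859_2")
in_latin9 = can_encode("\u0151", "iso8859_15")

print(len(word_nfc), len(word_nfd), stops_inside_character, found_in_nfd)
print(in_latin2, in_latin9)
```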



