From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: utf-16le vs utf-16-le
Date: Tue, 15 Apr 2008 03:25:51 +0900
Message-ID: <87lk3gfg40.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <E1Jl3bC-0005N2-PJ@fencepost.gnu.org>
	<87wsn1fl72.fsf@uwakimon.sk.tsukuba.ac.jp> <uzlrxqg1o.fsf@gnu.org>
	<87prssgacl.fsf@uwakimon.sk.tsukuba.ac.jp>
	<851w58q24a.fsf@lola.goethe.zz>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1208197395 3883 80.91.229.12 (14 Apr 2008 18:23:15 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 14 Apr 2008 18:23:15 +0000 (UTC)
Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
To: David Kastrup <dak@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Apr 14 20:23:51 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1JlTEi-0000ar-KQ
	for ged-emacs-devel@m.gmane.org; Mon, 14 Apr 2008 20:17:01 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1JlTE3-0001jL-MA
	for ged-emacs-devel@m.gmane.org; Mon, 14 Apr 2008 14:16:19 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1JlTDz-0001fd-9a
	for emacs-devel@gnu.org; Mon, 14 Apr 2008 14:16:15 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1JlTDx-0001ah-Hu
	for emacs-devel@gnu.org; Mon, 14 Apr 2008 14:16:14 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1JlTDx-0001aQ-82
	for emacs-devel@gnu.org; Mon, 14 Apr 2008 14:16:13 -0400
Original-Received: from mtps01.sk.tsukuba.ac.jp ([130.158.97.223])
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <stephen@xemacs.org>)
	id 1JlTDh-0002n7-4K; Mon, 14 Apr 2008 14:15:58 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mtps01.sk.tsukuba.ac.jp (Postfix) with ESMTP id D73921535AC;
	Tue, 15 Apr 2008 03:15:54 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id 95AC81A29F3; Tue, 15 Apr 2008 03:25:51 +0900 (JST)
In-Reply-To: <851w58q24a.fsf@lola.goethe.zz>
X-Mailer: VM 7.19 under 21.5  (beta28) "fuki" 2785829fe37c XEmacs Lucid
X-detected-kernel: by monty-python.gnu.org: Linux 2.6, seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:95196
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/95196>

David Kastrup writes:
 > "Stephen J. Turnbull" <stephen@xemacs.org> writes:

 > > I don't know, in fact I think [having BOM-specific coding
 > > systems is] a bad idea.  That's what the part of my message that
 > > you snipped was saying.  But I'll have to defer to Handa-san on
 > > that.
 > 
 > I think it obvious: if a BOM mark gets detected on read, one wants
 > to have it removed from the buffer and reinserted on saving the
 > buffer.

I agree; as you state it, it's obvious.  My question is "why does that
need to be part of the coding system?"  At present the UTF-16 and
UTF-32 Unicode coding systems (in the abstract) have *twenty-seven*
variants each ({BOM-required, BOM-prohibited, BOM-autodetected} x {be,
le, system-dependent} x {CR, LF, CRLF}), and UTF-8, where byte order
does not apply, needs *nine*.  This is nuts, from a user-education
standpoint.
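
To make the count concrete, here is a small Python sketch (not Emacs
code; the label strings are illustrative names only, not actual coding
system names) that enumerates the combinations described above:

```python
from itertools import product

# Illustrative labels for the three independent axes named in the text.
BOM_POLICIES = ("bom-required", "bom-prohibited", "bom-autodetected")
BYTE_ORDERS = ("be", "le", "system-dependent")
EOL_CONVENTIONS = ("CR", "LF", "CRLF")

# UTF-16 and UTF-32: every axis applies.
utf16_variants = list(product(BOM_POLICIES, BYTE_ORDERS, EOL_CONVENTIONS))

# UTF-8: byte order does not apply, so only two axes remain.
utf8_variants = list(product(BOM_POLICIES, EOL_CONVENTIONS))

print(len(utf16_variants))  # 27
print(len(utf8_variants))   # 9
```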

What I proposed was a more generic concept where use of signatures and
the EOL convention would (at least to the user) appear as buffer-local
variables.
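
As a rough illustration of that idea, the following Python sketch
treats the signature and the EOL convention as two independent
settings next to a single encoding name.  It is only a model of the
concept, not Emacs code; `BufferFileFormat` and `encode_for_save` are
hypothetical names invented for this example.

```python
import codecs
from dataclasses import dataclass

@dataclass
class BufferFileFormat:
    """Hypothetical per-buffer settings: one encoding, two independent flags."""
    encoding: str = "utf-8"   # "utf-8", "utf-16-le" or "utf-16-be" in this sketch
    signature: bool = False   # write a BOM at the start of the file?
    eol: str = "\n"           # "\r", "\n" or "\r\n"

# Signatures for the encodings this sketch supports.
_SIGNATURES = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}

def encode_for_save(text: str, fmt: BufferFileFormat) -> bytes:
    """Encode buffer text, applying the EOL and signature settings separately."""
    body = text.replace("\n", fmt.eol).encode(fmt.encoding)
    prefix = _SIGNATURES[fmt.encoding] if fmt.signature else b""
    return prefix + body

fmt = BufferFileFormat(encoding="utf-16-le", signature=True, eol="\r\n")
data = encode_for_save("a\nb", fmt)
```

With this factoring the user sees one encoding plus two toggles,
rather than a separate named variant for every combination.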

 > I am just not sure what the semantics for recoding/encoding/decoding
 > regions are.  They should not mess with BOM in any case, I would
 > suppose.  But then reading a file is not equivalent to reading it
 > literally in unibyte mode and then decoding the buffer-region.

That's correct.  The thing is, processing the BOM is a question of
*initialization* of a stream.
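
Python's standard codecs happen to show the same distinction, so here
is a short sketch using them as an analogy (this says nothing about
how Emacs implements it).  The "utf-8-sig" codec consumes a signature
only at the start of a stream, while plain "utf-8" decodes the same
bytes literally:

```python
import codecs

data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Treating the bytes as the start of a stream: the signature is consumed.
as_stream = data.decode("utf-8-sig")

# Treating the bytes as an arbitrary region: U+FEFF stays in the text.
as_region = data.decode("utf-8")

# An incremental decoder checks for the signature once, on initialization.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
first_chunk = decoder.decode(data)
later_chunk = decoder.decode(codecs.BOM_UTF8 + b"x")

print(repr(as_stream))    # 'hello'
print(repr(as_region))    # '\ufeffhello'
print(repr(later_chunk))  # '\ufeffx'
```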

 > Maybe there never was such an equivalence (can't be for shift codes, can
 > it?).

In my view, there cannot be an equivalence.  An Emacs buffer in
unibyte mode is a *different* stream from the file it was read from,
and the decision about BOM processing will have to be made differently
from the way the decision is made at the time of reading from the
file.  You could add yet another option for BOM mode, namely "if this
stream is an Emacs buffer that is visiting a file in unibyte mode, then
do BOM processing on conversion as if you were reading in the file in
multibyte mode."  I don't much like this....