From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: UCS-2BE
Date: Fri, 01 Sep 2006 21:26:59 +0900
To: YAMAMOTO Mitsuharu
Cc: juri@jurta.org, schwab@suse.de, jasonr@gnu.org, emacs-devel@gnu.org

Thank you for the info!

YAMAMOTO Mitsuharu writes:

> "Unicode Technical Report #17, Character Encoding Model"
> (http://www.unicode.org/reports/tr17/index.html) says:
>
>   [...]
>   Examples of Unicode Character Encoding Schemes:
>   [...]
>   Unicode 1.1 had three character encoding schemes: UTF-8, UCS-2BE,
>   and UCS-2LE, although the latter two were not named that way at
>   the time.

Ah!  So here we can see the term "UCS-2BE" used as a CES (character
encoding scheme).  But how was it defined?  (I don't have Unicode 1.1.)

> I suspect "UCS-2BE" is just a customary name and not explicitly
> defined even in ISO/IEC 10646.

> "UTF-8 and Unicode FAQ" (http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> says:
>
>   No endianess is implied by the encoding names UCS-2, UCS-4, UTF-16,
>   and UTF-32, though ISO 10646-1 says that Bigendian should be
>   preferred unless otherwise agreed. It has become customary to
>   append the letters "BE" (Bigendian, high-byte first) and "LE"
>   (Littleendian, low-byte first) to the encoding names in order to
>   explicitly specify a byte order.
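
To make the "BE"/"LE" suffixes in the quoted FAQ concrete, here is a
minimal Python sketch.  The codec names are Python's own and are used
here only as an assumption for illustration (Python has no "ucs-2"
codec; for characters in the BMP the UTF-16 codecs produce the same
16-bit units).  It is not how Emacs or iconv names or implements these.

```python
# Illustration only: Python codec names, not Emacs coding systems.
text = "A\u3042"  # U+0041 and U+3042, both inside the BMP

big = text.encode("utf-16-be")     # high byte of each 16-bit unit first
little = text.encode("utf-16-le")  # low byte of each 16-bit unit first
plain = text.encode("utf-16")      # no order in the name: Python adds a BOM

print(big.hex(" "))     # 00 41 30 42
print(little.hex(" "))  # 41 00 42 30
print(plain[:2].hex())  # the BOM; its value depends on the platform
```

The same two characters give the same two 16-bit units in both cases;
only the order of the bytes inside each unit differs.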

I don't know how authoritative this page is, but it also says:

  A full featured character encoding converter will have to provide
  the following 13 encoding variants of Unicode and UCS:

  UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4LE, UCS-4BE, UTF-8, UTF-16,
  UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE

So it seems that UCS-2BE is not a mislabel of UTF-16BE, and that
treating it as a subset of UTF-16BE (one that does not use surrogate
pairs), as iconv does, is the right thing.  I'll try to implement it
(and the others) in emacs-unicode-2.

By the way, why do people want so many variants... sigh...

---
Kenichi Handa
handa@m17n.org
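
A rough sketch of what "UCS-2BE as a subset of UTF-16BE without
surrogate pairs" could mean in practice, written in Python.  The
function names encode_ucs2be and decode_ucs2be are hypothetical, and
rejecting (rather than replacing) characters outside the BMP is a
policy assumed for this sketch; it is not taken from iconv or from
emacs-unicode-2.

```python
def encode_ucs2be(text: str) -> bytes:
    """Hypothetical helper: UTF-16BE restricted to the BMP."""
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UTF-16 would need a surrogate pair here; UCS-2 has none.
            raise ValueError(f"U+{cp:04X} is outside the BMP")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is a surrogate code point")
    # Every remaining character is exactly one big-endian 16-bit unit.
    return text.encode("utf-16-be")


def decode_ucs2be(data: bytes) -> str:
    """Hypothetical helper: each 16-bit unit is one character."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    chars = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        if 0xD800 <= unit <= 0xDFFF:
            # Assumed policy: never combine units into a surrogate pair.
            raise ValueError(f"surrogate unit 0x{unit:04X} in UCS-2 data")
        chars.append(chr(unit))
    return "".join(chars)
```

For text inside the BMP the output is byte-for-byte what a UTF-16BE
encoder produces; the two differ only in how they treat surrogates.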