From: YAMAMOTO Mitsuharu
Newsgroups: gmane.emacs.devel
Subject: Re: UCS-2BE
Date: Fri, 01 Sep 2006 20:30:26 +0900
Organization: Faculty of Science, Chiba University
References: <878xl5x4lr.fsf@jurta.org> <44F6A74A.9040708@gnu.org> <44F6BC5B.8010504@gnu.org> <87ac5ko50j.fsf@jurta.org>
To: Kenichi Handa
Cc: Juri Linkov, schwab@suse.de, emacs-devel@gnu.org, jasonr@gnu.org

>>>>> On Fri, 01 Sep 2006 10:19:34 +0900, Kenichi Handa said:

> UCS-XXX are CEF, and UTF-XXX are CES.  So, UCS-XXX are not
> appropriate label names for specifying how to byte-serialize
> characters (i.e. on saving characters in a file).  At least, that is
> the official definition in Unicode.

IIUC, UCS is ISO/IEC 10646 terminology rather than Unicode
terminology, except in Unicode 1.1 (though there are, of course, some
references to it in the documentation).  "Unicode Technical Report
#17, Character Encoding Model"
(http://www.unicode.org/reports/tr17/index.html) says:

  Examples of encoding forms as applied to particular coded character
  sets:

    Name            Encoding forms
    Unicode 4.0     UTF-16 (default), UTF-8, or UTF-32 encoding form
    Unicode 3.0     either UTF-16 (default) or UTF-8 encoding form
    Unicode 1.1     either UCS-2 (default) or UTF-8 encoding form
    ISO/IEC 10646   depending on the declared implementation levels,
                    may have UCS-2, UCS-4, UTF-16, or UTF-8.

  Examples of Unicode Character Encoding Schemes:

  The Unicode Standard has seven character encoding schemes: UTF-8,
  UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE.

  Unicode 1.1 had three character encoding schemes: UTF-8, UCS-2BE,
  and UCS-2LE, although the latter two were not named that way at the
  time.
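To make the form/scheme distinction in the TR17 excerpt concrete, here
is a minimal sketch in Python (not Emacs code; the helper names
code_units_utf16 and serialize are made up for illustration).  It
separates the encoding form (a sequence of 16-bit code units, with no
byte order) from the encoding scheme (those units written out as bytes
in a chosen order).  For characters in the BMP, as used here, the
UCS-2 and UTF-16 code units coincide.

```python
import struct


def code_units_utf16(text):
    """Encoding form: return the 16-bit code units for `text`.

    The result is a list of integers; no byte order is involved yet.
    """
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def serialize(units, byte_order):
    """Encoding scheme: write the code units as bytes, "BE" or "LE"."""
    prefix = {"BE": ">", "LE": "<"}[byte_order]
    return struct.pack("%s%dH" % (prefix, len(units)), *units)


# U+0041 LATIN CAPITAL LETTER A, U+3042 HIRAGANA LETTER A
units = code_units_utf16("A\u3042")
be = serialize(units, "BE")   # high byte first
le = serialize(units, "LE")   # low byte first

print([hex(u) for u in units])
print(be.hex(), le.hex())
```

The same code units yield two different byte sequences, which is why
a name for the serialized form needs the BE/LE suffix (or a byte order
mark) while a name for the encoding form does not.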

I suspect "UCS-2BE" is just a customary name and not explicitly
defined even in ISO/IEC 10646.  "UTF-8 and Unicode FAQ"
(http://www.cl.cam.ac.uk/~mgk25/unicode.html) says:

  No endianness is implied by the encoding names UCS-2, UCS-4, UTF-16,
  and UTF-32, though ISO 10646-1 says that Bigendian should be
  preferred unless otherwise agreed.  It has become customary to
  append the letters "BE" (Bigendian, high-byte first) and "LE"
  (Littleendian, low-byte first) to the encoding names in order to
  explicitly specify a byte order.

				     YAMAMOTO Mitsuharu
				mituharu@math.s.chiba-u.ac.jp