From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juri Linkov
Newsgroups: gmane.emacs.devel
Subject: Re: UCS-2BE
Date: Fri, 01 Sep 2006 02:32:44 +0300
Organization: JURTA
Message-ID: <87ac5ko50j.fsf@jurta.org>
References: <878xl5x4lr.fsf@jurta.org> <44F6A74A.9040708@gnu.org> <44F6BC5B.8010504@gnu.org>
To: Kenichi Handa
Cc: schwab@suse.de, emacs-devel@gnu.org, jasonr@gnu.org
In-Reply-To: (Kenichi Handa's message of "Thu, 31 Aug 2006 21:16:37 +0900")
User-Agent: Gnus/5.110004 (No Gnus v0.4) Emacs/22.0.50 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:59192

> If UCS-2BE is a mislabel of UTF-16BE, UCS-2BE can simply be
> an alias of UTF16-BE.  If UCS-2BE is a BMP subset of
> UTF-16BE, UCS2-BE should be implemented differently from
> UTF-16BE

`UCS-2' is the fixed-width encoding of the BMP: every character takes
exactly two bytes.  `UCS-2BE' is the big-endian version of UCS-2, written
without a BOM.  So, since UCS-2 is in effect the BMP subset of UTF-16,
UCS-2BE is the BMP subset of UTF-16BE (and UCS-2LE is the BMP subset of
UTF-16LE).

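To make the subset relation concrete, here is a minimal sketch in Python.
It is only an illustration, not how iconv or Emacs implement it: Python has
no UCS-2 codec, so `encode_ucs2be' is a hypothetical helper written for
this example, and rejecting surrogate code points is an assumption about
what a strict UCS-2 encoder should do.

```python
def encode_ucs2be(text: str) -> bytes:
    """Hypothetical helper: encode text as UCS-2BE (BMP only, no BOM)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # UCS-2 has no surrogate mechanism, so anything above U+FFFF
        # (and, by assumption, any lone surrogate) is not representable.
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} cannot be represented in UCS-2BE")
        out += cp.to_bytes(2, "big")  # fixed width: two bytes, big-endian
    return bytes(out)


bmp_text = "A\u20ac"          # U+0041 and U+20AC, both in the BMP
non_bmp_text = "\U0001D11E"   # U+1D11E, outside the BMP

# For BMP-only text the two encodings produce identical bytes.
same_bytes = encode_ucs2be(bmp_text) == bmp_text.encode("utf-16-be")

# UTF-16BE represents a non-BMP character as a surrogate pair (4 bytes).
surrogate_pair = non_bmp_text.encode("utf-16-be")

# The strict UCS-2BE sketch refuses the same character.
try:
    encode_ucs2be(non_bmp_text)
    rejected = False
except ValueError:
    rejected = True
```

For BMP-only input the output is byte-for-byte the same as UTF-16BE; the
two only diverge once a character outside the BMP appears.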
The encodings `UCS-2' and `UCS-2BE' are implemented in iconv
(http://www.gnu.org/software/libiconv/), so you could look at the
implementation of UCS-2BE:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/ucs2be.h?revision=1.4&view=markup

Comparing it with the implementation of UTF-16BE, you can see that
UTF-16BE also deals with the other planes:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/utf16be.h?revision=1.4&view=markup

And comparing UCS-2BE with the implementation of UCS-2, you can see that
UCS-2 also deals with a BOM:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/ucs2.h?revision=1.4&view=markup

The iconv implementations of UCS-2 and UTF-16 differ in one respect when
outputting a BOM:

http://libiconv.cvs.sourceforge.net/libiconv/libiconv/lib/utf16.h?revision=1.4&view=markup

i.e. converting a string to UTF-16 adds the BOM to the output, but
converting to UCS-2 doesn't add the BOM.  Does the Emacs implementation
of UTF-16 output the BOM?

> (at least, we should not select it by select-safe-coding-system on
> saving a buffer that contains non-BMP characters).

What do you think is the right way to deal with non-BMP characters when
the user tries to save a UTF-16(BE) buffer in the UCS-2(BE) encoding?

-- 
Juri Linkov
http://www.jurta.org/emacs/