From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: utf-16le vs utf-16-le
Date: Mon, 14 Apr 2008 07:23:45 +0900
Message-ID: <87wsn1fl72.fsf@uwakimon.sk.tsukuba.ac.jp>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

Eli Zaretskii writes:

> These two encodings have confusingly similar names, but significantly
> different semantics: one expects a BOM, the other does not.
>
> I tripped over these when I tried to read debugging logs saved by
> MS-Windows, which are in UTF-16 without a BOM: I used utf-16-le, which
> swallowed the first character. When I realized it was due to a BOM,
> it took me reading the doc strings of each encoding to find out
> what I did wrong.

Are you saying it was eating non-BOM characters? But that's clearly a bug in the codec. If it's going to expect a BOM, it should signal an error when it doesn't get one, not eat the character.

This business of having the presence or absence of signatures determined by coding systems has always felt wrong to me. Signatures are generally related to higher-level protocols (e.g., XML mandates them for UTF-16, while the MS logging facility de facto prohibits them). So whether a signature is used or not should be a buffer-local variable, not a property of the coding system.
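To make the proposal concrete, here is a minimal Python sketch, not Emacs code. It assumes a hypothetical per-buffer setting, called `SignaturePolicy` here (neither that name nor `decode_utf16le` exists in Emacs). The coding system fixes only the byte order, while BOM handling is a separate parameter. When a signature is required but missing, the decoder signals an error instead of consuming the first character.

```python
import codecs
from enum import Enum


class SignaturePolicy(Enum):
    """Hypothetical buffer-local setting: how to treat a byte order mark."""
    REQUIRE = "require"    # e.g. UTF-16 XML entities, which must begin with a BOM
    FORBID = "forbid"      # e.g. BOM-less logs; a leading U+FEFF is kept as text
    OPTIONAL = "optional"  # strip a BOM if present, otherwise decode everything


class MissingSignatureError(ValueError):
    """Raised when REQUIRE is in effect but the data has no BOM."""


def decode_utf16le(data: bytes, policy: SignaturePolicy) -> str:
    """Decode little-endian UTF-16; BOM handling comes from `policy`,
    not from picking a differently named coding system."""
    has_bom = data.startswith(codecs.BOM_UTF16_LE)  # b'\xff\xfe'
    if policy is SignaturePolicy.REQUIRE and not has_bom:
        # Signal an error rather than silently eating a real character.
        raise MissingSignatureError("expected a UTF-16LE byte order mark")
    if has_bom and policy is not SignaturePolicy.FORBID:
        data = data[len(codecs.BOM_UTF16_LE):]
    # Python's utf_16_le codec never expects or strips a BOM by itself
    # (unlike Emacs's utf-16-le), so all BOM logic lives above.
    return data.decode("utf_16_le")
```

With a BOM-less log such as `"abc".encode("utf_16_le")`, both `OPTIONAL` and `FORBID` return `"abc"`, while `REQUIRE` raises `MissingSignatureError` instead of dropping the `a`. Keeping the policy separate from the codec means the same UTF-16LE decoder serves both the XML case and the logging case.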