From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: utf-16le vs utf-16-le
Date: Mon, 14 Apr 2008 16:20:16 -0400
References: <87wsn1fl72.fsf@uwakimon.sk.tsukuba.ac.jp> <87prssgacl.fsf@uwakimon.sk.tsukuba.ac.jp> <851w58q24a.fsf@lola.goethe.zz> <87lk3gfg40.fsf@uwakimon.sk.tsukuba.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: "Stephen J. Turnbull"
Cc: Eli Zaretskii, emacs-devel@gnu.org
In-Reply-To: <87lk3gfg40.fsf@uwakimon.sk.tsukuba.ac.jp> (Stephen J. Turnbull's message of "Tue, 15 Apr 2008 03:25:51 +0900")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

>> > I don't know, in fact I think I think [having BOM-specific coding
>> > systems is] a bad idea.  That's what the part of my message that
>> > you snipped was saying.  But I'll have to defer to Handa-san on
>> > that.
>>
>> I think it obvious: if a BOM mark gets detected on read, one wants
>> to have it removed from the buffer and reinserted on saving the
>> buffer.

> I agree, as you state it, it's obvious.  My question is "why does that
> need to be part of the coding system?"  At present the UTF-16 and
> UTF-32 Unicode coding systems (in the abstract) have *twenty-seven*
> variants each (BOM-required, BOM-prohibited, BOM-autodetected X be,
> le, system-dependent X CR, LF, CRLF), and UTF-8 needs *nine*.  This is
> nuts, from a user-education standpoint.

For what it's worth, I do think it would make sense to try and move the
BOM-processing outside of the coding-system proper.  For me a good test
for coding-system-worthiness is "what if I use it for a process rather
than a file".  Based on this test, I'm not sure if BOMs really fit in
(other than for auto-detection and automatically stripping them, maybe).

> What I proposed was a more generic concept where use of signatures and
> the EOL convention would (at least to the user) appear as buffer-local
> variables.

Here, I disagree: EOL processing definitely needs to take place when
talking to subprocesses, so EOL-handling doesn't belong in buffer-local
vars but in the coding-system.


        Stefan
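
The variant counts quoted above come from multiplying independent axes,
and a short sketch makes the arithmetic concrete.  This is only an
illustration in Python, not Emacs code: the axis labels and the
`variants` helper are made-up names, and the joined strings are not
real Emacs coding-system names.

```python
from itertools import product

# Hypothetical labels for the three independent axes named in the quoted text.
BOM_POLICIES = ("bom-required", "bom-prohibited", "bom-autodetected")
BYTE_ORDERS = ("be", "le", "system-dependent")
EOL_CONVENTIONS = ("cr", "lf", "crlf")


def variants(*axes):
    """Return one joined label per combination of the given axes."""
    return ["-".join(combo) for combo in product(*axes)]


# UTF-16 and UTF-32: BOM policy x byte order x EOL convention.
utf16_variants = variants(BOM_POLICIES, BYTE_ORDERS, EOL_CONVENTIONS)
# UTF-8 has no byte-order axis: BOM policy x EOL convention.
utf8_variants = variants(BOM_POLICIES, EOL_CONVENTIONS)
# If BOM handling moved out of the coding system but EOL stayed in.
utf16_without_bom_axis = variants(BYTE_ORDERS, EOL_CONVENTIONS)

print(len(utf16_variants))          # 27
print(len(utf8_variants))           # 9
print(len(utf16_without_bom_axis))  # 9
```

Under these assumptions, taking the BOM axis out of the coding system
while keeping EOL handling in it would leave nine UTF-16 variants
instead of twenty-seven.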